Injury Severity on Traffic Crashes: A Text Mining with an Interpretable Machine-Learning Approach
Document Type
Article
Publication Date
9-19-2020
Publication Title
Safety Science
Volume
132
First page number:
1
Last page number:
12
Abstract
The analysis of traffic crash severities provides significant information for the development of safety countermeasures. Most available traffic crash datasets contain rich information, including linguistic narratives with details about crash events and contexts, which can reveal new insights regarding severity and associated causality factors. Previous research has paid insufficient attention to this source of information. This study proposes an approach to analyze traffic crash narratives to identify factors associated with high injury-severity levels. The proposed approach explicitly seeks global interpretability of the results by expanding the capabilities of the Local Interpretable Model-Agnostic Explanations (LIME) method. Our proposed new approach, Global Cross-Validation LIME (GCV-LIME), aggregates individual LIME explanations using cross-validation. Thus, this study combines machine learning-based text mining with GCV-LIME to identify likely causality factors for injury severities while providing the interpretability required by traffic safety analysts. Data for heavy vehicle crashes collected from 2007 to 2017 in Queensland, Australia, were used to evaluate the proposed approach. Six different machine-learning models were tested, and global explanations were generated using GCV-LIME. The results indicated a strong association between a set of terms, such as “collided_headon,” “side_collided,” “motorcycle,” “cab,” and “pedestrian,” and fatal crashes. Results from GCV-LIME were compared with those obtained using the corresponding available tabular data and classic regression analysis. The comparison suggests that the proposed approach has great potential both to provide additional insights and to confirm results obtained with classic analysis of tabular data. Results from GCV-LIME, combined with knowledge and experience from safety analysts, can help establish effective safety countermeasures based on factors likely causing crashes and/or increasing their severity.
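The abstract states only that GCV-LIME aggregates individual LIME explanations using cross-validation; it does not give the aggregation rule. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each local explanation has already been computed for a narrative in a held-out fold and is available as a list of (term, weight) pairs, and that aggregation is a plain average within each fold followed by an average across folds. The function names `aggregate_fold` and `aggregate_folds` are hypothetical, and the weights are invented toy values, not results from the study.

```python
from collections import defaultdict
from statistics import mean


def aggregate_fold(explanations):
    """Average term weights over the local explanations of one held-out fold.

    `explanations` is a list of local explanations; each one is a list of
    (term, weight) pairs. A term absent from an explanation counts as 0.
    Assumption: simple averaging; the abstract does not specify the rule.
    """
    sums = defaultdict(float)
    for explanation in explanations:
        for term, weight in explanation:
            sums[term] += weight
    n = len(explanations)
    return {term: total / n for term, total in sums.items()}


def aggregate_folds(fold_explanations):
    """Average the per-fold scores across folds and rank terms by magnitude."""
    per_fold = [aggregate_fold(fold) for fold in fold_explanations]
    terms = set().union(*per_fold)
    scores = {t: mean(f.get(t, 0.0) for f in per_fold) for t in terms}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy input: two folds with invented weights (illustration only).
folds = [
    [  # fold 1: two explained narratives
        [("collided_headon", 0.4), ("pedestrian", 0.2)],
        [("collided_headon", 0.2)],
    ],
    [  # fold 2: one explained narrative
        [("collided_headon", 0.3), ("motorcycle", 0.1)],
    ],
]
ranking = aggregate_folds(folds)
for term, score in ranking:
    print(f"{term}: {score:.3f}")
```

In this sketch, terms with a consistently large weight across folds rise to the top of the ranking, which is the kind of global view the abstract describes. Other aggregation choices, such as weighting by prediction confidence or counting how often a term appears, would fit the same description.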
Keywords
Crash severity; Text mining; Machine learning; Interpretable machine learning
Disciplines
Civil and Environmental Engineering | Engineering
Language
English
Repository Citation
Arteaga, C., Paz, A., Park, J. W. (2020). Injury Severity on Traffic Crashes: A Text Mining with an Interpretable Machine-Learning Approach. Safety Science, 132, 1-12.
http://dx.doi.org/10.1016/j.ssci.2020.104988