Injury Severity on Traffic Crashes: A Text Mining with an Interpretable Machine-Learning Approach

Document Type

Article

Publication Date

9-19-2020

Publication Title

Safety Science

Volume

132

First page number

1

Last page number

12

Abstract

The analysis of traffic crash severities provides significant information for the development of safety countermeasures. Most available traffic crash datasets contain rich information, including linguistic narratives with details about crash events and contexts, which can reveal new insights regarding severity and associated causality factors. Previous research has paid insufficient attention to this source of information. This study proposes an approach to analyze traffic crash narratives to identify factors associated with high injury-severity levels. The proposed approach explicitly seeks global interpretability of the results by expanding the capabilities of the Local Interpretable Model-Agnostic Explanations (LIME) method. Our proposed new approach, Global Cross-Validation LIME (GCV-LIME), aggregates individual LIME explanations using cross-validation. Thus, this study combines machine learning-based text mining with GCV-LIME to identify likely causality factors for injury severities while providing the interpretability required by traffic safety analysts. Data for heavy vehicle crashes collected from 2007 to 2017 in Queensland, Australia, were used to evaluate the proposed approach. Six different machine-learning models were tested, and global explanations were generated using GCV-LIME. The results indicated a strong association between fatal crashes and a set of terms such as “collided_headon,” “side_collided,” “motorcycle,” “cab,” and “pedestrian.” Results from GCV-LIME were compared with those obtained using the corresponding available tabular data and classic regression analysis. The comparison suggests that the proposed approach has great potential to provide additional insights and to confirm results obtained with classic analysis on tabular data. Results from GCV-LIME, combined with the knowledge and experience of safety analysts, can help establish effective safety countermeasures based on factors likely to cause crashes and/or increase their severity.
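
The aggregation idea behind GCV-LIME can be illustrated with a minimal sketch. The abstract does not specify how the individual explanations are combined, so the averaging rule below is an assumption, not the authors' method. The function name `aggregate_explanations`, the input layout (one list of term-to-weight dictionaries per cross-validation fold), and the example weights are all hypothetical and are not results from the study.

```python
from collections import defaultdict
from statistics import mean


def aggregate_explanations(fold_explanations):
    """Average local term weights over all folds (assumed aggregation rule).

    fold_explanations: one list per cross-validation fold, each holding one
    dict per held-out narrative that maps a term to its local weight.
    Returns (term, mean_weight, count) tuples sorted by mean weight, descending.
    """
    weights = defaultdict(list)
    for fold in fold_explanations:
        for explanation in fold:
            for term, weight in explanation.items():
                weights[term].append(weight)
    ranked = [(term, mean(ws), len(ws)) for term, ws in weights.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical local explanations with made-up weights, for illustration only.
example_folds = [
    [{"collided_headon": 0.4, "motorcycle": 0.2}, {"pedestrian": 0.3}],
    [{"collided_headon": 0.6, "parked": -0.2}],
]
global_ranking = aggregate_explanations(example_folds)
for term, avg_weight, count in global_ranking:
    print(f"{term}: mean weight {avg_weight:.2f} over {count} explanation(s)")
```

In practice, the per-narrative dictionaries would come from a local explainer applied to the held-out documents of each fold. Keeping the count next to the mean helps separate terms that are consistently influential from terms that appear in only a few narratives.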

Keywords

Crash severity; Text mining; Machine learning; Interpretable machine learning

Disciplines

Civil and Environmental Engineering | Engineering

Language

English
