Award Date

December 2015

Degree Type

Thesis

Degree Name

Master of Science in Engineering (MSE)

Department

Electrical Engineering

First Committee Member

Pushkin Kachroo

Second Committee Member

Emma Regentova

Third Committee Member

Ebrahim Saberinia

Fourth Committee Member

Haroon Stephen

Number of Pages

97

Abstract

Machine learning and data mining are active research topics and are applied in databases, artificial intelligence, statistics, and related fields to discover valuable knowledge and patterns in the big data available to users. Data mining is predominantly about processing unstructured data and extracting meaningful information from it to help end users make business decisions. Machine learning techniques use mathematical algorithms to find patterns in, or extract meaning from, big data. The arrival of big data has increased the popularity of such techniques for analyzing business problems.

The main objective of this thesis is to study the importance of big data and machine learning and their impact on the transportation industry. The thesis is primarily a review of the important machine learning algorithms and their applications in the field of big data. The author attempts to showcase the need to extract meaningful information from the vast amount of traffic data available today, and lists machine learning techniques that can be used to extract the knowledge required to support better decision making in transportation applications.

The analysis uses five multivariate analysis and machine learning techniques from data mining, namely cluster analysis, multivariate linear regression, hierarchical multiple regression, factor analysis, and discriminant analysis, in two software packages, SPSS and R. As part of the analysis, the author attempts to explain how knowledge extracted from random traffic data containing variables such as the age of the driver, the sex of the driver, the day of the week, the atmospheric condition, and the blood alcohol content of the driver can play an important role in predicting traffic crashes. The data is accident data obtained from the Fatality Analysis Reporting System (FARS) for the years 1999 to 2009. It is concluded that traffic accidents were mostly impacted by atmospheric conditions and blood alcohol content, followed by the day of the week.

Keywords

Big Data; Machine Learning; Multivariate Analysis

Disciplines

Electrical and Computer Engineering

Language

English

