Award Date

5-1-2016

Degree Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science

First Committee Member

Yoohwan Kim

Second Committee Member

Kazem Taghva

Third Committee Member

Justin Zhan

Fourth Committee Member

Yahia Baghzouz

Number of Pages

66

Abstract

As computers become larger, more powerful, and more connected, implementing and maintaining a secure computing environment becomes increasingly challenging. Some of these challenges stem from the exponential growth of unstructured messages generated by computer systems and applications. Although these data contain a wealth of information useful for advanced threat detection and for predicting future anomalies, their sheer volume, variety, and complexity make it difficult even for well-trained analysts to extract the right information. While conventional SIEM (Security Information and Event Management) tools can collect, correlate, and detect certain events from structured messages, their rule-based correlation and detection algorithms fall short in exploiting the information contained in unstructured messages. This study explores the use of text mining, natural language processing, and machine learning techniques to detect security threats by extracting relevant information from various unstructured log messages collected from distributed, heterogeneous systems. The extracted features are used in a number of experiments on the Packet Clearing House SKAION 2006 IARPA Dataset, and the prediction performance is evaluated. Compared with the base case without feature extraction, using the extracted features alone yielded an average accumulated performance gain of 16.73% and an 84% reduction in time, while using both the unstructured free-text messages and the extracted features yielded a 23.48% performance gain at the cost of an 82.39% increase in time. The results show strong potential for further performance improvement through larger training sets and the extraction of more features from the unstructured log messages.
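To illustrate the general idea of turning unstructured log text into structured features, here is a minimal sketch using regular-expression entity extraction. It is not the thesis's method: the patterns (IPv4 addresses, "port N", "user X"), the keyword list, and the helper names `extract_features` and `summarize` are all hypothetical, chosen only to show how free-text lines can become countable features for a downstream classifier.

```python
import re
from collections import Counter

# Hypothetical patterns; the actual feature set used in the study is not given here.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
PORT = re.compile(r"\bport\s+(\d{1,5})\b", re.IGNORECASE)
USER = re.compile(r"\buser(?:name)?[=\s]+([A-Za-z0-9_.-]+)", re.IGNORECASE)
KEYWORDS = ("failed", "denied", "invalid", "error")  # assumed indicator words


def extract_features(line: str) -> dict:
    """Turn one unstructured log line into a small structured feature dict."""
    lower = line.lower()
    return {
        "ips": IPV4.findall(line),
        "ports": [int(p) for p in PORT.findall(line)],
        "users": USER.findall(line),
        # Count occurrences of each indicator word in the lowercased line.
        "keyword_hits": sum(lower.count(k) for k in KEYWORDS),
    }


def summarize(lines):
    """Aggregate per-line features: event counts per IP and total keyword hits."""
    ip_counts = Counter()
    total_hits = 0
    for line in lines:
        feats = extract_features(line)
        ip_counts.update(feats["ips"])
        total_hits += feats["keyword_hits"]
    return ip_counts, total_hits


if __name__ == "__main__":
    logs = [
        "Failed password for invalid user admin from 10.0.0.5 port 22 ssh2",
        "Accepted password for alice from 10.0.0.7 port 51000 ssh2",
        "Connection denied from 10.0.0.5",
    ]
    print(extract_features(logs[0]))
    print(summarize(logs))
```

In practice, such extracted fields could be fed to a text classifier alongside, or instead of, the raw message text, which mirrors the two configurations compared in the abstract.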

Keywords

Entity Extraction; Security Data Analytics; Security Information and Event Management (SIEM); SKAION 2006 IARPA Dataset; Text Classification; Unstructured Log

Disciplines

Computer Sciences

Language

English

Available for download on Tuesday, August 15, 2017

