Award Date
5-1-2016
Degree Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science
First Committee Member
Yoohwan Kim
Second Committee Member
Kazem Taghva
Third Committee Member
Justin Zhan
Fourth Committee Member
Yahia Baghzouz
Number of Pages
66
Abstract
As computers become larger, more powerful, and more connected, many challenges arise in implementing and maintaining a secure computing environment. Some of the challenges come from the exponential increase of unstructured messages generated by the computer systems and applications. Although these data contain a wealth of information that is useful for advanced threat detection and prediction for future anomalies, the sheer volume, variety, and complexity of data make it difficult for even well-trained analysts to extract the right information. While conventional SIEM (Security Information and Event Management) tools provide some capability to collect, correlate, and detect certain events from structured messages, their rule-based correlation and detection algorithms fall short in utilizing information in unstructured messages. This study explores the possibility of utilizing techniques for text mining, natural language processing, and machine learning to detect security threat by extracting relevant information from various unstructured log messages collected from distributed non-homogeneous systems. The extracted features are used to run a number of experiments on the Packet Clearing House SKAION 2006 IARPA Dataset, and the performance of prediction is evaluated. In comparison to the base case without feature extraction, an average of 16.73% of accumulated performance gain and 84% of time reduction was achieved using extracted features only, while a 23.48% performance gain with 82.39% of time increase was attained using both unstructured free-text messages and extracted features. The results display strong potential for further increase in performance by using larger size of training sets and extracting more features from the unstructured log messages.
Keywords
Entity Extraction; Security Data Analytics; Security Information and Event Management (SIEM); SKAION 2006 IARPA Dataset; Text Classification; Unstructured Log
Disciplines
Computer Sciences
File Format
Degree Grantor
University of Nevada, Las Vegas
Language
English
Repository Citation
Suh-Lee, Candace, "Mining Unstructured Log Messages for Security Threat Detection" (2016). UNLV Theses, Dissertations, Professional Papers, and Capstones. 2749.
http://dx.doi.org/10.34917/9112197
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/