Award Date

August 2019

Degree Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science

First Committee Member

Justin Zhan

Second Committee Member

Kazem Taghva

Third Committee Member

Hal Berghel

Fourth Committee Member

Tiberio Garza

Number of Pages

54

Abstract

There are two main components of malware analysis. One is static malware analysis and the other is dynamic malware analysis. Static malware analysis involves examining the basic structure of the malware executable without executing it, while dynamic malware analysis relies on examining malware behavior after executing it in a controlled environment. Static malware analysis is typically done by modern anti-malware software by using signature-based analysis or heuristic-based analysis.

This thesis proposes the use of deep neural networks to learn features from a malware’s portable executable (PE) to minimize the occurrences of false positives when recognizing new malware. We use the EMBER dataset for training our model and compare our results with other known malware datasets. We show that using a simple deep neural network for learning vectorized PE features is not only effective, but is also less resource intensive as compared to conventional heuristic detection methods. Our model achieves an Area Under Curve (AUC) of 99.8% with 98% true positives at 1% false positives on the Receiver Output Characteristics (ROC) curve. We further propose the practical implementation of this model to show that it can potentially compliment or replace conventional anti-malware software.

Keywords

Data Science; Machine Learning; Microsoft Windows; Neural Networks; Portable Executable; Static Malware Analysis

Disciplines

Computer Sciences

Language

English


Share

COinS