Award Date


Degree Type


Degree Name

Master of Science (MS)


Computer Science

First Committee Member

Fatma Nasoz

Second Committee Member

Mira Han

Third Committee Member

Laxmi Gewali

Fourth Committee Member

Yoohwan Kim

Fifth Committee Member

Qing Wu

Number of Pages



Cancer has become one of the leading causes of death worldwide, due in part to late diagnosis and lack of proper treatment. It involves the abnormal, uncontrolled growth of cells, which may spread from where they arise to other parts of the body. Ribonucleic acid (RNA) sequencing can detect changes occurring inside cells and helps analyze gene expression patterns across the transcriptome. Given sufficient data, machine learning techniques can assist in predicting cancer at an early stage. The objective of this thesis is to build models that classify different types of cancer. For this purpose, we implemented several machine learning models, such as support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), and multilayer perceptron (MLP), to classify the samples according to their labels. The datasets for this research were collected from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. The models were trained on TCGA data and tested on an independent dataset (GTEx). Data representations obtained using stacked denoising autoencoders were used to train and test the models. These models did not achieve very high performance; however, the MLP performed better than the others. Features selected using SelectKBest were also used to train the models and compare their performance. With these features, the k-nearest neighbors classifier gave better results, achieving an accuracy of 85.12% on the independent test data and a training accuracy of 98.4%.
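The SelectKBest and k-nearest neighbors setup described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the thesis's actual code: the expression matrices are synthetic toy data, and the scoring function (`f_classif`, SelectKBest's default), the number of selected features `k=20`, the standardization step, `n_neighbors=5`, and the helper names `make_expression_data` and `run_demo` are all hypothetical choices. It also assumes the two sources have already been aligned to the same set of genes. The point it shows is that scaling, feature selection, and the classifier are fitted on the training source only, and the independent source is only transformed and scored.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

N_GENES = 500        # hypothetical number of genes shared by both datasets
N_INFORMATIVE = 20   # only these genes carry class signal in the toy data
N_CLASSES = 3        # hypothetical number of cancer/tissue types


def make_expression_data(n_samples, shift, rng):
    """Toy stand-in for a samples x genes expression matrix (not real data).

    `shift` adds a constant offset to every gene to mimic a systematic
    difference between two data sources (e.g. a batch effect).
    """
    y = rng.integers(0, N_CLASSES, size=n_samples)
    X = rng.normal(size=(n_samples, N_GENES))
    X[:, :N_INFORMATIVE] += 3.0 * y[:, None]  # class signal in the first genes
    return X + shift, y


def run_demo(k_features=20, n_neighbors=5, seed=0):
    rng = np.random.default_rng(seed)
    # Train on one source ("TCGA-like"), test on an independent one ("GTEx-like").
    X_train, y_train = make_expression_data(300, shift=0.0, rng=rng)
    X_test, y_test = make_expression_data(150, shift=0.5, rng=rng)

    # Every step is fitted on the training data only; the independent
    # test set is merely transformed by the fitted steps and scored.
    model = make_pipeline(
        StandardScaler(),
        SelectKBest(score_func=f_classif, k=k_features),
        KNeighborsClassifier(n_neighbors=n_neighbors),
    )
    model.fit(X_train, y_train)

    # Indices of the genes kept by the ANOVA F-test ranking.
    selected = model.named_steps["selectkbest"].get_support(indices=True)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    return train_acc, test_acc, selected


if __name__ == "__main__":
    train_acc, test_acc, selected = run_demo()
    print(f"training accuracy: {train_acc:.3f}")
    print(f"independent test accuracy: {test_acc:.3f}")
    print("selected gene indices:", selected.tolist())
```

Keeping the selector inside the pipeline means the feature ranking never sees the independent dataset, so the reported test accuracy is not inflated by information leaking from the test source.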


Computer Sciences

File Format


File Size

2.9 MB

Degree Grantor

University of Nevada, Las Vegas




IN COPYRIGHT. For more information about this rights statement, please visit