Award Date

8-1-2020

Degree Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science

First Committee Member

Fatma Nasoz

Second Committee Member

Mira Han

Third Committee Member

Kazem Taghva

Fourth Committee Member

Mingon Kang

Fifth Committee Member

Qing Wu

Number of Pages

52

Abstract

Large amounts of data are generated every day, so much that both humans and machines struggle to find the patterns needed to predict outcomes and make decisions. Machine learning techniques can help by simplifying this data. For example, biological cell identity depends on many factors tied to genetic processes, including proteins, gene transcription, and gene methylation. Each of these factors is a highly complex mechanism that produces immense amounts of data, and simplifying them can help in finding patterns. Error-Correcting Output Codes (ECOC) do this for classification by breaking a multiclass problem into multiple binary problems. This thesis proposes a new approach that also splits the feature set into multiple subsets called views. The proposed method is tested on multiple datasets from the University of California, Irvine (UCI) to analyze its performance. It is then applied to genetic data collected from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) in an attempt to improve classification of the tissue of origin for various tumor samples.
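
As a rough illustration of the two ideas named in the abstract, the sketch below shows generic ECOC decoding and a simple feature split in Python. It is not the thesis's method: the code matrix, the class names, and the `hamming`, `decode`, and `split_views` helpers are all hypothetical, and the contiguous split is only one assumed way to form views. Each class gets a binary codeword, each codeword position corresponds to one binary classifier, and a prediction is decoded to the class whose codeword is nearest in Hamming distance.

```python
# Hypothetical code matrix for a 4-class problem: one codeword per class,
# one position per binary classifier. The minimum pairwise Hamming distance
# is 4, so a single wrong binary prediction can still be corrected.
CODE_MATRIX = {
    "class_0": (1, 1, 1, 1, 1, 1, 1),
    "class_1": (0, 0, 0, 0, 1, 1, 1),
    "class_2": (0, 0, 1, 1, 0, 0, 1),
    "class_3": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Count the positions at which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label], bits))


def split_views(features, n_views):
    """Split a feature list into n_views contiguous, near-equal subsets.

    Assumption for illustration only: the abstract does not say how
    views are actually constructed.
    """
    base, extra = divmod(len(features), n_views)
    views, start = [], 0
    for i in range(n_views):
        size = base + (1 if i < extra else 0)
        views.append(features[start:start + size])
        start += size
    return views
```

For example, `decode((1, 0, 1, 1, 0, 0, 1))` still resolves to `"class_2"` even though the first binary prediction is wrong, which is the error-correcting property that gives ECOC its name.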

Keywords

ECOC; Ensemble learning; Error Correcting Codes; Genetics; Machine Learning; Multiomics

Disciplines

Artificial Intelligence and Robotics | Computer Engineering | Computer Sciences | Genetics

File Format

pdf

File Size

1170 KB

Degree Grantor

University of Nevada, Las Vegas

Language

English

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/

