Master of Science in Computer Science
Cancer is one of the leading causes of death globally and was responsible for approximately 9.6 million deaths in 2018. Two of the main reasons for deaths from cancer are late-stage presentation and inaccessible diagnosis and treatment. Cancer often spreads from the part of the body where it started (the primary site) to a different part of the body (a metastatic site). Identifying the primary site plays a key role because it directs the appropriate treatment: cancer that has spread needs the same treatment as the cancer at its origin, so knowing the origin helps doctors decide on the type of treatment.
All cancers begin when one or more genes in a cell mutate and create abnormal proteins that cause cells to multiply uncontrollably. Genes are present in the DNA of each cell in the human body, and research shows that distinct, abnormal patterns of DNA methylation are observed in cancers. DNA methylation is also considered an early and fundamental step in the transformation of normal tissue. Since DNA methylation is tissue-specific and changes with cell differentiation, methylation sites are good markers for identifying the tissue of origin.
In this thesis, we propose the use of machine learning techniques to identify the primary sites of cancers to increase the accuracy of diagnosis and treatment.
For this purpose, we implemented several machine learning classification algorithms, namely support vector machines, random forests, decision trees, and k-nearest neighbors, to classify tumor samples by their tissue of origin, and compared these models using traditional machine learning metrics. The models were trained and tested on features extracted from the DNA methylation datasets maintained by The Cancer Genome Atlas (TCGA). The experimental results showed that support vector machines could predict the primary sites with 95% training accuracy. The model gave 86% accuracy when tested on a completely independent dataset collected from the Gene Expression Omnibus (GEO).
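As a rough illustration of the workflow described above, the following sketch trains one of the listed classifier types (k-nearest neighbors) on methylation-like features and scores it on a separately generated set. It is a minimal sketch, not the thesis implementation: the data are synthetic beta values, and the tissue names, mean profiles, noise levels, and helper functions (`make_samples`, `knn_predict`, `accuracy`) are all assumptions made up for this example. The noisier second set only loosely stands in for an independent dataset such as one from GEO.

```python
import math
import random
from collections import Counter

# Hypothetical mean methylation (beta) values at four marker sites per tissue.
# These numbers are invented for illustration and are not taken from TCGA.
TISSUE_MEANS = {
    "breast": [0.9, 0.1, 0.5, 0.2],
    "lung": [0.1, 0.8, 0.4, 0.7],
    "colon": [0.5, 0.5, 0.9, 0.1],
}


def make_samples(tissue_means, n_per_tissue, noise, rng):
    """Generate synthetic samples as (beta_values, tissue_label) pairs."""
    samples = []
    for tissue, means in tissue_means.items():
        for _ in range(n_per_tissue):
            # Beta values are proportions, so clip them to [0, 1].
            betas = [min(1.0, max(0.0, rng.gauss(m, noise))) for m in means]
            samples.append((betas, tissue))
    return samples


def knn_predict(train, query, k=3):
    """Predict a tissue label by majority vote among the k nearest samples."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train = make_samples(TISSUE_MEANS, n_per_tissue=30, noise=0.05, rng=rng)
# A noisier, separately generated set plays the role of independent data.
independent = make_samples(TISSUE_MEANS, n_per_tissue=10, noise=0.10, rng=rng)

independent_acc = accuracy(train, independent)
print(f"accuracy on the independent set: {independent_acc:.2f}")
```

Scoring on data that played no part in training is the point of the sketch: training accuracy alone tends to overstate how well a model generalizes, which is why the abstract reports both figures.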
Cancer; DNA methylation; Machine learning; Primary tissue; Tissue
Gannavarapu Surya Naga, Sravani, "Machine Learning Classification of Primary Tissue Origin of Cancer from DNA Methylation Markers" (2019). UNLV Theses, Dissertations, Professional Papers, and Capstones. 3601.
Available for download on Friday, May 15, 2020