"Predicting Variant Pathogenicity with Machine Learning" by Zachary Fitzhugh, Martin R. Schiller Ph.D. et al.

Undergraduate Research Symposium Posters

Title

Predicting Variant Pathogenicity with Machine Learning

Authors

Zachary Fitzhugh, University of Nevada, Las Vegas
Martin R. Schiller Ph.D., University of Nevada, Las Vegas
Fatma Nasoz

Description

There are roughly 22,000 protein-coding genes in the human genome, and many of the proteins they encode play important roles in biological function. Proteins fold into three-dimensional structures, and this folding is usually necessary for their function. A genetic variant can disrupt a protein's secondary structure (one aspect of its overall structure) or eliminate a site important for protein-protein interactions or post-translational modification. The resulting loss of function or dysregulation can cause disease. Thus, there is great biomedical interest in identifying disease-causing single-nucleotide variants.

We hypothesized that variant pathogenicity can be predicted accurately. We used machine learning to predict the pathogenicity of 28,369 single-nucleotide variants across 10 genes. The data were obtained from publicly available saturation mutagenesis datasets; saturation mutagenesis generates every possible amino acid substitution at every position in a protein. Our approach uses support vector machines (SVMs) with linear, polynomial, and radial basis function (RBF) kernels. We framed the task as binary classification, in which a label of 1 indicates a disease-causing variant and a label of 0 indicates a benign variant. The model predicts pathogenicity from amino acid, post-translational modification, and secondary structure features. We cleaned and analyzed the data with custom Python scripts. The SVMs achieved average balanced accuracies of approximately 57.9%, 60.3%, and 60.3% with the linear, polynomial, and RBF kernels, respectively. Because random guessing yields an expected balanced accuracy of 50%, the models improve on chance, but there is substantial room for improvement.
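To make the three kernel functions and the reported metric concrete, here is a minimal pure-Python sketch. The kernel formulas are the standard textbook definitions; the hyperparameter values (`degree`, `gamma`, `coef0`), the helper names, and the toy vectors and labels are illustrative assumptions, not the settings or data used in this work. In practice a library SVM evaluates these kernels internally (for example, scikit-learn's `SVC` accepts `kernel="linear"`, `"poly"`, or `"rbf"`), but the poster does not state which library was used.

```python
import math
from typing import Sequence

# Label convention described in the poster: 1 = disease-causing, 0 = benign.
PATHOGENIC, BENIGN = 1, 0


def linear_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Linear kernel: k(x, z) = <x, z>."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x: Sequence[float], z: Sequence[float],
                      degree: int = 3, gamma: float = 1.0,
                      coef0: float = 1.0) -> float:
    """Polynomial kernel: k(x, z) = (gamma * <x, z> + coef0) ** degree.
    Default hyperparameters are illustrative assumptions."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x: Sequence[float], z: Sequence[float],
               gamma: float = 0.5) -> float:
    """RBF (Gaussian) kernel: k(x, z) = exp(-gamma * ||x - z||^2).
    The default gamma is an illustrative assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on pathogenic) and specificity (recall on benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == PATHOGENIC and p == PATHOGENIC)
    fn = sum(1 for t, p in pairs if t == PATHOGENIC and p != PATHOGENIC)
    tn = sum(1 for t, p in pairs if t == BENIGN and p == BENIGN)
    fp = sum(1 for t, p in pairs if t == BENIGN and p != BENIGN)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    x, z = [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]  # hypothetical toy feature vectors
    print(linear_kernel(x, z))      # 1.0
    print(polynomial_kernel(x, z))  # 8.0, i.e. (1.0 + 1.0) ** 3
    print(rbf_kernel(x, z))         # exp(-1), about 0.368

    # Hypothetical imbalanced labels: 4 pathogenic, 8 benign; the "model" always says benign.
    y_true = [1] * 4 + [0] * 8
    y_pred = [0] * 12
    plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(plain)                              # about 0.667
    print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The constant all-benign predictor illustrates why balanced accuracy suits this task: on an imbalanced set it can score well on plain accuracy, yet it always scores exactly 50% balanced accuracy, which is the chance-level reference point for the 57.9% to 60.3% reported above.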

Publication Date

Fall 11-15-2021

Language

English

Keywords

Machine learning; Saturation mutagenesis; Bioinformatics; Genetics; Support vector machines

File Format

pdf

File Size

986 KB

Comments

Faculty Mentor: Martin Schiller, Ph.D.

Recommended Citation

Fitzhugh, Zachary; Schiller, Martin R. Ph.D.; and Nasoz, Fatma, "Predicting Variant Pathogenicity with Machine Learning" (2021). Undergraduate Research Symposium Posters. 46.
https://digitalscholarship.unlv.edu/durep_posters/46

Digital Scholarship@UNLV
