Award Date

8-1-2018

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Life Sciences

First Committee Member

Martin R. Schiller

Second Committee Member

Andrew J. Andres

Third Committee Member

Laurel A. Raftery

Fourth Committee Member

Mira Han

Fifth Committee Member

Ernesto Abel-Santos

Number of Pages

325

Abstract

The carboxyl (C-) terminus of a protein is usually disordered and flexible, allowing proteins to engage in induced-fit interactions with other molecules. These interactions are often mediated by C-terminal minimotifs, which are peptide sequences of 2-15 contiguous residues found in a protein with a known molecular function. These functions include protein binding, trafficking, and post-translational modification. In this dissertation, we investigated how many human proteins contain C-terminal minimotifs. We cataloged 3,593 previously verified C-terminal minimotifs representing 17% of the human proteome. Based on these observations, we asked whether the remaining 83% of the human proteome also contains C-terminal minimotifs, and designated this area of research the C-terminome. To investigate this question, we analyzed the carboxyl terminus of the human proteome both computationally and experimentally.

From a bioinformatics standpoint, we used three approaches to predict novel C-terminal minimotifs: i) 867 C-terminal minimotifs were inferred based on the human orthologs of rodent proteins, ii) approximately 27,000 C-terminal minimotifs were predicted based on matches to verified consensus sequence patterns, and iii) approximately nine million C-terminal sequence patterns were predicted by mining the last ten amino acids at the C-terminus of each protein in the human proteome.
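As an illustration of the second approach, matching a consensus pattern against the C-terminal end of a sequence can be sketched as below. This is a minimal sketch and not the dissertation's implementation: it assumes the pattern notation used later in this abstract (for example `QxxL>`), reading `x` as any residue and the trailing `>` as the C-terminal end. The function names and the toy sequences are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a pattern such as 'QxxL>' into a regex anchored at the C-terminus.

    Assumed notation: lowercase 'x' is any residue, trailing '>' is the C-terminal end.
    """
    if not pattern.endswith(">"):
        raise ValueError("pattern must end with '>' to mark the C-terminus")
    body = pattern[:-1].replace("x", ".")
    return re.compile(body + "$")


def c_terminal_matches(sequences: dict[str, str], pattern: str, window: int = 10) -> list[str]:
    """Return names of sequences whose last `window` residues end with the pattern."""
    regex = pattern_to_regex(pattern)
    return [name for name, seq in sequences.items() if regex.search(seq[-window:])]


# Toy sequences invented for illustration; they are not real proteins.
toy = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQAAL",  # ends in QAAL
    "protB": "MSDNELLKKALEGLQSSF",         # ends in LQSSF
    "protC": "MAAQWWLKK",                  # Q..L occurs internally only
}

qxxl_hits = c_terminal_matches(toy, "QxxL>")
lxxxf_hits = c_terminal_matches(toy, "LxxxF>")
```

Anchoring the regex with `$` is what restricts the match to the C-terminal end; an internal occurrence, as in `protC`, is deliberately not counted.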

We further calculated a fold enrichment score for each of the nine million sequence patterns by computationally comparing the frequency of its occurrence in the human C-terminome with its frequency in random C-terminomes. In the end, based on prediction accuracy, we computationally inferred molecular functions for an additional 1%, predicted functions for ~1%, and identified over-represented sequence patterns for 100% of the human C-terminome.
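A fold enrichment score of this kind can be sketched as the ratio of the observed count to the mean count across randomized sets. The abstract does not state how the random C-terminomes were generated or whether a pseudocount was used, so the residue-shuffling step and the `pseudocount` parameter below are assumptions made only for illustration, and all names are hypothetical.

```python
import random
import re
from statistics import mean


def count_matches(tails: list[str], regex: re.Pattern) -> int:
    """Count C-terminal tails that match the pattern."""
    return sum(1 for tail in tails if regex.search(tail))


def shuffled_tails(tails: list[str], rng: random.Random) -> list[str]:
    """Assumed null model: pool all residues and redeal them into tails of the same lengths."""
    pool = [aa for tail in tails for aa in tail]
    rng.shuffle(pool)
    out, start = [], 0
    for tail in tails:
        out.append("".join(pool[start:start + len(tail)]))
        start += len(tail)
    return out


def fold_enrichment(observed: int, random_counts: list[int], pseudocount: float = 1.0) -> float:
    """Observed count over mean random count; the pseudocount (an assumption) avoids division by zero."""
    return (observed + pseudocount) / (mean(random_counts) + pseudocount)


# Toy tails invented for illustration.
tails = ["RQAAL", "KQSSL", "GLQSF", "AAQWL"]
regex = re.compile("Q..L$")
rng = random.Random(0)  # fixed seed so the sketch is repeatable

observed = count_matches(tails, regex)
random_counts = [count_matches(shuffled_tails(tails, rng), regex) for _ in range(100)]
score = fold_enrichment(observed, random_counts)
```

A score above 1 would indicate that the pattern occurs at the C-terminus more often than in the shuffled sets; with only four toy tails the value itself is not meaningful.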

To evaluate the predicted molecular functions and validate our predictions for the C-terminome, we selected 30 newly predicted C-terminal sequence patterns to test experimentally for one of the three molecular functions, binding. These putative C-terminal minimotifs were selected based on their fold enrichment scores and represented 16% of the predicted human C-terminome. We developed an LC-MS/MS-based workflow to specifically identify weak-affinity, transient interactions with peptides or proteins containing the predicted C-terminal patterns. Through an LC-MS/MS screen, we i) identified 32 previously known interactors for seven C-terminal minimotifs, ii) immunoprecipitated 2,048 potential binding partners for 30 C-terminal minimotifs, and iii) predicted 49 interactors based on matched GO biological functions for 11 putative C-terminal minimotifs.

In particular, three putative C-terminal minimotifs, QxxL>, LxxxF> and LxxxI>, co-immunoprecipitated ~100 proteins, indicating that these C-terminal minimotifs may have a more generalizable binding function. Half of the putative C-terminal minimotifs co-immunoprecipitated fewer than ten interactors, suggesting that those putative C-terminal minimotifs have very specific functions. In addition, at least ten proteins involved in RNA splicing and the cell cycle were co-immunoprecipitated for the LxxxI> and QxxL> minimotif patterns, respectively, suggesting a previously unidentified potential mechanism of interaction through C-terminal minimotifs. Therefore, experimentally, we assigned molecular functions to an additional 1% of the human C-terminome and inferred functions for an additional 16% of the human C-terminome.

Our work is the first report of the human C-terminome. We began by consolidating previously known, experimentally verified C-terminal minimotifs for 17% of the human proteome. We then inferred functions for 1% of the C-terminome from experiments done on the rodent proteome, predicted molecular functions based on verified consensus sequences for another 1%, and identified over-represented sequence patterns for 100% of the human C-terminome. We further assigned molecular functions for an additional 1% and inferred functions for an additional 16% of the human C-terminome based on LC-MS/MS experiments. We developed an LC-MS/MS-based workflow to study interactions of C-terminal minimotifs with limited binding interfaces; this workflow can be scaled up to advance knowledge of the entire human C-terminome.

Keywords

Carboxyl Terminus; C-terminal Minimotifs; C-terminome; Mass-spectrometry; Proteomics; Short Linear Motifs

Disciplines

Bioinformatics | Cell Biology | Molecular Biology

Language

English

Available for download on Thursday, May 15, 2025

