Master of Science (MS)
First Committee Member
Ashok K. Singh
Number of Pages
Because a microarray gene expression database contains a large number of variables and a relatively small number of samples, analyzing such a database requires computationally intensive, high-dimensional methods. Principal component analysis (PCA) is a useful tool for reducing the number of dimensions and, therefore, the complexity. PCA allows us to analyze the gene expression database in a relatively small number of dimensions without losing relevant information, and it makes the structure of the data easier to examine. The initial PCA computation, however, involves calculating a high-dimensional covariance or correlation matrix and requires time and hardware resources that are limited in most real situations. In this thesis, we propose to use the Block Principal Component Analysis (Block PCA) method, introduced by Liu et al. (2002), to produce a subset of variables that explains a large amount of the variation, and we propose a criterion for finding the most appropriate subsets. Gene expression data are typically highly correlated, so the covariance matrix becomes highly ill-conditioned. The Mahalanobis distances produced by software packages such as SAS are not reliable in such cases. We investigate the effect of ill-conditioning on discriminant analysis of gene expression data from a DNA microarray. The bioinformatics literature recommends forming blocks of variables that are correlated with one another. We propose using Partial Least Squares (PLS) to form the blocks of correlated variables for use in Block PCA.
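The block idea described in the abstract can be illustrated with a minimal Python sketch. This is a simplified two-stage version written for illustration only and is not necessarily the procedure of Liu et al. (2002) or the one used in the thesis. The function name `block_pca`, the `var_threshold` parameter, and the contiguous blocks in the test data are all assumptions; in particular, the PLS-based block formation proposed in the thesis is not shown.

```python
import numpy as np

def block_pca(X, blocks, var_threshold=0.9):
    """Illustrative two-stage block PCA (hypothetical helper).

    X             : array of shape (n_samples, n_features)
    blocks        : list of index arrays, one per block of variables
    var_threshold : fraction of within-block variance to retain (assumed criterion)
    Returns the second-stage scores and their variances.
    """
    Xc = X - X.mean(axis=0)  # center every variable
    block_scores = []
    for idx in blocks:
        B = Xc[:, idx]
        # SVD of the centered block; squared singular values are
        # proportional to the variances of the block's components.
        U, s, _ = np.linalg.svd(B, full_matrices=False)
        var = s ** 2
        total = var.sum()
        if total == 0:
            continue  # skip blocks with no variation
        ratio = np.cumsum(var) / total
        # smallest number of components reaching the threshold
        k = min(int(np.searchsorted(ratio, var_threshold)) + 1, len(s))
        block_scores.append(U[:, :k] * s[:k])
    # Stage 2: ordinary PCA on the stacked block scores, which is a much
    # smaller matrix than the full set of original variables.
    Z = np.hstack(block_scores)
    Z = Z - Z.mean(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U * s, s ** 2 / (X.shape[0] - 1)
```

Each within-block decomposition works on a small matrix, so the full high-dimensional covariance matrix is never formed. How the blocks are chosen, and how many components to keep per block, are the design questions the thesis addresses.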
Analysis; Block; Component; Data; Dimensions; Expression; Gene; Principals; Reducing
University of Nevada, Las Vegas
Lee, Sang Hee, "On using block principal component analysis for reducing gene-expression data dimensions" (2005). UNLV Retrospective Theses & Dissertations. 1838.