Award Date
1-1-2005
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Mathematical Sciences
First Committee Member
Ashok K. Singh
Number of Pages
49
Abstract
Because a microarray gene expression database contains a large number of variables and a relatively small number of samples, analyzing it requires computationally intensive, high-dimensional methods. Principal component analysis (PCA) is a useful tool for reducing the number of dimensions and, therefore, the complexity. PCA allows us to analyze the gene expression database in a relatively small number of dimensions without losing relevant information, and it makes the structure of the data easier to see. The initial PCA computation, however, involves calculating a high-dimensional covariance or correlation matrix and requires time and hardware resources that are limited in most real situations. In this thesis, we propose to use the Block Principal Component Analysis (Block PCA) method, introduced by Liu et al. (2002), to produce a subset of variables that explains a large amount of the variation, and we propose a criterion for finding the most appropriate subsets. Gene expression data are typically highly correlated, so the covariance matrix becomes highly ill-conditioned. The Mahalanobis distances computed by software packages such as SAS are not reliable in such cases. We investigate the effect of ill-conditioning on discriminant analysis of gene expression data from a DNA microarray. The bioinformatics literature recommends forming blocks of variables that are correlated with one another. We propose using Partial Least Squares (PLS) to form the blocks of correlated variables for use in Block PCA.
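The block-wise idea in the abstract can be illustrated with a minimal sketch. It is not the procedure of Liu et al. (2002) or of the thesis itself, only a generic illustration under stated assumptions: the function name block_pca, the synthetic data, and the hand-picked blocks are all hypothetical, and the blocks are supplied directly rather than formed by PLS. Each block of correlated variables gets its own small covariance matrix and eigendecomposition, so the full covariance matrix never has to be decomposed.

```python
import numpy as np

def block_pca(X, blocks, n_keep=1):
    """Run PCA separately on each block of columns and stack the leading scores.

    X      : (n_samples, n_variables) data matrix
    blocks : list of column-index lists (assumed given; hypothetical here)
    n_keep : number of leading components kept per block
    """
    Xc = X - X.mean(axis=0)                      # center every variable
    scores, explained = [], []
    for cols in blocks:
        B = Xc[:, cols]
        cov = np.atleast_2d(np.cov(B, rowvar=False))   # small per-block covariance
        eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1][:n_keep]     # indices of the largest ones
        scores.append(B @ eigvecs[:, order])           # project block onto its components
        explained.append(eigvals[order].sum() / eigvals.sum())
    return np.hstack(scores), explained

# Synthetic stand-in for expression data: two blocks of 5 highly correlated
# variables, each driven by one latent factor (an assumption for illustration).
rng = np.random.default_rng(0)
n_samples = 20
f1 = rng.normal(size=(n_samples, 1))
f2 = rng.normal(size=(n_samples, 1))
X = np.hstack([f1 + 0.1 * rng.normal(size=(n_samples, 5)),
               f2 + 0.1 * rng.normal(size=(n_samples, 5))])
blocks = [list(range(0, 5)), list(range(5, 10))]

Z, explained = block_pca(X, blocks, n_keep=1)

# Condition number of the full covariance matrix; large values signal the
# ill-conditioning that makes Mahalanobis distances unreliable.
full_cond = np.linalg.cond(np.cov(X, rowvar=False))
print(Z.shape, explained, full_cond)
```

In this sketch the ten original variables are reduced to two scores, one per block, and the printed condition number gives a rough sense of how strongly correlated variables degrade the full covariance matrix.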
Keywords
Analysis; Block; Component; Data; Dimensions; Expression; Gene; Principals; Reducing
Controlled Subject
Statistics
File Format
File Size
1157.12 KB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Permissions
If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.
Repository Citation
Lee, Sang Hee, "On using block principal component analysis for reducing gene-expression data dimensions" (2005). UNLV Retrospective Theses & Dissertations. 1838.
http://dx.doi.org/10.25669/kmgt-jsb6
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/