Hi-LASSO: High-Dimensional LASSO
Document Type
Article
Publication Date
4-9-2019
Publication Title
IEEE Access
Volume
7
First Page Number
1
Last Page Number
12
Abstract
High-throughput genomic technologies are leading to a paradigm shift in computational biology research. Computational analysis of high-dimensional data, and its interpretation, are essential for understanding complex biological systems. Most biological data (e.g., gene expression and DNA sequence data) are high-dimensional but contain far fewer samples than predictors. Such high-dimension, low-sample-size (HDLSS) data often pose computational challenges in biological data analysis. A number of least absolute shrinkage and selection operator (LASSO) methods have been widely used to identify biomarkers or prognostic factors in bioinformatics. The LASSO solution has been improved through the development of LASSO derivatives, including elastic-net, adaptive LASSO, relaxed LASSO, VISA, random LASSO, and recursive LASSO. However, existing LASSO solutions have several known limitations: multicollinearity (particularly among predictors with coefficients of different signs), a limit on the size of the selected subset, and the lack of statistical tests of significance. We propose a high-dimensional LASSO (Hi-LASSO) that theoretically improves the LASSO model, providing better performance in both prediction and feature selection on extremely high-dimensional data. Hi-LASSO alleviates bias introduced by bootstrapping, refines importance scores, improves performance by taking advantage of the global oracle property, provides a statistical strategy to determine the number of bootstrap samples, and allows tests of significance for feature selection with an appropriate distribution. The performance of Hi-LASSO was assessed against existing state-of-the-art LASSO methods in extensive simulation experiments with multiple data settings. Hi-LASSO was also applied to survival analysis of glioblastoma multiforme (GBM) gene expression data.
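The abstract refers to bootstrapping and importance scores, ideas used by random LASSO. As a hedged illustration of that general idea only, the Python/NumPy sketch below fits a LASSO by coordinate descent on bootstrap samples, each restricted to a random subset of predictors, and scores each predictor by its mean absolute coefficient. It is not the Hi-LASSO algorithm: it omits Hi-LASSO's bias correction, global oracle step, rule for the number of bootstrap samples, and significance test. The function names `soft_threshold`, `lasso_cd` and `bootstrap_importance`, the penalty `lam`, the subset size `q`, and the averaging rule are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical sketch of bootstrap-LASSO importance scores (random-LASSO style).
# This is NOT the Hi-LASSO algorithm; all names and parameters are illustrative.


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-6):
    """Minimise (1/(2n))||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    No intercept is fitted, so y is assumed to be centred.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.asarray(y, dtype=float).copy()  # residual y - X b (b starts at 0)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # all-zero column: coefficient stays at zero
            # inner product of column j with the residual that excludes j
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new_bj)  # keep residual consistent with b
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b


def bootstrap_importance(X, y, lam, n_boot=100, q=None, seed=0):
    """Mean |LASSO coefficient| over the bootstrap draws in which each
    predictor was sampled (a simplified, assumed scoring rule)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q = min(n, p) if q is None else q
    total = np.zeros(p)
    counts = np.zeros(p)
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)              # resample rows with replacement
        cols = rng.choice(p, size=q, replace=False)    # random subset of q predictors
        Xb = X[np.ix_(rows, cols)]
        Xb = (Xb - Xb.mean(axis=0)) / (Xb.std(axis=0) + 1e-12)  # standardise
        yb = y[rows] - y[rows].mean()                  # centre the response
        total[cols] += np.abs(lasso_cd(Xb, yb, lam))
        counts[cols] += 1
    return np.divide(total, counts, out=np.zeros(p), where=counts > 0)


# Usage on simulated HDLSS data (more predictors than samples).
rng = np.random.default_rng(1)
n, p = 100, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -2.0, 2.0]  # only the first three predictors are informative
y = X @ beta + 0.5 * rng.standard_normal(n)
imp = bootstrap_importance(X, y, lam=0.5, n_boot=100, q=30)
print(np.argsort(imp)[::-1][:5])  # indices of the five highest scores
```

Averaging only over the draws in which a predictor was sampled is one design choice; dividing by the total number of draws instead would shrink each score in proportion to how rarely that predictor was sampled.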
Keywords
Hi-LASSO; LASSO; Random LASSO; High-Dimensional Data; Variable Selection
Disciplines
Computational Biology | Genetics and Genomics | Life Sciences
Language
English
Repository Citation
Kang, M., Ki, Y., Hao, J., Mallavarapu, T., Park, J. (2019). Hi-LASSO: High-Dimensional LASSO. IEEE Access, 7, 1-12.
http://dx.doi.org/10.1109/ACCESS.2019.2909071