Award Date

8-1-2023

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Mathematical Sciences

First Committee Member

Farhad Shokoohi

Second Committee Member

Hokwon Cho

Third Committee Member

Kaushik Ghosh

Fourth Committee Member

Sean McCoy

Number of Pages

92

Abstract

Predictive models are important tools in every scientific field. Machine learning (ML) algorithms and statistical models are widely used for decision-making because they can tackle intricate and unique problems. In domains where data are high-dimensional and contain irrelevant and redundant features, ML algorithms are widely reported to outperform traditional (statistical) learning methods. However, researchers and analysts face a myriad of techniques to choose from, with no clear consensus on which will perform best for a specific task. Given limited resources, exhaustively exploring every available method is impractical and often fails to yield significant improvements, which makes it an inadvisable approach.

In this study, we propose an efficient methodology for benchmarking feature-selection and machine-learning algorithms, with a practical evaluation in the context of credit scoring. A survey of the credit-scoring literature was conducted to identify prevalent and high-performing methods, and a subset was selected based on computational efficiency, interpretability, and predictive performance. This search yielded three feature-selection methods (chi-square, oblique principal component analysis, and a genetic algorithm) and four classifiers (penalized logistic regression, support vector machines, extreme gradient boosted decision trees, and random forest). We then designed a simulation study to evaluate the selected methods using relevant performance metrics. These results guided the choice of the most practical and effective methods, which were subsequently tested in a real-world credit-scoring setting. The simulation results indicate that penalized logistic regression and extreme gradient boosting with genetic-algorithm feature selection were the best-performing methods for prediction and dimension reduction. Furthermore, the study examined the impact of data characteristics on predictive performance. This research contributes to method selection and optimization in credit scoring and highlights avenues for further investigation in related research areas.
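The abstract does not say how the chi-square filter was applied. The following is a minimal Python sketch that assumes the common Pearson test of independence between each categorical (or binned) feature and a binary default label, followed by keeping the k highest-scoring features. The function names `chi_square_score` and `select_top_k`, the top-k rule, and the toy data are hypothetical illustrations, not the thesis's implementation.

```python
from collections import Counter
from typing import Sequence


def chi_square_score(feature: Sequence[str], label: Sequence[int]) -> float:
    """Pearson chi-square statistic of independence between one categorical
    feature and a binary class label (assumed: 1 = default, 0 = non-default)."""
    n = len(label)
    joint = Counter(zip(feature, label))  # observed count for each (level, class) cell
    row = Counter(feature)                # total count per feature level
    col = Counter(label)                  # total count per class
    stat = 0.0
    for level, r in row.items():
        for cls, c in col.items():
            expected = r * c / n          # cell count expected under independence
            observed = joint.get((level, cls), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def select_top_k(features: dict[str, Sequence[str]], label: Sequence[int], k: int) -> list[str]:
    """Hypothetical filter step: rank features by chi-square score, keep the top k."""
    scores = {name: chi_square_score(values, label) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 8 applicants, 1 = default.
label = [1, 1, 1, 0, 0, 0, 0, 0]
features = {
    "late_payments": ["yes", "yes", "yes", "no", "no", "no", "no", "yes"],
    "region": ["a", "b", "a", "b", "a", "b", "a", "b"],
}
print(select_top_k(features, label, k=1))  # ['late_payments']
```

Here `late_payments` scores 4.8 and `region` about 0.53, so the filter keeps the feature more strongly associated with default. A larger statistic only signals stronger dependence; in practice a p-value threshold or a tuned k would be chosen alongside the downstream classifier.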

Keywords

Benchmarking; Credit scoring; Machine learning

Disciplines

Statistics and Probability

File Format

pdf

File Size

2570 KB

Degree Grantor

University of Nevada, Las Vegas

Language

English

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/
