Award Date
8-1-2023
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Mathematical Sciences
First Committee Member
Farhad Shokoohi
Second Committee Member
Hokwon Cho
Third Committee Member
Kaushik Ghosh
Fourth Committee Member
Sean McCoy
Number of Pages
92
Abstract
Predictive models are important tools in all scientific fields. Machine learning (ML) algorithms and statistical models are widely used for decision-making because of their capability to tackle intricate and unique problems. In domains where data are high-dimensional and contain irrelevant and redundant features, ML algorithms are known to outperform traditional (statistical) learning methods. However, researchers and analysts often face a myriad of techniques to choose from, with no clear consensus on which will perform best for their specific task. Given limited resources, exhaustively exploring all available methods is impractical and often fails to yield significant improvements, making it an inadvisable approach.
In this study, we propose an efficient methodology for benchmarking feature selection and machine learning algorithms, with a practical evaluation in the context of credit scoring. A survey of the credit-scoring literature was conducted to identify prevalent and high-performing methods, and a subset of methods was selected based on computational efficiency, interpretability, and predictive performance. The search identified chi-square, oblique principal component analysis, and genetic algorithm methods for feature selection, and penalized logistic regression, support vector machines, extreme gradient boosted decision trees, and random forest for classification. We then designed a simulation study to evaluate the performance of the selected methods using relevant metrics. These results guided the selection of the most practical and effective methods, which were subsequently tested in a real-world credit-scoring environment. The simulation results indicate that penalized logistic regression and extreme gradient boosting with genetic algorithm feature selection were the best-performing methods for prediction and dimension reduction. Furthermore, the study examined the impact of data characteristics on prediction performance. This research contributes to method selection and optimization in credit scoring and highlights avenues for further investigation in related research areas.
Keywords
Benchmarking; Credit scoring; Machine learning
Disciplines
Statistics and Probability
File Format
File Size
2570 KB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Repository Citation
Verbeck, Gwen, "Benchmarking and Practical Evaluation of Machine and Statistical Learning Methods in Credit Scoring: A Method Selection Perspective" (2023). UNLV Theses, Dissertations, Professional Papers, and Capstones. 4855.
http://dx.doi.org/10.34917/36948206
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/