Electrical and Computer Engineering Faculty Presentations

A Bootstrap Approach for Improving Logistic Regression Performance in Imbalanced Data Sets.

Michael Chang, University of Nevada, Las Vegas
Rohan J. Dalpatadu, University of Nevada, Las Vegas
Dieudonne Phanord, University of Nevada, Las Vegas
Ashok K. Singh, University of Nevada, Las Vegas

Document Type

Article

Publication Date

11-15-2018

Publication Title

Matter: International Journal of Science and Technology

Abstract

In an imbalanced data set with a binary response, the percentages of successes and failures are not approximately equal. In many real-world situations, the majority of observations are “normal” (i.e., successes), with a much smaller fraction of failures. For extremely imbalanced data sets, the overall probability of correct classification can be very high while the probability of correctly predicting the minority class is very low. Consider a fictitious data set with 1,000,000 observations, of which 999,000 are successes and 1,000 are failures. A rule that classifies all observations as successes will have very high prediction accuracy (99.9%), but its probability of correctly predicting a failure will be 0. In many situations, the cost of incorrectly predicting a failure is high, so it is important to improve the prediction accuracy for failures as well. The literature suggests that over-sampling the minority class with replacement does not necessarily predict the minority class with higher accuracy. In this article, we propose a simple over-sampling method that bootstraps a subset of the minority class, and we illustrate the method with several examples. In each of these examples, an improvement in prediction accuracy is seen.
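The fictitious example in the abstract can be checked numerically. The sketch below uses only the Python standard library: it scores a rule that labels every observation a success, and then shows one possible reading of "bootstrapping a subset of the minority class". The function names, the `subset_fraction` parameter, and the choice to resample until the classes are equal in size are assumptions made for illustration; the abstract does not specify these details, so this should not be read as the authors' exact procedure.

```python
import random


def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy_and_recall(y_true, y_pred, positive):
    """Overall accuracy, and recall for the `positive` class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


def bootstrap_oversample(majority, minority, subset_fraction=0.5, seed=0):
    """Hypothetical helper: pad the minority class with bootstrap draws.

    Assumption (not from the abstract): a random subset of the minority rows
    is chosen, then sampled with replacement until both classes have the
    same number of rows.
    """
    rng = random.Random(seed)
    k = max(1, int(len(minority) * subset_fraction))
    subset = rng.sample(minority, k)               # subset, without replacement
    n_extra = max(0, len(majority) - len(minority))
    extra = rng.choices(subset, k=n_extra)         # bootstrap, with replacement
    return minority + extra


# The abstract's fictitious data set: 999,000 successes and 1,000 failures.
y_true = ["success"] * 999_000 + ["failure"] * 1_000
y_pred = ["success"] * len(y_true)                 # classify everything as success

accuracy, failure_recall = accuracy_and_recall(y_true, y_pred, positive="failure")
print(f"accuracy = {accuracy:.3f}, failure recall = {failure_recall:.3f}")
# prints: accuracy = 0.999, failure recall = 0.000
```

In a workflow of this kind, the over-sampled minority rows would be combined with the majority rows before a logistic regression is fitted, and recall on the failure class would be compared before and after resampling.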

Keywords

Binary Response; Prediction; SMOTE; Under-sampling; Over-sampling; Confusion Matrix; Accuracy; Precision; Recall; F1-measure

Disciplines

Applied Mathematics

Language

English

Repository Citation

Chang, M., Dalpatadu, R. J., Phanord, D., & Singh, A. K. (2018, November). A Bootstrap Approach for Improving Logistic Regression Performance in Imbalanced Data Sets. Matter: International Journal of Science and Technology.

Available at: https://digitalscholarship.unlv.edu/ece_presentations/40
