Developing an Empirical Model to Forecast United States Presidential Elections: A Machine Learning Approach

Document Type

Article

Publication Date

10-17-2020

Publication Title

Advances in Social Sciences Research Journal

Volume

7

Issue

10

First page number

186

Last page number

198

Abstract

In this paper, we develop and compare two models for forecasting the 2020 U.S. presidential election: multiple linear regression (MLR) and the machine learning method of Extreme Gradient Boosting (XGBoost). We predict each state’s Republican vote share using seven continuous predictors from 1976-2016, as well as dummy variables for each state. After computing 95% confidence intervals for each prediction, we determine the candidates’ electoral college probabilities. XGBoost appears to be a very strong predictor, accounting for 98.6% of the variance with a root mean square error (RMSE) of 3.34%, whereas the MLR accounts for only 71.8% of the variance and leaves an RMSE of 6.35%. We observe that 1) both models predict a Democratic electoral college landslide in the 2020 election, 2) Georgia, Iowa, Florida, North Carolina, and Ohio are crucial for a Republican win, and 3) Extreme Gradient Boosting is an attractive alternative to MLR in election forecasting.
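
The abstract does not say how the 95% confidence intervals are turned into electoral college probabilities. One possible reading is sketched below, and it should not be taken as the authors' procedure. The sketch assumes that each state's predicted Republican share is normally distributed, that the interval is symmetric, that a share above 50% wins the state, and that states are independent. The names win_probability and simulate_electoral_college are hypothetical, and any inputs passed to them are the caller's own numbers, not results from the paper.

```python
import random
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal (about 1.96).
Z95 = NormalDist().inv_cdf(0.975)


def win_probability(share, ci_low, ci_high, threshold=50.0):
    """Probability that the Republican share exceeds `threshold`.

    Assumption: the forecast is normal with mean `share`, and the
    95% interval (ci_low, ci_high) spans 2 * Z95 standard deviations.
    The interval must have positive width.
    """
    sigma = (ci_high - ci_low) / (2 * Z95)
    return 1.0 - NormalDist(mu=share, sigma=sigma).cdf(threshold)


def simulate_electoral_college(states, votes_to_win=270, n_sims=10_000, seed=0):
    """Monte Carlo estimate of the Republican electoral college win probability.

    `states` is a list of (electoral_votes, share, ci_low, ci_high) tuples.
    Assumption: states are decided independently of one another.
    """
    rng = random.Random(seed)
    probs = [(ev, win_probability(s, lo, hi)) for ev, s, lo, hi in states]
    wins = 0
    for _ in range(n_sims):
        # Draw a winner in every state and add up the Republican electors.
        total = sum(ev for ev, p in probs if rng.random() < p)
        if total >= votes_to_win:
            wins += 1
    return wins / n_sims
```

Treating states as independent is a strong simplification: correlated polling or model errors across states would widen the spread of simulated outcomes, so the estimate from this sketch is likely to look more certain than a model with correlated errors would.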

Keywords

Presidential election; Electoral college; Forecast; XGBoost; Multiple linear regression

Disciplines

Other Political Science | Political Science | Social and Behavioral Sciences

Language

English
