Why it is hard to forecast a winner in horse race
Session Title
Session 1-3-C: Lightning Talks
Presentation Type
Lightning Talk
Location
Park MGM, Las Vegas, NV
Start Date
23-5-2023 1:45 PM
End Date
23-5-2023 3:15 PM
Disciplines
Applied Statistics
Abstract
Although the problem of forecasting the winner of a horse race has been studied for a long time, a solution remains elusive. To the best of our knowledge, none of the published algorithms can be used profitably with the industry-standard information set.
This is a consequence of not considering the major features that distinguish horse racing from other classification and forecasting problems:
(a) forecasting the winner of a horse race means forecasting a rare event,
(b) the regressors in racing data sets are multicollinear,
(c) the industry-standard information set is an approximate description of a race, based on a small number of snapshots. It describes the race on a coarse grid with large measurement errors.
Therefore, an algorithm that predicts winners in horse races should be constructed by solving not a general binary classification problem but a binary classification problem for rare events, with multicollinear regressors that contain large measurement errors.
Keywords
Mathematics of Gambling, forecasting rare events, multicollinearity, measurement errors
Funding Sources
None