Why it is hard to forecast a winner in horse race

Session Title

Session 1-3-C: Lightning Talks

Presentation Type

Lightning Talk

Location

Park MGM, Las Vegas, NV

Start Date

23-5-2023 1:45 PM

End Date

23-5-2023 3:15 PM

Disciplines

Applied Statistics

Abstract

Despite long study, the problem of forecasting the winner of a horse race remains unsolved. To the best of our knowledge, none of the published algorithms can be used profitably with the industry-standard information set.

This is a consequence of not considering the major features that distinguish horse racing from other classification and forecasting problems:

(a) forecasting a horse race winner means forecasting a rare event,

(b) the regressors in racing data sets are multicollinear,

(c) the industry-standard information set is only an approximate description of a race, built from a small number of snapshots. It describes the race on a coarse grid with large measurement errors.

Therefore, an algorithm that predicts winners in horse races should be constructed by solving not a general binary classification problem but a binary classification problem for rare events, with multicollinear regressors that contain large measurement errors.
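The three features listed above can each be given a rough numerical feel with textbook formulas. The sketch below is only an illustration under stated assumptions, not the method of the talk: the function names and the sample data are hypothetical, the variance inflation factor is the standard two-regressor form 1 / (1 - r^2), and the attenuation factor is the classical errors-in-variables reliability ratio for a single noisy regressor.

```python
import math


def trivial_accuracy(field_size: int) -> float:
    """Accuracy of always predicting 'does not win' when exactly one
    of `field_size` runners wins (assumes no dead heats)."""
    return (field_size - 1) / field_size


def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_regressors(x1: list[float], x2: list[float]) -> float:
    """Variance inflation factor when a model has exactly two regressors:
    1 / (1 - r^2). Returns infinity for perfectly collinear inputs."""
    r2 = pearson_r(x1, x2) ** 2
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)


def attenuation_factor(signal_var: float, error_var: float) -> float:
    """Factor by which classical measurement error shrinks a regression
    slope toward zero: var(true) / (var(true) + var(error))."""
    return signal_var / (signal_var + error_var)


# Hypothetical, made-up values for two nearly collinear regressors.
EXAMPLE_X1 = [1.0, 2.0, 3.0, 4.0, 5.0]
EXAMPLE_X2 = [1.1, 1.9, 3.2, 3.9, 5.1]
```

In a ten-runner field, `trivial_accuracy(10)` shows that a model which never names a winner is still right nine times out of ten, so plain accuracy is a poor yardstick for a rare-event problem. `vif_two_regressors(EXAMPLE_X1, EXAMPLE_X2)` is large for the nearly collinear sample data, and `attenuation_factor(1.0, 1.0)` shows a slope halved when the error variance equals the signal variance.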

Keywords

Mathematics of Gambling, forecasting rare events, multicollinearity, measurement errors

Author Bios

Vladimir Kazakov, PhD, University of Technology Sydney

Funding Sources

None
