# Bayesian Nonparametric Regression Models for Insurance Claims Frequency and Severity

12-1-2022

Dissertation

## Degree Name

Doctor of Philosophy (PhD)

## Department

Mathematical Sciences

Kaushik Ghosh

Amei Amei

Malwane Ananda

Lung-Chang Chien

## Abstract

The prediction of future insurance claims frequency and severity is one of the most important problems in actuarial science. Such predictions help the actuary set insurance premiums based on observed risk factors, or covariates. The accuracy of these predictions matters to both the insurance company and the insured customer. Typically, actuaries use parametric regression models to predict claims from the covariate information. Such models assume the same functional form tying the response to the covariates for every data point. They are not flexible enough and fail to capture accurately, at the individual level, the relationship between the covariates and the claims frequency and severity, whose distributions are often multimodal, highly skewed, and heavy-tailed.

In this dissertation, we explore the use of Bayesian nonparametric (BNP) regression models, such as the Dirichlet process mixture model (DPMM) and the Pitman-Yor process mixture model (PYMM), to model and predict insurance claims frequency and severity based on covariates. In particular, we model claims frequency as a mixture of Poisson regressions and log(claims severity) as a mixture of normal regressions, and use the Dirichlet process (DP) and the Pitman-Yor process (PY) as priors for the mixing distribution over the regression parameters. Unlike parametric regression, such models allow each data point to have its own parameters, which makes them highly flexible and improves prediction accuracy. We calculate the posterior predictive distribution for claims frequency and severity using the Polya urn predictive rule of the Dirichlet process and the Pitman-Yor process. Markov chain Monte Carlo (MCMC) methods, such as Algorithm 8 of Neal (2000), are used to sample from the posterior distributions in the DPMM and PYMM. One important by-product of these models is the clustering information, which can be used to ascertain the number of mixture components.

We use simulation studies to demonstrate the accuracy of the proposed models. In addition, we use the French motor insurance claims data to demonstrate their accuracy and applicability to real data.
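
The urn-type predictive rule mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the dissertation: the function names, the concentration parameter `alpha`, and the discount parameter `d` are assumptions chosen for illustration. Under the standard Pitman-Yor rule, a new observation joins an existing cluster of size n_k with probability (n_k - d) / (n + alpha) and opens a new cluster with probability (alpha + d K) / (n + alpha), where K is the current number of clusters; setting d = 0 gives the Polya urn rule of the Dirichlet process. The sketch covers only this prior clustering rule, not the regression likelihoods or the MCMC sampler.

```python
import random


def pitman_yor_urn_probs(cluster_sizes, alpha, d=0.0):
    """Predictive probabilities for the next observation's cluster.

    Returns one probability per existing cluster, followed by the
    probability of opening a new cluster. Names and defaults are
    illustrative assumptions; d = 0 gives the Dirichlet process case.
    """
    if not (0.0 <= d < 1.0) or alpha <= 0.0:
        raise ValueError("this sketch requires 0 <= d < 1 and alpha > 0")
    n = sum(cluster_sizes)
    k = len(cluster_sizes)
    denom = n + alpha
    probs = [(n_k - d) / denom for n_k in cluster_sizes]
    probs.append((alpha + d * k) / denom)
    return probs


def simulate_partition(n_obs, alpha, d=0.0, seed=0):
    """Sequentially assign n_obs observations to clusters via the urn rule."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_obs):
        probs = pitman_yor_urn_probs(sizes, alpha, d)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        if choice == len(sizes):
            sizes.append(1)      # open a new cluster
        else:
            sizes[choice] += 1   # join an existing cluster
    return sizes


if __name__ == "__main__":
    # Cluster sizes from one simulated partition of 200 observations.
    print(simulate_partition(200, alpha=1.0, d=0.0))
    print(simulate_partition(200, alpha=1.0, d=0.5))
```

In a mixture model of the kind described above, each simulated cluster would carry its own regression parameters, and the number of non-empty clusters plays the role of the number of mixture components.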

## Keywords

Bayesian nonparametric regression; Dirichlet process mixture model; Insurance claims modeling; Pitman-Yor process mixture model

## Disciplines

Statistics and Probability

pdf

10000 KB

English