Hierarchical Loss Reserving with Stan
I continue with the growth curve model for loss reserving from last week's post. Today, following the ideas of James Guszcza [2], I will add a hierarchical component to the model by treating the ultimate loss cost of an accident year as a random effect. Initially, I will use the nlme R package, just as James did in his paper, and then move on to Stan/RStan [6], which will allow me to estimate the full distribution of future claims payments.

Last week's model assumed that cumulative claims payments could be described by a growth curve. I used the Weibull curve and will do so here again, but others should be considered as well, e.g. the log-logistic cumulative distribution function for long-tail business; see [1].
The growth curve describes the proportion of claims paid up to a given development period relative to the ultimate claims cost at the end of time, and is hence often called the development pattern. Cumulative distribution functions are natural candidates, as they increase monotonically from 0 to 100%. Multiplying the development pattern by the expected ultimate loss cost then gives me the expected cumulative paid-to-date value.
However, what I'd like to do is the opposite: I know the cumulative claims position to date and wish to estimate the ultimate claims cost instead. If the claims process is fairly stable over the years, and, say, once a claim has been notified the payment process is quite similar from year to year and claim to claim, then a growth curve model is not unreasonable. Yet the number and size of the yearly claims will be random, e.g. depending on whether a windstorm, fire, etc. occurs. Hence, a random effect for the ultimate loss cost across accident years sounds very convincing to me.
Here is James' model as described in [2]:
\[
\begin{align}
CL_{AY, dev} & \sim \mathcal{N}(\mu_{AY, dev}, \sigma^2_{dev}) \\
\mu_{AY,dev} & = Ult_{AY} \cdot G(dev|\omega, \theta)\\
\sigma_{dev} & = \sigma \sqrt{\mu_{dev}}\\
Ult_{AY} & \sim \mathcal{N}(\mu_{ult}, \sigma^2_{ult})\\
G(dev|\omega, \theta) & = 1 - \exp\left(-\left(\frac{dev}{\theta}\right)^\omega\right)
\end{align}
\]
The cumulative losses \(CL_{AY, dev}\) for a given accident year \(AY\) and development period \(dev\) follow a Normal distribution with parameters \(\mu_{AY, dev}\) and \(\sigma_{dev}\).
The mean itself is modelled as the product of an accident year specific ultimate loss cost \(Ult_{AY}\) and a development period specific parametric growth curve \(G(dev | \omega, \theta)\). The variance is believed to increase in proportion with the mean. Finally, the ultimate loss cost is modelled with a Normal distribution as well.
Assuming a Gaussian distribution of losses doesn't sound quite intuitive to me, as losses are often skewed to the right, but I shall continue with this assumption here to make a comparison with [2] possible.
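To get a feel for the mean structure of the model above, here is a small R sketch of the Weibull growth curve and the implied expected cumulative losses for one accident year; the parameter values are made up for illustration only:

```r
# Weibull growth curve G(dev | omega, theta) = 1 - exp(-(dev/theta)^omega)
G <- function(dev, omega, theta) 1 - exp(-(dev / theta)^omega)

dev <- seq(6, 114, by = 12)   # development months 6, 18, ..., 114
ult <- 5000                   # hypothetical ultimate loss cost Ult_AY
mu  <- ult * G(dev, omega = 1.3, theta = 45)  # expected cumulative paid

round(mu)
```

The curve starts near 0 and approaches 100% of the ultimate, so the expected cumulative paid increases monotonically towards ult without ever exceeding it.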
Using the example data set given in the paper I can reproduce the results in R with nlme:

The fit looks pretty good, with only 5 parameters. See James' paper for a more detailed discussion.
Let's move this model into Stan. Here is my attempt, which builds on last week's pooled model. With the generated quantities code block I go beyond the scope of the original paper, as I try to estimate the full posterior predictive distribution as well.
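The hierarchical part of the Stan model can be mimicked in plain R, with pweibull as the counterpart of Stan's weibull_cdf; the origin indices, development months and ultimates below are made-up illustration values:

```r
origin <- c(1, 1, 2, 2, 3)      # accident-year index for each observation
dev    <- c(6, 18, 6, 18, 6)    # development months
ult    <- c(4800, 5200, 5600)   # one ultimate loss cost per accident year

# ult[origin] picks each observation's accident-year ultimate, just like
# ult[origin[i]] does inside Stan's loop; pweibull() is the Weibull CDF
mu <- ult[origin] * pweibull(dev, shape = 1.3, scale = 45)
```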
The 'trick' is the line

mu[i] <- ult[origin[i]] * weibull_cdf(dev[i], omega, theta);

where I have an accident year (here labelled origin) specific ultimate loss. The notation ult[origin[i]] illustrates the hierarchical nature in Stan's language nicely.

Let's run the model:
The estimated parameters look very similar to the nlme output above.

Let's take a look at the parameter traceplots and the densities of the estimated ultimate loss costs by origin year.
This all looks not too bad. The trace plots don't show any particular patterns, apart from \(\sigma_{ult}\), which shows a little skewness.
The generated quantities code block in Stan also allows me to obtain the predictive distribution beyond the current data range. Here I forecast claims up to development year 12 and plot the predictions, including the 95% credibility interval of the posterior predictive distribution, together with the observations.

The model seems to work rather well, even with the Gaussian distribution assumptions. Yet it still has only 5 parameters. Note that this model doesn't need an additional artificial tail factor either.
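For readers who want to reproduce the interval plot, this is roughly how posterior predictive draws turn into a 95% credibility interval; the draws below are simulated stand-ins for the output of rstan::extract():

```r
set.seed(1)
# fake posterior predictive draws: 4000 samples for each of 12 development years
means <- (1:12) * 400
draws <- matrix(rnorm(4000 * 12, mean = rep(means, each = 4000), sd = 100),
                nrow = 4000, ncol = 12)

# 2.5%, 50% and 97.5% quantiles per development year
ci <- apply(draws, 2, quantile, probs = c(0.025, 0.5, 0.975))
```

Each column of ci then provides the lower bound, median and upper bound to plot against the observed cumulative claims.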
Conclusions
The Bayesian approach sounds a lot more natural to me than many of the classical techniques around the chain-ladder method. Thanks to Stan, I can get the full posterior distributions of both the parameters and the predictive distribution. I find communicating credibility intervals much easier than talking about the parameter, process and mean squared errors.

James Guszcza contributed to a follow-up paper with Y. Zhang and V. Dukic [3] that extends the model described in [2]. It deals with skewness in loss data sets and the autoregressive nature of the errors in a cumulative time series.
Frank Schmid offers a more complex Bayesian analysis of claims reserving in [4], while Jake Morris highlights the similarities between a compartmental model used in drug research and loss reserving [5].
Finally, Glenn Meyers published a monograph on Stochastic Loss Reserving Using Bayesian MCMC Models earlier this year [7] that is worth taking a look at.
References
[1] David R. Clark. LDF Curve-Fitting and Stochastic Reserving: A Maximum Likelihood Approach. Casualty Actuarial Society, 2003. CAS Fall Forum.
[2] James Guszcza. Hierarchical Growth Curve Models for Loss Reserving, 2008, CAS Fall Forum, pp. 146–173.
[3] Y. Zhang, V. Dukic, and James Guszcza. A Bayesian non-linear model for forecasting insurance loss payments. 2012. Journal of the Royal Statistical Society: Series A (Statistics in Society), 175: 637–656. doi: 10.1111/j.1467-985X.2011.01002.x
[4] Frank A. Schmid. Robust Loss Development Using MCMC. Available at SSRN. See also http://lossdev.r-forge.r-project.org/
[5] Jake Morris. Compartmental reserving in R. 2015. R in Insurance Conference.
[6] Stan Development Team. Stan: A C++ Library for Probability and Sampling, Version 2.8.0. 2015. http://mc-stan.org/.
[7] Glenn Meyers. Stochastic Loss Reserving Using Bayesian MCMC Models. Issue 1 of CAS Monograph Series. 2015.
Session Info
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.1 (El Capitan)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ChainLadder_0.2.3 rstan_2.8.0 ggplot2_1.0.1 Rcpp_0.12.1
[5] lattice_0.20-33
loaded via a namespace (and not attached):
[1] nloptr_1.0.4 plyr_1.8.3 tools_3.2.2
[4] digest_0.6.8 lme4_1.1-10 statmod_1.4.21
[7] gtable_0.1.2 nlme_3.1-122 mgcv_1.8-8
[10] Matrix_1.2-2 parallel_3.2.2 biglm_0.9-1
[13] SparseM_1.7 proto_0.3-10 coda_0.18-1
[16] gridExtra_2.0.0 stringr_1.0.0 MatrixModels_0.4-1
[19] lmtest_0.9-34 stats4_3.2.2 grid_3.2.2
[22] nnet_7.3-11 tweedie_2.2.1 inline_0.3.14
[25] cplm_0.7-4 minqa_1.2.4 actuar_1.1-10
[28] reshape2_1.4.1 car_2.1-0 magrittr_1.5
[31] scales_0.3.0 codetools_0.2-14 MASS_7.3-44
[34] splines_3.2.2 rsconnect_0.3.79 systemfit_1.1-18
[37] pbkrtest_0.4-2 colorspace_1.2-6 quantreg_5.19
[40] labeling_0.3 sandwich_2.3-4 stringi_1.0-1
[43] munsell_0.4.2 zoo_1.7-12
10 Nov 2015, 06:36
Labels: Insurance, R, reserving, Stan, stochastic reserving
Loss Developments via Growth Curves and Stan
Last week I posted a biological example of fitting a non-linear growth curve with Stan/RStan. Today, I want to apply a similar approach to insurance data using ideas by David Clark [1] and James Guszcza [2].
Instead of predicting the growth of dugongs (sea cows), I would like to predict the growth of cumulative insurance loss payments over time, originating from different origin years. Loss payments of younger accident years are just like a new generation of dugongs: small in size initially, growing as they get older, until the losses are fully settled.
Here is my example data set:
Following the articles cited above I will assume that the growth can be explained by a Weibull curve, with two parameters \(\theta\) (scale) and \(\omega\) (shape):
\[
G(dev|\omega, \theta) = 1 - \exp\left(-\left(\frac{dev}{\theta}\right)^\omega\right)
\]
Inspired by the classical over-dispersed Poisson GLM in actuarial science, Guszcza [2] assumes a power variance structure for the process error as well. For the purpose of this post I will assume that claims costs follow a Normal distribution, to make a comparison with the classical least squares regression easier. With a prior estimate of the ultimate claims cost the cumulative loss can be modelled as:
\[
\begin{align}
CL_{AY, dev} & \sim \mathcal{N}(\mu_{dev}, \sigma^2_{dev}) \\
\mu_{dev} & = Ult \cdot G(dev|\omega, \theta)\\
\sigma_{dev} & = \sigma \sqrt{\mu_{dev}}
\end{align}
\]
Perhaps the above formula already suggests a hierarchical modelling approach, with different ultimate loss costs by accident year. Indeed, this is the focus of [2] and I will endeavour to reproduce the results with Stan in my next post, but today I will stick to a pooled model that assumes a constant ultimate loss across all accident years, i.e. \(Ult_{AY} = Ult \;\forall\; AY\).
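The pooled model can be sanity-checked with a quick simulation; none of this is the post's actual data, and base R's nls() stands in here for the gnls() fit used below:

```r
set.seed(42)
G   <- function(dev, omega, theta) 1 - exp(-(dev / theta)^omega)
dev <- rep(seq(6, 114, by = 12), times = 10)   # 10 accident years, months 6..114
mu  <- 5000 * G(dev, omega = 1.3, theta = 45)  # pooled ultimate of 5000
cum <- mu + rnorm(length(mu), sd = 50)         # crude process noise

# recover Ult, omega and theta from the simulated data
fit <- nls(cum ~ ult * (1 - exp(-(dev / theta)^omega)),
           start = list(ult = 4000, omega = 1, theta = 40))
round(coef(fit))
```

With clean simulated data the non-linear fit recovers the true parameters closely, which gives some comfort that the model is identifiable from a 10-by-10 triangle.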
To prepare my analysis I read in the data as a long data frame instead of the matrix structure above. Additionally, I compose another data frame that I will use later to predict payments two years beyond the first 10 years. Furthermore, to align the output with [2] I relabelled the development periods from years to months, so that year 1 becomes month 6, year 2 becomes month 18, etc. The accident years run from 1991 to 2000, while the variable origin maps those years to 1 to 10.

To get a reference point for Stan I start with a non-linear least squares regression:
The output above doesn't look unreasonable, apart from the accident years 1991, 1992 and 1998. The output of gnls gives me an opportunity to provide my prior distributions with good starting points. I will use an inverse Gamma as a prior for \(\sigma\), and constrained Normals for the parameters \(\theta, \omega\) and \(Ult\) as well. If you have better ideas, then please get in touch.

The Stan code below is very similar to last week's. Again, I am interested here in the posterior distributions, hence I add a block to generate quantities from those. Note that Stan comes with a built-in function for the cumulative Weibull distribution, weibull_cdf.

I store the Stan code in a separate text file, which I read into R to compile the model. The compilation takes a little time. The sampling itself is done in a few seconds.
Let's take a look at the output:
The parameters are a little different to the output of gnls, but well within the standard error. From the plots I notice that the samples for the ultimate loss as well as for \(\theta\) are a little skewed to the right. Well, assuming cumulative losses follow a Normal distribution was a bit of a stretch to start with. Still, the samples seem to converge.

Finally, I can plot the 90% credible intervals of the posterior distributions.
The 90% prediction credible interval captures most of the data, and although this model might not be suitable for reserving individual accident years, it could provide an initial starting point for further investigations. Additionally, thanks to the Bayesian model I have access to the full distribution, not just point estimates and standard errors.
My next post will continue with this data set and the ideas of James Guszcza by allowing the ultimate loss cost to vary by accident year, treating it as a random effect. Here is a teaser of what the output will look like:
References
[1] David R. Clark. LDF Curve-Fitting and Stochastic Reserving: A Maximum Likelihood Approach. Casualty Actuarial Society, 2003. CAS Fall Forum.
[2] James Guszcza. Hierarchical Growth Curve Models for Loss Reserving, 2008, CAS Fall Forum, pp. 146–173.
Session Info
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.1 (El Capitan)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] rstan_2.8.0 ggplot2_1.0.1 Rcpp_0.12.1
[4] nlme_3.1-122 lattice_0.20-33 ChainLadder_0.2.3
loaded via a namespace (and not attached):
[1] nloptr_1.0.4 plyr_1.8.3 tools_3.2.2
[4] digest_0.6.8 lme4_1.1-10 statmod_1.4.21
[7] gtable_0.1.2 mgcv_1.8-8 Matrix_1.2-2
[10] parallel_3.2.2 biglm_0.9-1 SparseM_1.7
[13] proto_0.3-10 gridExtra_2.0.0 coda_0.18-1
[16] stringr_1.0.0 MatrixModels_0.4-1 stats4_3.2.2
[19] lmtest_0.9-34 grid_3.2.2 nnet_7.3-11
[22] inline_0.3.14 tweedie_2.2.1 cplm_0.7-4
[25] minqa_1.2.4 reshape2_1.4.1 car_2.1-0
[28] actuar_1.1-10 magrittr_1.5 codetools_0.2-14
[31] scales_0.3.0 MASS_7.3-44 splines_3.2.2
[34] systemfit_1.1-18 rsconnect_0.3.79 pbkrtest_0.4-2
[37] colorspace_1.2-6 labeling_0.3 quantreg_5.19
[40] sandwich_2.3-4 stringi_1.0-1 munsell_0.4.2
[43] zoo_1.7-12
ChainLadder 0.2.2 is out with improved glmReserve function
We released version 0.2.2 of ChainLadder a few weeks ago. This version adds back the functionality to estimate the index parameter for the compound Poisson model in glmReserve, using the cplm package by Wayne Zhang.

Ok, what does this all mean? I will run through a couple of examples and look behind the scenes of glmReserve. However, the clue is in the title: glmReserve is a function that uses a generalised linear model to estimate future claims, assuming claims follow a Tweedie distribution. I should actually talk about a family of distributions that is known as Tweedie, named by Bent Jørgensen after Maurice Tweedie. Joe Rickert published a nice post about the Tweedie distribution last year.

Like most other functions in ChainLadder, the purpose of glmReserve is to predict future insurance claims based on historical data.

The data at hand is often presented in the form of a claims triangle, such as the following example data set from the ChainLadder package:
library(ChainLadder)
cum2incr(UKMotor)
## dev
## origin 1 2 3 4 5 6 7
## 2007 3511 3215 2266 1712 1059 587 340
## 2008 4001 3702 2278 1180 956 629 NA
## 2009 4355 3932 1946 1522 1238 NA NA
## 2010 4295 3455 2023 1320 NA NA NA
## 2011 4150 3747 2320 NA NA NA NA
## 2012 5102 4548 NA NA NA NA NA
## 2013 6283 NA NA NA NA NA NA

The rows present different origin periods in which accidents occurred, and the columns along each row show the incremental reported claims over the years (the data itself is stored in cumulative form). Suppose all claims will be reported within 7 years; then I'd like to know how much money should be set aside for the origin years 2008 to 2013 for claims that have been incurred but not yet reported (IBNR). Or, to put it differently, I have to predict the NA fields in the bottom right-hand triangle.

First, let's reformat the data as it would be stored in a database, that is in a long format of incremental claims over the years (I also add the years as factors, which I will use later):
claims <- as.data.frame(cum2incr(UKMotor)) # convert into long format
library(data.table)
claims <- data.table(claims)
claims <- claims[ , ':='(cal=origin+dev-1, # calendar period
originf=factor(origin),
devf=factor(dev))]
claims <- claims[order(dev), cum.value:=cumsum(value), by=origin]
setnames(claims, "value", "inc.value")
head(claims)
## origin dev inc.value cal originf devf cum.value
## 1: 2007 1 3511 2007 2007 1 3511
## 2: 2008 1 4001 2008 2008 1 4001
## 3: 2009 1 4355 2009 2009 1 4355
## 4: 2010 1 4295 2010 2010 1 4295
## 5: 2011 1 4150 2011 2011 1 4150
## 6: 2012 1 5102 2012 2012 1 5102
Let's visualise the data:

library(lattice)
xyplot(cum.value/1000 + log(inc.value) ~ dev , groups=origin, data=claims,
t="b", par.settings = simpleTheme(pch = 16),
auto.key = list(space="right",
title="Origin\nperiod", cex.title=1,
points=FALSE, lines=TRUE, type="b"),
xlab="Development period", ylab="Amount",
main="Claims development by origin period",
       scales="free")

The left plot of the chart above shows the cumulative claims payments over time, while the right plot shows the log-transformed incremental claims development for each origin/accident year.

One of the oldest methods to predict future claims development is called chain-ladder, which can be regarded as a weighted linear regression through the origin of cumulative claims over the development periods. Multiplying those development factors with the latest available cumulative position allows me to predict future claims in an iterative fashion.
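The chain-ladder mechanics can be sketched on a toy cumulative triangle; the numbers are invented and this is only a bare-bones illustration, not the MackChainLadder implementation:

```r
tri <- matrix(c(100, 150, 175,
                110, 165,  NA,
                120,  NA,  NA), nrow = 3, byrow = TRUE)

# volume-weighted development factors: a weighted regression through the
# origin reduces to sum(next column) / sum(current column)
f <- sapply(1:2, function(k) {
  idx <- !is.na(tri[, k + 1])
  sum(tri[idx, k + 1]) / sum(tri[idx, k])
})

# apply the factors iteratively to fill the future (NA) cells
for (k in 1:2)
  for (i in 1:3)
    if (is.na(tri[i, k + 1])) tri[i, k + 1] <- tri[i, k] * f[k]
```

Here f[1] is 1.5 and f[2] is 175/150, so the bottom-left cell 120 develops to 180 and then to 210.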
It is well known in actuarial science that a Poisson GLM produces the same forecasts as the chain-ladder model.
Let's check:
# Poisson model
mdl.pois <- glm(inc.value ~ originf + devf, data=na.omit(claims),
family=poisson(link = "log"))
# predict claims
claims <- claims[, ':='(
pred.inc.value=predict(mdl.pois,
.SD[, list(originf, devf, inc.value)],
type="response")), by=list(originf, devf)]
# sum of future payments
claims[cal>max(origin)][, sum(pred.inc.value)]
## [1] 28655.77
# Chain-ladder forecast
summary(MackChainLadder(UKMotor, est.sigma = "Mack"))$Totals[4,]
## [1] 28655.77

Ok, this worked. However, both of these models actually make fairly strong assumptions. The Poisson model by its very nature will only produce whole numbers, and although payments could be regarded as whole numbers, say in pence or cents, it does feel a little odd to me. Similarly, modelling the year-on-year developments via a weighted linear regression through the origin, as in the case of the chain-ladder model, doesn't sound intuitive either.

There is another aspect to highlight with the Poisson model: its variance is equal to the mean. Yet in real data I often observe that the variance increases in proportion to the mean. Well, this can be remedied by using an over-dispersed quasi-Poisson model.
I think a more natural approach would be to assume a compound distribution that models the frequency and severity of claims separately, e.g. Poisson frequency and Gamma severity.
Here the Tweedie family of distributions comes into play.
Tweedie distributions are a subset of what are called Exponential Dispersion Models. EDMs are two parameter distributions from the linear exponential family that also have a dispersion parameter \(\Phi\).
Furthermore, the variance is a power function of the mean, i.e. \(\mbox{Var}(X)=\Phi \, E[X]^p\).
The canonical link function for a Tweedie distribution in a GLM is the power link \(\mu^q\) with \(q=1-p\). Note that \(q=0\) is interpreted as \(\log(\mu)\).
Thus, let \(\mu_i = E(y_i)\) be the expectation of the ith response. Then I have the following model.
\[
y \sim \mbox{Tweedie}(q, p)\\
E(y) = \mu^q = Xb \\
\mbox{Var}(y) = \Phi \mu^p
\]
The variance power \(p\) characterises the distribution of the responses \(y\). The following are some special cases:
- Normal distribution, p = 0
- Poisson distribution, p = 1
- Compound Poisson-Gamma distribution, 1 < p < 2
- Gamma distribution, p = 2
- Inverse-Gaussian, p = 3
- Stable, with support on the positive real numbers, p > 2
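A one-liner makes the mean-variance relationship \(\mbox{Var}(X)=\Phi \, E[X]^p\) concrete for the special cases above; the numbers are purely illustrative:

```r
# Var(X) = Phi * mu^p for Tweedie-family distributions
tweedie_var <- function(mu, phi, p) phi * mu^p

tweedie_var(mu = 10, phi = 1, p = 0)  # Normal: variance constant at phi
tweedie_var(mu = 10, phi = 1, p = 1)  # Poisson: variance equals the mean
tweedie_var(mu = 10, phi = 1, p = 2)  # Gamma: variance proportional to mu^2
```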
Finally, I get back to the glmReserve function, which Wayne Zhang, the other author of the cplm package, contributed to the ChainLadder package.

With glmReserve I can model a claims triangle using the Tweedie distribution family.

In my first example I use the parameters \(p=1, q=0\), which should return the results of the Poisson model.
(m1 <- glmReserve(UKMotor, var.power = 1, link.power = 0))
## Latest Dev.To.Date Ultimate IBNR S.E CV
## 2008 12746 0.9732000 13097 351 125.8106 0.35843464
## 2009 12993 0.9260210 14031 1038 205.0826 0.19757473
## 2010 11093 0.8443446 13138 2045 278.8519 0.13635790
## 2011 10217 0.7360951 13880 3663 386.7919 0.10559429
## 2012 9650 0.5739948 16812 7162 605.2741 0.08451188
## 2013 6283 0.3038201 20680 14397 1158.1250 0.08044210
## total 62982 0.6872913 91638 28656 1708.1963 0.05961042

Perfect, I get the same results, plus further information about the model.

Setting the argument var.power=NULL will estimate \(p\) in the interval \((1, 2)\) using the cplm package.

(m2 <- glmReserve(UKMotor, var.power=NULL))
## Latest Dev.To.Date Ultimate IBNR S.E CV
## 2008 12746 0.9732000 13097 351 110.0539 0.31354394
## 2009 12993 0.9260870 14030 1037 176.9361 0.17062307
## 2010 11093 0.8444089 13137 2044 238.5318 0.11669851
## 2011 10217 0.7360951 13880 3663 335.6824 0.09164138
## 2012 9650 0.5739948 16812 7162 543.6472 0.07590718
## 2013 6283 0.3038201 20680 14397 1098.7988 0.07632138
## total 62982 0.6873063 91636 28654 1622.4616 0.05662252
m2$model
##
## Call:
## cpglm(formula = value ~ factor(origin) + factor(dev), link = link.power,
## data = ldaFit, offset = offset)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -6.7901 -1.6969 0.0346 1.6087 8.4465
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.25763 0.04954 166.680 < 2e-16 ***
## factor(origin)2008 0.03098 0.05874 0.527 0.605588
## factor(origin)2009 0.09999 0.05886 1.699 0.110018
## factor(origin)2010 0.03413 0.06172 0.553 0.588369
## factor(origin)2011 0.08933 0.06365 1.403 0.180876
## factor(origin)2012 0.28091 0.06564 4.279 0.000659 ***
## factor(origin)2013 0.48797 0.07702 6.336 1.34e-05 ***
## factor(dev)2 -0.11740 0.04264 -2.753 0.014790 *
## factor(dev)3 -0.62829 0.05446 -11.538 7.38e-09 ***
## factor(dev)4 -1.03168 0.06957 -14.830 2.28e-10 ***
## factor(dev)5 -1.31346 0.08857 -14.829 2.28e-10 ***
## factor(dev)6 -1.86307 0.13826 -13.475 8.73e-10 ***
## factor(dev)7 -2.42868 0.25468 -9.536 9.30e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Estimated dispersion parameter: 10.788
## Estimated index parameter: 1.01
##
## Residual deviance: 299.27 on 15 degrees of freedom
## AIC: 389.18
##
## Number of Fisher Scoring iterations: 4
From the model I note that the dispersion parameter \(\phi\) was estimated as 10.788 and the index parameter \(p\) as 1.01. Not surprisingly, the estimated reserves are similar to those of the Poisson model, but with a smaller predicted standard error.
Intuitively the modelling approach makes a lot more sense, but I end up with one parameter for each origin and development period, hence there is a danger of over-parametrisation.
Looking at the plots above again I note that many origin periods have a very similar development. Perhaps a hierarchical model would be more appropriate?
For more details on glmReserve see the help file and package vignette.

Session Info
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.9.6 ChainLadder_0.2.2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.1 nloptr_1.0.4 plyr_1.8.3
[4] tools_3.2.2 digest_0.6.8 lme4_1.1-9
[7] statmod_1.4.21 gtable_0.1.2 nlme_3.1-121
[10] lattice_0.20-33 mgcv_1.8-7 Matrix_1.2-2
[13] parallel_3.2.2 biglm_0.9-1 SparseM_1.7
[16] proto_0.3-10 coda_0.17-1 stringr_1.0.0
[19] MatrixModels_0.4-1 stats4_3.2.2 lmtest_0.9-34
[22] grid_3.2.2 nnet_7.3-10 tweedie_2.2.1
[25] cplm_0.7-4 minqa_1.2.4 ggplot2_1.0.1
[28] reshape2_1.4.1 car_2.1-0 actuar_1.1-10
[31] magrittr_1.5 scales_0.3.0 MASS_7.3-44
[34] splines_3.2.2 systemfit_1.1-18 pbkrtest_0.4-2
[37] colorspace_1.2-6 quantreg_5.19 sandwich_2.3-4
[40] stringi_0.5-5 munsell_0.4.2 chron_2.3-47
[43] zoo_1.7-12
ChainLadder 0.2.1 released
Over the weekend we released version 0.2.1 of the ChainLadder package for claims reserving on CRAN.
New Features
- New function PaidIncurredChain by Fabio Concina, based on the 2010 Merz & Wüthrich paper "Paid-incurred chain claims reserving method"
- Functions plot.MackChainLadder and plot.BootChainLadder gained a new argument which, allowing users to specify which sub-plot to display. Thanks to Christophe Dutang for this suggestion.
Output of plot(MackChainLadder(MW2014, est.sigma="Mack"), which=3:6)
Changes
- Updated NAMESPACE file to comply with new R CMD checks in R-3.3.0
- Removed package dependencies on grDevices and Hmisc
- Expanded package vignette with a new paragraph on importing spreadsheet data, a new section "Paid-Incurred Chain Model" and an added example for a full claims development picture in the "One Year Claims Development Result" section, see also [1].
Binary versions of the package will appear on the various CRAN mirrors over the next couple of days. Alternatively you can install ChainLadder directly from GitHub using the following R commands:
install.packages(c("systemfit", "actuar", "statmod", "tweedie", "devtools"))
library(devtools)
install_github("mages/ChainLadder")
library(ChainLadder)

Completely new to ChainLadder? Start with the package vignette.
References
[1] Claims run-off uncertainty: the full picture. (with M. Merz) SSRN Manuscript, ID 2524352, 2014.
14 Jul 2015 07:05 · Labels: ChainLadder, Insurance, R, reserving
ChainLadder 0.2.0 adds Solvency II CDR functions
ChainLadder is an R package that provides statistical methods and models for claims reserving in general insurance.
With version 0.2.0 we added new functions to estimate the claims development result (CDR) as required under Solvency II. Special thanks to Alessandro Carrato, Giuseppe Crupi and Mario Wüthrich who have contributed code and documentation.
New Features
- New generic function CDR to estimate the one year claims development result. S3 methods for the Mack and bootstrap model have been added already.
- New function tweedieReserve to estimate reserves in a GLM framework, including the one year claims development result.
- Package vignette has a new chapter on the One Year Claims Development Result
- New example data MW2008 and MW2014 from the Merz & Wüthrich (2008, 2014) papers
Changes
- Source code development moved from Google Code to GitHub
- as.data.frame.triangle now gives a warning message when the dev. period is a character.
- Alessandro Carrato, Giuseppe Crupi and Mario Wüthrich have been added as authors, thanks to their major contribution to code and documentation.
- Christophe Dutang, Arnaud Lacoume and Arthur Charpentier have been added as contributors, thanks to their feedback, guidance and code contribution.
Examples
The examples below use the triangle of the 2008 Merz & Wüthrich paper and illustrate how the one year claims development result can be estimated using the new CDR function for output of MackChainLadder and BootChainLadder. Also the tweedieReserve function is demonstrated, which can estimate the one year CDR as well, by setting the argument rereserving to TRUE.

For further details see the package vignette and the help pages of the respective functions.
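A minimal sketch of what such an example might look like, using the MW2008 data set that ships with the package; treat the exact call signatures as assumptions and check the help pages:

```r
library(ChainLadder)

# One year claims development result for the Mack chain-ladder model
M <- MackChainLadder(MW2008, est.sigma = "Mack")
CDR(M)

# ... and for the bootstrap model
B <- BootChainLadder(MW2008)
CDR(B)

# tweedieReserve with re-reserving estimates the one year CDR as well
W <- tweedieReserve(MW2008, rereserving = TRUE)
```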
References
Michael Merz and Mario V. Wüthrich. Modelling the claims development result for solvency purposes. CAS E-Forum, Fall:542–568, 2008.

Michael Merz and Mario V. Wüthrich. Claims run-off uncertainty: the full picture. SSRN Manuscript, 2524352, 2014.
Session Info
R version 3.1.3 (2015-03-09)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.2 (Yosemite)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ChainLadder_0.2.0 statmod_1.4.20 systemfit_1.1-14 lmtest_0.9-33
[5] zoo_1.7-12 car_2.0-25 Matrix_1.1-5
loaded via a namespace (and not attached):
[1] acepack_1.3-3.3 actuar_1.1-8 cluster_2.0.1
[4] colorspace_1.2-6 digest_0.6.8 foreign_0.8-63
[7] Formula_1.2-0 ggplot2_1.0.0 grid_3.1.3
[10] gtable_0.1.2 Hmisc_3.15-0 lattice_0.20-30
[13] latticeExtra_0.6-26 lme4_1.1-7 MASS_7.3-39
[16] mgcv_1.8-5 minqa_1.2.4 munsell_0.4.2
[19] nlme_3.1-120 nloptr_1.0.4 nnet_7.3-9
[22] parallel_3.1.3 pbkrtest_0.4-2 plyr_1.8.1
[25] proto_0.3-10 quantreg_5.11 RColorBrewer_1.1-2
[28] Rcpp_0.11.5 reshape2_1.4.1 rpart_4.1-9
[31] sandwich_2.3-2 scales_0.2.4 SparseM_1.6
[34] splines_3.1.3 stringr_0.6.2 survival_2.38-1
[37] tools_3.1.3 tweedie_2.2.1
24 Mar 2015 07:18 · Labels: ChainLadder, Insurance, R, reserving
ChainLadder 0.1.8 released
Over the weekend we released version 0.1.8 of the ChainLadder package for claims reserving on CRAN.
What is claims reserving?
The insurance industry, unlike other industries, does not sell products as such but promises. An insurance policy is a promise by the insurer to the policyholder to pay for future claims for an upfront received premium.

As a result insurers don't know the upfront cost of their service, but rely on historical data analysis and judgement to predict a sustainable price for their offering. In General Insurance (or Non-Life Insurance, e.g. motor, property and casualty insurance) most policies run for a period of 12 months. However, the claims payment process can take years or even decades. Therefore often not even the delivery date of their product is known to insurers. The money set aside for those future claims payments is called reserves.
26 Aug 2014 07:24 · Labels: ChainLadder, Insurance, News, R, reserving, stochastic reserving
Creating a matrix from a long data.frame
There can never be too many examples for transforming data with R. So, here is another example of reshaping a data.frame into a matrix.

Here I have a data frame that shows incremental claim payments over time for different loss occurrence (origin) years.

The format of the data frame above is how this kind of data is usually stored in a database. However, I would like to see the payments of the different origin years in rows of a matrix.

The first idea might be to use the reshape function, but that would return a data.frame. Yet, it is actually much easier with the matrix function itself. Most of the code below is about formatting the dimension names of the matrix. Note that I use the with function to save me a bit of typing.

An elegant alternative to matrix is the acast function of the reshape2 package. It has a nice formula argument and allows me not only to specify the aggregation function, but also to add the margin totals.

ChainLadder 0.1.6 released with chain-ladder factor models
Version 0.1.6 of the ChainLadder package has been released and is already available from CRAN.
The new version adds the function CLFMdelta. CLFMdelta finds consistent weighting parameters delta for a vector of selected age-to-age chain-ladder factors for a given run-off triangle.

The added functionality was implemented by Dan Murphy, who is the co-author of the paper A Family of Chain-Ladder Factor Models for Selected Link Ratios by Bardis, Majidi and Murphy. You can find a more detailed explanation with R code examples on Dan's blog; see also his slides from the CAS spring meeting.
Slides by Dan Murphy
20 Aug 2013 07:49 · Labels: ChainLadder, Insurance, News, R, reserving
ChainLadder 0.1.5-6 released on CRAN
Last week we released version 0.1.5-6 of the ChainLadder package on CRAN. The ChainLadder package provides statistical models, which are typically used for the estimation of outstanding claims reserves in general insurance. The package vignette gives an overview of the package functionality.
Output of plot(MackChainLadder(GenIns))
Since the last CRAN release Dan Murphy added new features to the MackChainLadder function and we fixed a bug in BootChainLadder. Here are the details:

New Features
- The list output of the MackChainLadder function now includes the parameter risk and process risk breakdowns of the total risk estimate for the sum of projected losses across all origin years by development age.
- The Mack Method's recursive parameter risk calculation now enables Mack's original two-term formula (the default) and optionally the three-term formula found in Murphy's 1994 paper and in the 2006 paper by Buchwalder, Bühlmann, Merz, and Wüthrich.
- A few more Mack Method examples.
Bug Fixes
- The phi-scaling factor in BootChainLadder was incorrect. Instead of calculating the number of data items in the upper left triangle as n*(n+1)/2, n*(n-1)/2 was used. Thanks to Thomas Girodot for reporting this bug.
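As a quick check of the corrected formula: for an n-by-n triangle the upper-left data region, including the latest diagonal, holds n(n+1)/2 points, not n(n-1)/2:

```r
# Number of data items in the upper-left part of an n-by-n run-off
# triangle, including the latest diagonal
n <- 10
n * (n + 1) / 2  # correct count: 55
n * (n - 1) / 2  # count used before the fix: 45
```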
Please get in touch if you would like to collaborate or find any issues or bugs.
Join me at the first R in Insurance conference at Cass Business School in London, 15 July 2013.
Reserving based on log-incremental payments in R, part III
This is the third post about Christofides' paper on Regression models based on log-incremental payments [1]. The first post covered the fundamentals of Christofides' reserving model in sections A - F, the second focused on a more realistic example and model reduction of sections G - K. Today's post will wrap up the paper with sections L - M and discuss data normalisation and claims inflation.
I will use the same triangle of incremental claims data as introduced in my previous post. The final model had three parameters for origin periods and two parameters for development periods. It is possible to reduce the model further as Christofides illustrates in section L onwards by using an inflation index to bring all claims payments to current value and a claims volume adjustment or weight for each origin period to normalise the triangle.
In his example Christofides uses claims volume adjustments for the origin years and an earning or inflation index for the different payment calendar years. The claims volume adjustments aims to normalise the triangle for similar exposures across origin periods, while the earnings index, which measures largely wages and other forms of compensations, is used as a first proxy for claims inflation. Note that the earnings index shows significant year on year changes from 5% to 9%. Barnett and Zehnwirth [2] would probably recommend to add further parameters for the calendar year effects to the model.
# Page D5.36
ClaimsVolume <- data.frame(origin=0:6,
volume.index=c(1.43, 1.45, 1.52, 1.35, 1.29, 1.47, 1.91))
# Page D5.36
EarningIndex <- data.frame(cal=0:6,
earning.index=c(1.55, 1.41, 1.3, 1.23, 1.13, 1.05, 1))
# Year on year changes
round((1-EarningIndex$earning.index[-1]/EarningIndex$earning.index[-7]),2)
# [1] 0.09 0.08 0.05 0.08 0.07 0.05
dat <- merge(merge(dat, ClaimsVolume), EarningIndex)
# Normalise data for volume and earnings
dat$logvalue.ind.inf <- with(dat, log(value/volume.index*earning.index))
with(dat, interaction.plot(dev, origin, logvalue.ind.inf))
points(1+dat$dev, dat$logvalue.ind.inf, pch=16, cex=0.8)
Indeed, the interaction plot shows the various origin years now much more closely grouped. Only the single point of the last origin period still stands out.
Christofides tests several models with different numbers of origin levels, but I am happy with the minimal model using only one parameter for the origin period, namely the intercept:
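A sketch of what that minimal model might look like in R, assuming the normalised data frame dat from above; the indicator column d0 (development period 0) and the slope covariate s are my own names, constructed as in the previous posts:

```r
# Minimal model: a common intercept for all origin years, a separate
# level for development period 0 and one linear slope thereafter.
# Column names d0 and s are illustrative assumptions.
dat$d0 <- as.numeric(dat$dev == 0)
dat$s  <- ifelse(dat$dev > 0, dat$dev, 0)
fit <- lm(logvalue.ind.inf ~ d0 + s, data = dat)
summary(fit)
```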
22 Jan 2013 23:18 · Labels: Actuarial, Barnett Zehnwirth, ChainLadder, IBNR, Insurance, linear model, log-incremental, R, reserving, Risk, Tutorials
Reserving based on log-incremental payments in R, part II
Following on from last week's post I will continue to go through the paper Regression models based on log-incremental payments by Stavros Christofides [1]. In the previous post I introduced the model from the first 15 pages up to section F. Today I will progress with sections G to K which illustrate the model with a more realistic incremental claims payments triangle from a UK Motor Non-Comprehensive account:
# Page D5.17
tri <- t(matrix(
c(3511, 3215, 2266, 1712, 1059, 587, 340,
4001, 3702, 2278, 1180, 956, 629, NA,
4355, 3932, 1946, 1522, 1238, NA, NA,
4295, 3455, 2023, 1320, NA, NA, NA,
4150, 3747, 2320, NA, NA, NA, NA,
5102, 4548, NA, NA, NA, NA, NA,
6283, NA, NA, NA, NA, NA, NA), nc=7))
The rows show origin period data, e.g. accident years, underwriting years or years of account and the columns present the development periods or lags. The triangle appears to be fairly well behaved. The last two years in rows 6 and 7 appear to be slightly higher than rows 2 to 5 and the values in row 1 are lower in comparison to the later years. The last payment of £1,238 in the third row stands out a bit as well. Before I plot the data, I will transform the triangle into a data frame and add extra columns:
m <- dim(tri)[1]; n <- dim(tri)[2]
dat <- data.frame(
origin=rep(0:(m-1), n),
dev=rep(0:(n-1), each=m),
value=as.vector(tri))
## Add dimensions as factors
dat <- with(dat, data.frame(origin, dev, cal=origin+dev,
value, logvalue=log(value),
originf=factor(origin),
devf=as.factor(dev),
calf=as.factor(origin+dev)))
I am particularly interested in the decay of claims payments in the development year direction for each origin year, on the original and log scale. The interaction.plot function of the stats package does an excellent job for this:

op <- par(mfrow=c(2,1), mar=c(4,4,2,2))
with(dat, interaction.plot(x.factor=dev, trace.factor=origin,
response=value))
points(dat$devf, dat$value, pch=16, cex=0.5)
with(dat, interaction.plot(x.factor=dev, trace.factor=origin,
response=logvalue))
points(dat$devf, dat$logvalue, pch=16, cex=0.5)
par(op)
Indeed, the origin years 1 to 4 (rows 2 to 5) look quite similar and the decay of claims in the development year direction appears to be linear on a log scale from development year 1 onwards.

Based on those observations Christofides suggests two models; the first one will have a unique level for each origin year and a unique level for the zero development period. The parameters for development periods 1 to 6 are assumed to follow a linear relationship with the same slope \(s\):
\begin{align}
\ln(P_{ij}) & = Y_{ij} = a_i + d_j + \epsilon_{ij}
&\mbox{for } i,\,j \mbox{ from } 0 \mbox{ to } 6\\
\mbox{where } d_0 &= d,\quad d_j = s \cdot j
&\mbox{for } j > 0
\end{align}

and \(\epsilon_{ij} \sim N(0, \sigma^2)\). The second model will be a reduced version of the above with only two levels for the origin years 5 and 6. Hence, I add four more columns to my data frame:
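A sketch of those four columns and the two model fits, using hypothetical column names of my own (the paper labels them differently):

```r
# Four extra columns: indicators for origin years 5 and 6, a level for
# development period 0 and a linear slope for development periods > 0
dat$a5 <- as.numeric(dat$origin == 5)
dat$a6 <- as.numeric(dat$origin == 6)
dat$d0 <- as.numeric(dat$dev == 0)
dat$s  <- ifelse(dat$dev > 0, dat$dev, 0)

# Model 1: one level per origin year, plus d0 and slope s
fit1 <- lm(logvalue ~ 0 + originf + d0 + s, data = dat)
# Model 2: reduced, with extra levels only for origin years 5 and 6
fit2 <- lm(logvalue ~ a5 + a6 + d0 + s, data = dat)
summary(fit2)
```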
15 Jan 2013 07:46 · Labels: Actuarial, Barnett Zehnwirth, ChainLadder, IBNR, Insurance, linear model, log-incremental, R, reserving, Risk, Tutorials
Reserving based on log-incremental payments in R, part I
A recent post on the PirateGrunt blog on claims reserving inspired me to look into the paper Regression models based on log-incremental payments by Stavros Christofides [1], published as part of the Claims Reserving Manual (Version 2) of the Institute of Actuaries.
The paper is available together with a spread sheet model, illustrating the calculations. It is very much based on ideas by Barnett and Zehnwirth, see [2] for a reference. However, doing statistical analysis in a spread sheet programme is often cumbersome. I will go through the first 15 pages of Christofides' paper today and illustrate how the model can be implemented in R.
Let's start with the example data of an incremental claims triangle:
## Page D5.4
tri <- t(matrix(
c(11073, 6427, 1839, 766,
14799, 9357, 2344, NA,
15636, 10523, NA, NA,
16913, NA, NA, NA),
nc=4, dimnames=list(origin=0:3, dev=0:3)))
The above triangle shows incremental claims payments for four origin (accident) years over time (development years). The aim is to predict the bottom right triangle of future claims payments, assuming no further claims after four development years.

Christofides' model assumes the following structure for the incremental paid claims \(P_{ij}\):
\begin{align}
\ln(P_{ij}) & = Y_{ij} = a_i + b_j + \epsilon_{ij}
\end{align}

where \(i\) and \(j\) go from 0 to 3, \(b_0=0\) and \(\epsilon_{ij} \sim N(0, \sigma^2)\). Unlike the basic chain-ladder method, this is a stochastic model that allows me to test my assumptions and calculate various statistics, e.g. standard errors of my predictions.
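This model can be fitted with lm on the long-format triangle; a minimal sketch, ignoring any bias correction when back-transforming from the log scale:

```r
# Long-format data frame from the triangle above; lm drops the NA rows
dat <- data.frame(
  origin = factor(rep(0:3, 4)),
  dev    = factor(rep(0:3, each = 4)),
  value  = as.vector(tri))
fit <- lm(log(value) ~ origin + dev, data = dat)
# With treatment contrasts the dev-0 level is absorbed into the
# intercept, i.e. b_0 = 0 as in the model above
coef(fit)
# Naive point forecasts for the future (NA) cells
exp(predict(fit, newdata = subset(dat, is.na(value))))
```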
8 Jan 2013 07:58 · Labels: Actuarial, Barnett Zehnwirth, ChainLadder, IBNR, Insurance, linear model, log-incremental, R, reserving, Risk, Tutorials
Claims reserving in R: ChainLadder 0.1.5-4 released
Last week we released version 0.1.5-4 of the ChainLadder package on CRAN. The R package provides methods which are typically used in insurance claims reserving. If you are new to R or insurance check out my recent talk on Using R in Insurance.

The chain-ladder method, which is a popular method in the insurance industry to forecast future claims payments, gave the package its name. However, the ChainLadder package has many other reserving methods and models implemented as well, such as the bootstrap model demonstrated below. It is a great starting point to learn more about stochastic reserving.

Since we published version 0.1.5-2 in March 2012 additional functionality has been added to the package, see the change log, but in particular the vignette has come a long way.

Many thanks to my co-authors Dan Murphy and Wayne Zhang.
20 Nov 2012 19:32 · Labels: Actuarial, ChainLadder, Insurance, News, R, reserving, stochastic reserving, vignette
Stochastic reserving with R: ChainLadder 0.1.5-1 released
Today we published version 0.1.5-1 of the ChainLadder package for R. It provides methods which are typically used in insurance claims reserving to forecast future claims payments.

Claims development and chain-ladder forecast of the RAA data set using the Mack method

The package started out of presentations given at the Stochastic Reserving Seminar at the Institute of Actuaries in 2007, 2008 and 2010, followed by talks at CAS meetings in 2008 and 2010.

Initially the package came with implementations of the Mack-, Munich- and Bootstrap Chain-Ladder methods. Since version 0.1.3-3 it also provides general multivariate chain-ladder models by Wayne Zhang. Version 0.1.4-0 introduced new functions on loss development factor fitting and Cape Cod by Daniel Murphy, following a paper by David Clark. Version 0.1.5-0 added loss reserving models within the generalized linear model framework, following a paper by England and Verrall (1999), implemented by Wayne Zhang.

For more details see the project web site: http://code.google.com/p/chainladder/ and an early blog entry about R in the insurance industry.

Changes in version 0.1.5-1:
- Internal changes to plot.MackChainLadder to pass new checks introduced by R 2.14.0
- Commented out unnecessary creation of the 'io' matrix in the ClarkCapeCod function. Allows for analysis of very large matrices for CapeCod without running out of RAM. The 'io' matrix is an integral part of ClarkLDF, and so remains in that function.
- plot.clark method:
  - Removed "conclusion" stated in QQplot of Clark methods
  - Restore 'par' settings upon exit
  - Slight change to the title
- Reduced the minimum 'theta' boundary for the Weibull growth function
- Added warnings to as.triangle if origin or dev. period are not numeric
Here is a little example using the googleVis package to display the RAA claims development triangle:
library(ChainLadder)
library(googleVis)
data(RAA)  # example data set of the ChainLadder package
class(RAA) <- "matrix"  # change the class from triangle to matrix
df <- as.data.frame(t(RAA))  # coerce triangle into a data.frame
names(df) <- 1981:1990
df$dev <- 1:10
plot(gvisLineChart(df, "dev", options=list(gvis.editor="Edit me!", hAxis.title="dev. period")))
12 Nov 2011 21:55 · Labels: ChainLadder, Insurance, R, reserving, Soapbox



















