9.8 Practical Issues
Practical issues we might run into with ODP bootstrap
9.8.1 Negative Incremental Values
GLM doesn’t work with negative incremental values because of \(\ln[q(w,d)]\)
Need to work around this in:
Model fitting (e.g. Step 0 and 1 of the Bootstrap process)
Simulating for process variance with negative means (e.g. Step 4 of the Bootstrap process)
Also need an additional workaround for extreme outcomes caused by negative values
9.8.1.1 Model Fitting
Method 1: Use \(-\ln(abs\{q(w,d)\})\)
\[\begin{equation} Cell_{w,d} = \begin{cases} \ln[q(w,d)] & \text{if } q(w,d) > 0 \\ 0 & \text{if } q(w,d) = 0 \\ -\ln[abs \{ q(w,d) \}] & \text{if } q(w,d) < 0 \\ \end{cases} \tag{9.16} \end{equation}\]
Remark. Doesn’t work when the column sums to a negative value
- This is done when setting the design matrix (9.5)
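A minimal sketch of the Method 1 transformation in (9.16), applied cell by cell before the response vector is built. The function name `modified_log` is hypothetical and not part of the source.

```python
import math

def modified_log(q):
    """Log-link response for one incremental value q(w,d), following (9.16)."""
    if q > 0:
        return math.log(q)
    if q < 0:
        # Negative increment: log of the absolute value, with the sign flipped
        return -math.log(abs(q))
    return 0.0  # q == 0

# Hypothetical column of incremental values
column = [120.0, 0.0, -35.0]
transformed = [modified_log(q) for q in column]
```

The sign of each cell is preserved in the transformed value, which is why the method still breaks down when the whole column sums to a negative value.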
Method 2: Subtract a negative constant \(\Psi\)
\[\begin{equation} q^+(w,d) = q(w,d) - \Psi \\ \ln[q^+(w,d)] \text{ for all } Cell_{w,d} \tag{9.17} \end{equation}\]
Pick \(\Psi =\) largest negative value in the column
Apply (9.17) before solving the GLM system of equations (e.g. (9.2) and (9.4))
Then adjust the fitted values by adding back \(\Psi\) (a negative number), which reduces each fitted incremental value
Can use this method combined with method 1 to take care of the extra large negative ones
Need to use the absolute value of the fitted mean in the residual and re-sampling formulas, i.e. take the square root of \(abs\{m_{wd}\}\) instead of \(m_{wd}\) in (9.8) and (9.14)
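The absolute-value modification to the residual and re-sampling formulas can be sketched as follows. This assumes (9.8) has the unscaled Pearson form \((q - m)/\sqrt{m^z}\) and (9.14) has the form \(m + r^*\sqrt{m}\); the latter matches (9.24), the former is not shown in this section and is an assumption. Function names are hypothetical.

```python
import math

def pearson_residual(q, m, z=1.0):
    """Unscaled Pearson residual with abs() under the square root.

    Assumed form of (9.8): (q - m) / sqrt(m^z), with z = 1 for the ODP.
    """
    return (q - m) / math.sqrt(abs(m) ** z)

def resample_incremental(r_star, m):
    """Assumed form of (9.14) with abs(): q* = m + r* x sqrt(abs(m))."""
    return m + r_star * math.sqrt(abs(m))

# Round trip on a hypothetical cell with a negative fitted mean
r = pearson_residual(q=-30.0, m=-25.0)       # (-30 + 25) / 5
q_back = resample_incremental(r, m=-25.0)    # recovers -30
```

Without `abs()`, both functions would try to take the square root of a negative number for this cell.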
Method 3: Use simplified GLM
Use ODP bootstrap (i.e. Chainladder with volume weighted average LDFs)
This will yield a different estimate than using the GLM framework with Method 1 or 2
9.8.1.2 Simulating Negative Values
From above, we might have the fitted \(m_{wd}\) that are negative, which will be an issue when used in Step 4 of the bootstrap simulation, when we need to model the process variance with \(Gamma(m_{wd},\phi m_{wd})\)
- Since \(Gamma\) only takes positive parameters
Adjustment to the Gamma Distribution with negative \(m_{wd}\)
\[\begin{equation} Gamma(abs\{m_{wd}\}, \phi abs\{m_{wd}\}) + 2m_{wd} \tag{9.21} \end{equation}\]
This will maintain the right skew of Gamma while having the mean of \(m_{wd}\)
Alternatively if we use \(-Gamma(abs\{m_{wd}\}, \phi abs\{m_{wd}\})\) it’ll flip the curve to skew left
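A minimal sketch of the adjustment in (9.21), reading \(Gamma(\text{mean}, \text{variance})\) as a Gamma with shape \(abs\{m\}/\phi\) and scale \(\phi\). The function name and the treatment of a zero mean are assumptions, not from the source.

```python
import random

def simulate_process_variance(m, phi, rng):
    """Draw one incremental value with mean m and variance phi * abs(m)."""
    if m == 0:
        return 0.0  # assumption: a zero mean is left at zero
    shape = abs(m) / phi  # mean = shape * scale = abs(m)
    scale = phi           # variance = shape * scale**2 = phi * abs(m)
    draw = rng.gammavariate(shape, scale)
    if m < 0:
        # Shift by 2m: the mean moves from abs(m) to m, the right skew is kept
        draw += 2 * m
    return draw

rng = random.Random(0)
draws = [simulate_process_variance(-50.0, 2.0, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)  # close to -50
```

Because the Gamma draw is non-negative, every simulated value is bounded below by \(2m_{wd}\) and the long tail points to the right.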
9.8.1.3 Extreme Outcomes from Negative Values
Columns with a negative mean in the early ages can result in very large LDFs (and lead to simulated outcomes that are 1,000 times greater than our mean)
A negative mean causes one column of cumulative values to sum close to 0 and the next to sum to a much larger number, resulting in an extremely large LDF and therefore projections that are extremely large
Need to address this as it’ll throw off the mean even if you don’t care about the high percentiles
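The mechanism can be illustrated with made-up numbers (all values below are hypothetical): a column of cumulative losses that nets to almost zero, followed by a column of ordinary size, gives a volume-weighted LDF in the thousands.

```python
# Hypothetical cumulative losses for three accident years at two adjacent ages
cum_age_1 = [120.0, -95.0, -24.0]   # negative values make the column sum to 1
cum_age_2 = [400.0, 350.0, 250.0]   # next column sums to 1,000

# Volume-weighted LDF = sum of next column / sum of current column
ldf = sum(cum_age_2) / sum(cum_age_1)
```

Any cumulative value at the first age is multiplied by this factor, which is how a single iteration can dominate the simulated mean.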
3 options to address this:
Remove the extreme iterations
Beware of understating the likelihood of extreme outcomes
Recalibrate the Model
First need to identify the source of the negative losses
Review data used and parameter selection
e.g. remove the AYs that might not represent current behavior
e.g. if due to S&S then you can just model them separately and then correlate them during simulation
Limit Incremental Losses to 0
Either with the simulated mean (Step 2) or the process var step (Step 4)
Replace negatives with 0s
Can just do it in certain columns
9.8.2 Non-Zero Sum of Residuals
Residuals are supposed to be iid with mean zero and constant variance
\(\therefore\) Sum of our residuals from the triangle should be 0
- Not necessarily the case since this is just a sample
Consequence: Simulated outcomes will on average be higher than the fitted mean if the sum of residuals is positive (and vice versa)
2 options to address this:
Keep it if we believe this to be characteristics of the data set
Add a constant to each non-zero residual so that they sum to 0
Then sample from the adjusted residuals
If the average of the residuals is significantly different from zero, then the fit of the model should be questioned
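The second option (adding a constant to each non-zero residual) can be sketched as below. The function name `recenter_residuals` is hypothetical.

```python
def recenter_residuals(residuals):
    """Add one constant to every non-zero residual so the total sums to 0."""
    nonzero_count = sum(1 for r in residuals if r != 0)
    if nonzero_count == 0:
        return list(residuals)
    # Spread the total over the non-zero residuals only; zeros stay untouched
    shift = -sum(residuals) / nonzero_count
    return [r + shift if r != 0 else r for r in residuals]

# Hypothetical residuals summing to 2.4, so each non-zero one moves by -0.8
adjusted = recenter_residuals([1.0, 0.0, 2.0, -0.6])
```

Zero residuals are left alone, so only the non-zero ones absorb the shift.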
9.8.3 Using L-year Weighted Average
Select LDFs based on the latest \(L\) years
GLM Bootstrap
Only use \(L+1\) diagonals of data to get \(L\) diagonals of LDFs
Excluded diagonals are given zero weight and we’ll have fewer CY trend parameters (if we’re using them)
In the simulation we’ll only sample residuals for the trapezoid used to parameterize the model
(since that’s all we’ll need to estimate parameters)
Simplified GLM
Get L-year weighted average LDFs
Will only have residuals (to sample from) for the most recent L + 1 diagonals
In the simulation we’ll create the entire resampled triangle
(Since we need the cumulative losses for each row)
For projection using the resampled triangle we’ll still only use the L-year average LDFs
The 2 methods will produce different results
GLM Bootstrap: Models the incremental losses in the trapezoid
Simplified GLM: Models the same losses but in relation to the cumulative losses, which include the non-modeled losses in the diagonals excluded
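For the simplified GLM route, the L-year volume-weighted LDFs can be sketched as follows. The triangle layout (a list of rows of cumulative losses, oldest accident year first) and the function name are assumptions for illustration.

```python
def l_year_ldfs(triangle, L):
    """Volume-weighted LDFs using only the latest L accident years per age."""
    ldfs = []
    for d in range(len(triangle) - 1):
        # Rows that have observations at both age d and age d + 1
        rows = [row for row in triangle if len(row) > d + 1]
        rows = rows[-L:]  # keep the latest L of them
        ldfs.append(sum(row[d + 1] for row in rows) / sum(row[d] for row in rows))
    return ldfs

# Hypothetical cumulative triangle
triangle = [
    [100.0, 150.0, 165.0, 170.0],
    [110.0, 160.0, 176.0],
    [120.0, 186.0],
    [130.0],
]
ldfs = l_year_ldfs(triangle, L=2)
```

With `L=2`, the first LDF uses only the second and third accident years, which corresponds to the latest \(L+1 = 3\) diagonals of data.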
9.8.4 Missing Value
ODP Bootstrap:
Missing data impact:
LDFs
Fitted triangle (if missing value lies on the most recent diagonal)
Residuals
Degree of freedom
Solutions:
Impute from surrounding values
Modify LDFs to exclude missing value
Similar to the L-year weighted average:
Missing value will be resampled so the cumulative losses can be calculated
Projection from the resampled triangle will exclude the missing cell for resampled LDF selection
GLM Bootstrap
Impact is limited; we’ll just have fewer observations
9.8.5 Outliers
Remove outliers if they are not representative of the variability of the losses. The options are:
Remove the entire row (easy if it’s the 1st row of the triangle)
Remove the values and treat them as missing values
Exclude the residual from the sampling pool but still create a sampled value for that cell
Significant number of outliers might indicate bad model fit
GLM Bootstrap
Pick new parameters (grouping parameters)
Change the error term distribution from \(z=1\)
ODP Bootstrap
Use L-year weighted average
Heteroscedasticity may exist
- See the adjustments in the next subsection and the diagnostics
Since we don’t make a distribution assumption, a large number of outliers could mean the data is quite skewed, and it’s appropriate that this shows up in the simulation
9.8.6 Heteroskedasticity
Issue of non constant variance
ODP bootstrap assumes residuals are \(iid\) with constant variance
With heteroscedasticity, it is no longer valid to sample the residuals from the whole triangle
GLM Bootstrap has the additional flexibility of choosing parameters to alleviate heteroscedasticity
For ODP Bootstrap: 3 ways to deal with heteroscedasticity below
- They also work for GLM Bootstrap
9.8.6.1 Stratified Sampling
Stratified Sampling
Split the triangle into groups with similar variance
Only sample residuals from the same group
Cons
- Each group may not be that large, which limits the amount of variability in the possible outcomes
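A minimal sketch of stratified sampling, assuming residuals have already been assigned to groups. The group labels, cell keys and function name are hypothetical.

```python
import random

def stratified_sample(residuals_by_group, group_of_cell, rng):
    """For each cell, draw a residual (with replacement) from its own group only."""
    return {
        cell: rng.choice(residuals_by_group[group])
        for cell, group in group_of_cell.items()
    }

# Hypothetical grouping: early ages are more volatile than late ages
residuals_by_group = {"early": [-2.0, 2.5, 1.1], "late": [-0.1, 0.2]}
group_of_cell = {(0, 0): "early", (0, 1): "late", (1, 0): "early"}

sampled = stratified_sample(residuals_by_group, group_of_cell, random.Random(42))
```

The small "late" pool in this example shows the drawback noted above: a cell in that group can only ever receive one of two residuals.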
9.8.6.2 Hetero-Adjustment to the Residuals
Calculate a hetero-adjustment factor to scale the residuals to the same level:
Group the residuals with similar variance, then calculate the \(\sigma\) of the residuals in each group \(i\)
Hetero-adjustment factor: \(h^i\)
i.e. The largest \(\sigma\) \(\div\) each group’s \(\sigma\)
Scale up the residuals:
Residual (9.8) \(\times\) Hat Matrix Factor (9.10) \(\times\) Hetero Factor (9.22)
\[\begin{equation} r_{wd}^{iH} = r_{wd} \times f_{wd}^H \times h^i \tag{9.23} \end{equation}\]
- \(h^i\) here is based on the group we draw from
Need to divide the sampled residual by \(h^i\) to reflect the variability of group \(i\)
\[\begin{equation} q^{i*}(w,d) = m_{wd} + \dfrac{r^{i*}}{h^i}\sqrt{m_{wd}} \tag{9.24} \end{equation}\]
- \(h^i\) here is based on the group we’re simulating for
Adjust the variance for the process variance step in the simulation
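The hetero-adjustment steps can be sketched as below, taking \(h^i\) as the largest group \(\sigma\) divided by group \(i\)'s \(\sigma\) as described above. It assumes the input residuals already include the hat matrix factor, that \(\sigma\) is the sample standard deviation, and that \(m_{wd} > 0\); all names are hypothetical.

```python
import math
import random
import statistics

def hetero_factors(residuals_by_group):
    """h^i = largest group sigma / sigma of group i."""
    sigmas = {g: statistics.stdev(rs) for g, rs in residuals_by_group.items()}
    largest = max(sigmas.values())
    return {g: largest / s for g, s in sigmas.items()}

def pooled_adjusted_residuals(residuals_by_group, h):
    """Scale each residual by its own group's factor, as in (9.23), then pool."""
    return [r * h[g] for g, rs in residuals_by_group.items() for r in rs]

def resample_cell(m, target_group, pool, h, rng):
    """(9.24): divide the sampled residual by the h of the cell being simulated."""
    r_star = rng.choice(pool)
    return m + (r_star / h[target_group]) * math.sqrt(m)

groups = {"A": [-2.0, 0.0, 2.0], "B": [-1.0, 0.0, 1.0]}  # hypothetical residuals
h = hetero_factors(groups)                      # A -> 1.0, B -> 2.0
pool = pooled_adjusted_residuals(groups, h)     # all residuals now on A's scale
q_star = resample_cell(100.0, "B", pool, h, random.Random(7))
```

After scaling, any cell can draw from the full pool, which avoids the small-group limitation of stratified sampling.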
9.8.6.3 Non-constant Scale Parameters
Adjust the dispersion factor \(\phi\) as well as the residuals (similar to above)
- Calculate the hetero adjustment factor \(h_i\) using formula (9.27) below:
Perform step 3 and 4 from the hetero adjustment method above
Calculate \(\phi_i\) for each homogeneous residual group \(i\) (\(n_i\) = number of residuals in group):
- Use \(\phi_i\) for the process variance step
9.8.7 Heteroecthesious Data
ODP bootstrap requirements:
Symmetrical shape (annual by annual, quarter by quarter, etc. triangles)
Homoecthesious data (similar exposure)
Heteroecthesious = Accident years have different levels of exposure
Here we are focusing on heteroecthesious data due to interim evaluation dates:
9.8.7.1 Partial First Development Period
This means the entire first development period is shorter than the rest
e.g. Annual data evaluated as of 6/30 with 1/1-12/31 AYs
We’ll have a triangle with development periods @6, 18, 30, 42, etc
Pearson residuals use the square root of the fitted value to make them all exposure independent (debatable…)
- \(\therefore\) No impact to residuals
Adjustment: Scale down the most recent AY projection to the appropriate exposure period (e.g. half the exposure based on the example above). We have 2 options:
Prorate the mean of the incremental cells for the latest AY between step 3 and 4 of the bootstrap process and then proceed to Step 4 for the process variance as usual
Prorate the simulated incremental cells for the latest AY after the process variance step (Step 4)
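The two options can be compared with a small sketch. It assumes the process variance step draws from a Gamma with mean \(m\) and variance \(\phi m\) and that the means are positive; the function names and the numbers are hypothetical.

```python
import random
import statistics

def gamma_draw(mean, phi, rng):
    """Gamma with the given mean and variance phi * mean (mean > 0 assumed)."""
    return rng.gammavariate(mean / phi, phi)

def prorate_before_process_variance(m, fraction, phi, rng):
    """Option 1: prorate the mean, then apply process variance."""
    return gamma_draw(fraction * m, phi, rng)

def prorate_after_process_variance(m, fraction, phi, rng):
    """Option 2: apply process variance to the full mean, then prorate."""
    return fraction * gamma_draw(m, phi, rng)

rng = random.Random(1)
opt1 = [prorate_before_process_variance(100.0, 0.5, 2.0, rng) for _ in range(20000)]
opt2 = [prorate_after_process_variance(100.0, 0.5, 2.0, rng) for _ in range(20000)]
```

Under these assumptions both options have the same mean (`fraction * m`), but the variance is `phi * fraction * m` for option 1 and `fraction**2 * phi * m` for option 2, so option 2 gives the narrower distribution when `fraction < 1`.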
9.8.7.2 Partial Latest Calendar Period
This is where the latest diagonal is a partial diagonal
e.g. Evaluate in between typical data evaluation date
Evaluation @6/30 for a 1/1-12/31 AYs and 12-24-36 triangle
Similar problem as partial first development period + partial data in most recent diagonal
ODP Bootstrap
Select LDF by excluding latest diagonal or prorating the latest diagonal to full year
Adjusted simulation process
Calculate sampled triangle as usual (diagonal will be of full year)
Calculate full year LDFs and Ultimate as usual
Additional steps:
De-annualize the diagonal
Interpolate the full year LDFs to match the diagonal
Forecast loss
Scale down the latest AY similar to the partial AY adjustment
- No change
GLM Bootstrap
- Should be something similar
9.8.8 Exposure Adjustment
Adjustment for when exposure changed dramatically over the years (e.g. rapid growth or run off)
ODP Bootstrap
Divide losses by exposure (model loss cost)
Need to multiply the simulated results by the exposure (after the process variance step)
GLM Bootstrap
Adjust losses by exposure similar to above
Fit to the exposure adjusted losses should be exposure weighted
(i.e. exposure adjusted losses with higher exposure are assumed to have lower variance, see Anderson et al. (2007))
This will need fewer AY parameters since the exposure adjustment should capture a lot of the difference between AYs
9.8.9 Tail Factors
ODP Bootstrap
See CAS Tail Factor Working Party Report (2013)
Add tail factor to the algorithm by assuming the factor follows a distribution (other considerations such as process variance, hetero-adj can all be extended to include the tail factors)
Should be an extrapolation of the incremental tail factors (instead of a single tail factor to ultimate)
Tail factors typically have \(\sigma <\) 50% of (tail factor \(-\) 1)
(But should compare to the \(\sigma\) of the AtA factors leading up to the tail in both the actual and simulated data)
GLM Bootstrap
- Continue to use the last \(\beta_d\) to estimate the tail by continuing to apply it (similarly for CY parameter)
9.8.10 Fitting a Distribution to ODP Bootstrap Residuals
Data points from triangle may not be representative of the underlying distribution
- Whether the most extreme observation is a 1-in-100, 1-in-1000 event
Alternative is to fit a distribution to the residuals and sample from the distribution instead i.e. parametric bootstrapping
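A minimal sketch of this alternative, assuming a normal distribution purely for illustration (the notes do not say which distribution to fit, and a skewed family may suit real residuals better). Function names are hypothetical.

```python
import statistics

def fit_residual_distribution(residuals):
    """Fit a normal distribution to the residuals (distribution choice is an assumption)."""
    return statistics.NormalDist.from_samples(residuals)

def sample_residuals(dist, n, seed=None):
    """Draw n residuals from the fitted distribution instead of the triangle."""
    return dist.samples(n, seed=seed)

# Hypothetical residuals from a small triangle
observed = [-1.4, -0.6, -0.2, 0.1, 0.3, 0.7, 1.1]
dist = fit_residual_distribution(observed)
simulated = sample_residuals(dist, 5, seed=123)
```

Unlike resampling the observed residuals, draws from the fitted distribution are not capped at the most extreme value seen in the triangle.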