4.4 Chain Ladder Assumptions Test
Tests on the various Chain Ladder assumptions
4.4.1 Intercept
Test for assumption 1 (prop. 4.1)
Test Procedure
Plot the losses at adjacent ages
- Do this for every age \(k\) vs age \(k+1\)
Results Interpretation
If the chain ladder assumption holds, we expect the line of best fit to go through the origin
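A minimal sketch of the intercept check, assuming the triangle is stored as a square NumPy array of cumulative losses with `NaN` below the latest diagonal. The function name `fit_adjacent_ages` and the sample figures are hypothetical. It fits an ordinary least squares line with an intercept to the losses at two adjacent ages, so the fitted intercept can be compared with zero.

```python
import numpy as np

def fit_adjacent_ages(triangle, k):
    """Fit c_{i,k+1} = a + b * c_{i,k} by least squares (k is a 0-based column)."""
    x = triangle[:, k]
    y = triangle[:, k + 1]
    mask = ~np.isnan(x) & ~np.isnan(y)   # keep rows observed at both ages
    slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
    return intercept, slope

# Hypothetical 4 x 4 cumulative triangle (illustrative numbers only)
tri = np.array([
    [100.0, 150.0, 165.0, 170.0],
    [110.0, 168.0, 185.0, np.nan],
    [120.0, 175.0, np.nan, np.nan],
    [130.0, np.nan, np.nan, np.nan],
])

intercept, slope = fit_adjacent_ages(tri, 0)
print(f"age 1 -> 2: intercept = {intercept:.2f}, slope = {slope:.3f}")
```

An intercept that is large relative to the losses in the plot suggests the line does not go through the origin. Columns with fewer than two observed pairs cannot be fitted this way.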
4.4.2 Residuals
Test for assumption 3 (prop. 4.3)
Test Procedure
For each age \(k\), plot the residuals \(\varepsilon_{i,k+1}\) against \(c_{i,k}\)
x-axis is the \(c_{i,k}\) and y-axis is \(\varepsilon_{i, k+1}\)
These are weighted residuals (Clark uses normalized residuals and the bootstrap uses Pearson residuals)
Remark.
We can take out the \(\alpha^2_k\) term since it’s constant for the same \(k\)
e.g. \(\varepsilon = \dfrac{c_{i,k+1} - c_{i,k} \: \hat{f_k}}{\sqrt{c_{i,k}}}\) for weighted average assumption
For residuals at age \(k\), you need the LDFs from \(k-1\) to \(k\)
- Note that the residual values change with the unit (e.g. dollars, thousands), but this shouldn’t affect your conclusion
You can calculate the different weighted LDFs with the table features on TI-30XS
- y terms can be the LDFs and x is the weight (\(c_{i,k}^2\) or \(c_{i,k}\) depending on the assumption)
- Then the LDF will be \(\dfrac{\sum x y}{\sum x}\)
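The same \(\dfrac{\sum x y}{\sum x}\) calculation can be sketched in Python. This is an illustration only: the function name `weighted_ldf`, the `power` parameter and the sample losses are hypothetical, and `power=0` (a simple average) is included as an assumption for completeness.

```python
import numpy as np

def weighted_ldf(c_k, c_k1, power):
    """LDF = sum(x * y) / sum(x), where y = age-to-age factors
    and x = c_{i,k} ** power is the weight."""
    c_k = np.asarray(c_k, dtype=float)
    c_k1 = np.asarray(c_k1, dtype=float)
    y = c_k1 / c_k          # individual LDFs
    x = c_k ** power        # weights
    return float(np.sum(x * y) / np.sum(x))

# Hypothetical losses at age k and age k+1 for two accident years
c_k = [100.0, 200.0]
c_k1 = [150.0, 260.0]

for power in (2, 1, 0):
    print(power, round(weighted_ldf(c_k, c_k1, power), 4))
```

With `power=1` the result equals the usual volume weighted factor, the sum of losses at age \(k+1\) divided by the sum at age \(k\).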
Results Interpretation
Residuals should vary randomly around zero across \(c_{i,k}\)
Test can be used to test the various variance assumptions by calculating the \(\varepsilon\) differently (See Table 4.1)
- If passed \(\Rightarrow\) expected losses are linear w.r.t. cumulative losses paid to date
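A short sketch of the residual calculation for the weighted average assumption, using the formula \(\varepsilon = \dfrac{c_{i,k+1} - c_{i,k} \hat{f_k}}{\sqrt{c_{i,k}}}\) from the remark above. The function name `weighted_residuals` and the sample losses are hypothetical, and the \(\alpha_k\) term is left out as the remark allows.

```python
import numpy as np

def weighted_residuals(c_k, c_k1):
    """Residuals (c_{i,k+1} - c_{i,k} * f_hat) / sqrt(c_{i,k}),
    with f_hat the volume weighted LDF for this age."""
    c_k = np.asarray(c_k, dtype=float)
    c_k1 = np.asarray(c_k1, dtype=float)
    f_hat = c_k1.sum() / c_k.sum()
    eps = (c_k1 - c_k * f_hat) / np.sqrt(c_k)
    return f_hat, eps

# Hypothetical losses at age k (x-axis) and age k+1
c_k = np.array([100.0, 400.0, 225.0])
c_k1 = np.array([150.0, 560.0, 330.0])

f_hat, eps = weighted_residuals(c_k, c_k1)
for x, e in zip(c_k, eps):
    print(f"c = {x:7.1f}   residual = {e:+.3f}")
```

Plotting `eps` against `c_k` for each age gives the scatter described in the test procedure. A trend or a funnel shape in that scatter, rather than random variation around zero, is what the test looks for.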
4.4.3 Calendar Year Test
Test for assumption 2 (prop. 4.2)
Test Procedure
Rank the LDFs in each column (1 = lowest)
Label them \(S\) (small) and \(L\) (large) and the median is discarded
For each diagonal \(d\) with at least 2 elements:
- Count the number of \(S\) and \(L\) on the diagonal and set \(z = \min(\#S, \#L)\)
- Calculate \(\mathrm{E}[z_n]\) and \(\mathrm{Var}(z_n)\) for each diagonal \(d\)
Remark.
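\(n =\) # of elements in each diagonal excluding the discarded median
\(\mathrm{E}[z_n] = \dfrac{n}{2} - c_n\)
\(\mathrm{Var}(z_n) = \dfrac{n(n-1)}{4} - c_n (n-1) + \mathrm{E}[z_n] - \mathrm{E}[z_n]^2\)
\(c_n = {n - 1 \choose m}\frac{n}{2^n}\)
\(m = \mathrm{floor}\left[ \dfrac{n-1}{2} \right]\)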
- \(z \sim\) Normal
n | \(\mathrm{E}[z_n]\) | \(\mathrm{Var}(z_n)\) |
---|---|---|
2 | 0.5 | 0.25 |
3 | 0.75 | 0.188 |
4 | 1.25 | 0.438 |
5 | 1.563 | 0.37 |
6 | 2.062 | 0.62 |
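- Sum \(z\), \(\mathrm{E}[z_n]\) and \(\mathrm{Var}(z_n)\) over the diagonals to get the observed \(Z\), \(\mathrm{E}[Z]\) and \(\mathrm{Var}(Z)\)
- See if the observed \(Z\) is in the CI
- Test 95% CI: \(\mathrm{E}[Z] \pm 2 \times \sqrt{\mathrm{Var}(Z)}\)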
Results Interpretation
If the observed \(Z\) is outside the CI range \(\Rightarrow\) There are calendar year effects and assumption (2) is violated
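The procedure above can be sketched as follows. This is an illustration under stated assumptions: the moments are taken as \(\mathrm{E}[z_n] = n/2 - c_n\) and \(\mathrm{Var}(z_n) = n(n-1)/4 - c_n(n-1) + \mathrm{E}[z_n] - \mathrm{E}[z_n]^2\), which reproduce the table values; \(z\) for a diagonal is the smaller of the \(S\) and \(L\) counts; and totals are summed over diagonals. The function names and the sample labels are hypothetical.

```python
from math import comb, sqrt

def z_moments(n):
    """Mean and variance of z for a diagonal with n labelled factors."""
    m = (n - 1) // 2                        # floor((n - 1) / 2)
    c_n = comb(n - 1, m) * n / 2 ** n
    mean = n / 2 - c_n                      # assumed formula for E[z_n]
    var = n * (n - 1) / 4 - c_n * (n - 1) + mean - mean ** 2  # assumed Var(z_n)
    return mean, var

def calendar_year_test(diagonals):
    """diagonals: one string of 'S'/'L' labels per diagonal, median already dropped."""
    z_obs, mean_total, var_total = 0, 0.0, 0.0
    for labels in diagonals:
        n = len(labels)
        if n < 2:                           # assumption: skip diagonals with n < 2
            continue
        z_obs += min(labels.count("S"), labels.count("L"))
        mean, var = z_moments(n)
        mean_total += mean
        var_total += var
    half_width = 2 * sqrt(var_total)        # approximate 95% interval
    low, high = mean_total - half_width, mean_total + half_width
    return z_obs, (low, high), low <= z_obs <= high

# Hypothetical labels for three diagonals
diagonals = ["SL", "SSL", "LLLL"]
z_obs, ci, inside = calendar_year_test(diagonals)
print(z_obs, ci, inside)
```

When `inside` is `False`, the observed total falls outside the interval, which is the case the results interpretation reads as a calendar year effect.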
4.4.4 Correlation of Adjacent LDFs
Test assumption (1) (prop. 4.1)
Measures correlation between each column and the adjacent column
We want to test if there is a correlation among columns for the triangle as a whole
\(\therefore\) We define one test statistic for the whole triangle
Use rank correlation (e.g. Spearman’s correlation coefficient \(T\)) instead of value correlation (e.g. Pearson correlation)
Because LDFs down the column for a given age \(k\) have different variance
See Venter for his method too
Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranked variables
We are testing for independence
- Which is more strict than just testing for 0 correlation
The threshold used is relatively low, at 50%, as an indicator that we need to investigate further
Reason to consider the correlation of a triangle as whole instead of between pairs of columns
More important to know whether correlations globally prevail than to find a small part of the triangle with correlation
At 10% significance, 10% of the pairs will show up as significant just by random chance (see more on Venter)
Avoid an accumulation of error probabilities
Test Procedure
- Calculate Spearman’s correlation coefficient \(T_k\) for each pair of adjacent LDFs
Remark.
Rank is for each column \(k\) from low to high (i.e. lowest is 1)
\(n_k =\) number of LDF pairs (rows) available for column \(k\), i.e. \(n_k = I - k\) for an \(I \times I\) triangle
For a 10 x 10 triangle, \(k \in [2 , 8]\)
Only 9 columns of LDFs, so 8 adjacent pairs
And we don’t use the pair involving the column with only 1 row, which leaves 7
- \(k\) starts at 2 by convention
- Calculate Spearman’s correlation coefficient \(T\) for the whole triangle
Remark.
Formula is the weighted average of the \(T_k\)’s, weight = \(n_k - 1\): \(T = \dfrac{\sum_{k=2}^{I-2} (I-k-1) \: T_k}{\sum_{k=2}^{I-2} (I-k-1)}\)
\(I =\) size of triangle
- Formula gives more weight to \(T_k\) with more data
- Compare \(T\) with CI based on distribution
Remark.
Assume \(T \sim Normal(0, \sqrt{\mathrm{Var}(T)})\), with \(\mathrm{Var}(T) = \dfrac{1}{(I-2)(I-3)/2}\)
- Use \(Z_{75\%} = 0.67\) for range of [25%, 75%]
Results Interpretation
If \(T\) is in the CI \(\Rightarrow\) Do not reject the \(H_0\) of uncorrelated LDFs
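A minimal sketch of the whole test, under several assumptions: the triangle is an \(I \times I\) NumPy array of cumulative losses with `NaN` below the latest diagonal; there are no tied LDFs, so \(T_k = 1 - \dfrac{6 \sum (r - s)^2}{n_k^3 - n_k}\) with \(n_k = I - k\); the weights are \(n_k - 1\); and \(\mathrm{Var}(T) = \dfrac{1}{(I-2)(I-3)/2}\). The function names and sample figures are hypothetical.

```python
import numpy as np
from math import sqrt

def ranks(values):
    """Ranks 1..n with the lowest value ranked 1 (assumes no ties)."""
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(1, len(values) + 1)
    return r

def spearman_triangle_test(tri, z=0.67):
    """Return (T, Var(T), CI) for adjacent LDF columns of a cumulative triangle."""
    tri = np.asarray(tri, dtype=float)
    size = tri.shape[0]                      # I, needs to be at least 4
    ldf = tri[:, 1:] / tri[:, :-1]           # ldf[:, j] is the factor from age j+1 to j+2
    num, den = 0.0, 0.0
    for k in range(2, size - 1):             # k = 2, ..., I - 2
        n = size - k                         # rows where both factors exist
        r = ranks(ldf[:n, k - 1])            # factors from age k to k+1
        s = ranks(ldf[:n, k - 2])            # factors from age k-1 to k
        t_k = 1 - 6 * np.sum((r - s) ** 2) / (n ** 3 - n)
        num += (n - 1) * t_k
        den += n - 1
    t = num / den
    var_t = 1 / ((size - 2) * (size - 3) / 2)
    half_width = z * sqrt(var_t)             # z = 0.67 gives the [25%, 75%] range
    return t, var_t, (-half_width, half_width)

# Hypothetical 5 x 5 cumulative triangle (illustrative numbers only)
tri = np.array([
    [100.0, 160.0, 190.0, 205.0, 210.0],
    [110.0, 170.0, 210.0, 222.0, np.nan],
    [ 95.0, 155.0, 180.0, np.nan, np.nan],
    [105.0, 172.0, np.nan, np.nan, np.nan],
    [120.0, np.nan, np.nan, np.nan, np.nan],
])
t, var_t, ci = spearman_triangle_test(tri)
print(t, var_t, ci, ci[0] <= t <= ci[1])
```

Ties in the LDFs would need average ranks and a different \(T_k\) formula, which this sketch does not handle.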