CAS Exam 7 Study Notes

4.4 Chain Ladder Assumptions Test

Tests on the various Chain Ladder assumptions

4.4.1 Intercept

Test for assumption 1 (prop. 4.1)

Test Procedure

Plot the losses at adjacent ages

Do this for every age \(k\) vs age \(k+1\)

Results Interpretation

If the chain ladder assumption holds, we expect the line of best fit to go through the origin
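The intercept check can also be done numerically rather than by eye. The sketch below fits an ordinary least squares line with an intercept to losses at two adjacent ages. The data and the helper name `fit_line` are made up for illustration, and the notes do not prescribe any particular fitting routine.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical cumulative losses at age k and age k+1, one point per accident year
c_k = [100.0, 120.0, 90.0, 110.0]
c_k1 = [150.0, 180.0, 135.0, 165.0]  # exactly 1.5 * c_k by construction

intercept, slope = fit_line(c_k, c_k1)
print(intercept, slope)
```

Because the illustrative data are exactly proportional, the fitted intercept is zero. With real data, an intercept clearly away from zero suggests the assumption does not hold for that age.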

4.4.2 Residuals

Test for assumption 3 (prop. 4.3)

Test Procedure

For each age \(k\), plot the \(c_{i,k}\) with the residuals \(\varepsilon_{i,k+1}\)

x-axis is the \(c_{i,k}\) and y-axis is \(\varepsilon_{i, k+1}\)
These are weighted residuals (Clark uses normalized residuals and the bootstrap uses Pearson residuals)

\[\begin{equation} \varepsilon_{i,k+1} = \dfrac{c_{i,k+1} - c_{i,k} \: \hat{f_k}}{\sqrt{\mathrm{Var}(c_{i,k+1} \mid c_{i,k})}} \tag{4.10} \end{equation}\]

Remark.

We can drop the \(\alpha^2_k\) term from the variance since it’s constant for a given \(k\)

e.g. \(\varepsilon = \dfrac{c_{i,k+1} - c_{i,k} \: \hat{f_k}}{\sqrt{c_{i,k}}}\) for weighted average assumption
For residuals at age \(k\), you need the LDFs from \(k-1\) to \(k\)
Note that the residual values change with the unit (e.g. dollars, thousands, etc.), but this shouldn’t affect your conclusion
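As an illustration of the weighted average case above, the following sketch computes the volume-weighted LDF and the residuals \((c_{i,k+1} - c_{i,k}\hat{f_k})/\sqrt{c_{i,k}}\). The loss values and the function name `weighted_residuals` are hypothetical.

```python
import math

def weighted_residuals(c_k, c_k1, f_hat):
    """Residuals (c_{i,k+1} - c_{i,k} * f_hat) / sqrt(c_{i,k}), one per accident year."""
    return [(y - x * f_hat) / math.sqrt(x) for x, y in zip(c_k, c_k1)]

# Hypothetical cumulative losses at age k and age k+1
c_k = [100.0, 400.0, 225.0]
c_k1 = [160.0, 580.0, 347.5]

# Volume-weighted LDF, matching the weighted average assumption
f_hat = sum(c_k1) / sum(c_k)
eps = weighted_residuals(c_k, c_k1, f_hat)

# Each (c_{i,k}, eps_i) pair is one point on the residual plot
for x, e in zip(c_k, eps):
    print(x, round(e, 3))
```

Rescaling the losses (e.g. dollars to thousands) multiplies every residual by the same constant, so the pattern in the plot is unchanged.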

You can calculate the different weighted LDFs with the table features on TI-30XS

y terms can be the LDFs and x is the weight (\(c_{i,k}^2\) or \(c_{i,k}\) depending on the assumption)
Then the LDF will be \(\dfrac{\sum x y}{\sum x}\)
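The same \(\sum xy / \sum x\) calculation can be sketched in a few lines. The function name `weighted_ldf` and its `power` argument are illustrative assumptions: `power=2` uses weights \(c_{i,k}^2\), `power=1` uses \(c_{i,k}\), and `power=0` gives the simple average.

```python
def weighted_ldf(c_k, c_k1, power):
    """LDF as sum(x * y) / sum(x), with y = individual LDFs and x = c_{i,k} ** power."""
    y = [b / a for a, b in zip(c_k, c_k1)]  # individual age-to-age factors
    x = [a ** power for a in c_k]           # weights
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(x)

# Hypothetical cumulative losses at age k and age k+1
c_k = [100.0, 200.0]
c_k1 = [150.0, 320.0]

for power in (0, 1, 2):
    print(power, weighted_ldf(c_k, c_k1, power))
```

With `power=1` the result equals \(\sum c_{i,k+1} / \sum c_{i,k}\), the usual volume-weighted factor.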

Results Interpretation

Residuals should vary randomly around zero across \(c_{i,k}\)

Test can be used to test the various variance assumptions by calculating the \(\varepsilon\) differently (See Table 4.1)

If passed \(\Rightarrow\) expected losses are linear w.r.t. cumulative losses paid to date

4.4.3 Calendar Year Test

Test for assumption 2 (prop. 4.2)

Test Procedure

Rank the LDFs in each column (1 = lowest)
Label those below the column median \(S\) (small) and those above it \(L\) (large); the median itself is discarded
For each diagonal \(d\) with at least 2 elements:

\[\begin{equation} z^d = \mathrm{min}(\text{# of }S, \text{# of }L) \tag{4.11} \end{equation}\]

Calculate \(\mathrm{E}[z_n]\) and \(\mathrm{Var}(z_n)\) for each diagonal \(d\)

\[\begin{equation} \mathrm{E}[z_n] = \dfrac{n}{2} - c_n \tag{4.12} \end{equation}\] \[\begin{equation} \mathrm{Var}(z_n) = \dfrac{n(n-1)}{4} - c_n (n-1) + \mathrm{E}[z_n] - \mathrm{E}[z_n]^2 \tag{4.13} \end{equation}\]

Remark.

\(n =\) # of elements in each diagonal excluding any discarded median values
\(c_n = {n - 1 \choose m}\frac{n}{2^n}\)
\(m = \mathrm{floor}\left[ \dfrac{n-1}{2} \right]\)
\(z \sim\) Normal

Table 4.2: \(\mathrm{E}[z_n]\) and \(\mathrm{Var}(z_n)\) up to \(n=6\)
\(n\)	\(\mathrm{E}[z_n]\)	\(\mathrm{Var}(z_n)\)
2	0.5	0.25
3	0.75	0.188
4	1.25	0.438
5	1.563	0.371
6	2.063	0.621
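The table values follow directly from (4.12) and (4.13). A short sketch that evaluates them for any \(n\), where the function name `z_moments` is an illustrative choice:

```python
from math import comb

def z_moments(n):
    """Return (E[z_n], Var(z_n)) from formulas (4.12) and (4.13)."""
    m = (n - 1) // 2                   # floor((n - 1) / 2)
    c_n = comb(n - 1, m) * n / 2 ** n
    mean = n / 2 - c_n
    var = n * (n - 1) / 4 - c_n * (n - 1) + mean - mean ** 2
    return mean, var

for n in range(2, 7):
    mean, var = z_moments(n)
    print(n, round(mean, 4), round(var, 4))
```

This is handy for diagonals with more than 6 elements, which the table does not cover.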

Check whether the observed \(Z\) falls in the CI

\[\begin{equation*} Z = \sum_{d} z^d \end{equation*}\] \[\begin{equation*} \mathrm{E}[Z_n] = \sum_{d} \mathrm{E}[z_n^d] \end{equation*}\] \[\begin{equation*} \mathrm{Var}[Z_n] = \sum_{d} \mathrm{Var}[z_n^d] \end{equation*}\]

Remark. Assuming the diagonals are independent, the means and variances of the \(z^d\) can be summed; \(Z\) is then treated as approximately Normal to build the CI

Test 95% CI: \(\mathrm{E}[Z_n] \pm 2 \times \sqrt{\mathrm{Var}(Z_n)}\)

Results Interpretation

If the observed \(Z\) is outside the CI range \(\Rightarrow\) There are calendar year effects and assumption (2) is violated
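The whole calendar year test can be sketched end to end as below. The layout of the LDF triangle (`ldfs[i][k]` for accident year `i` and age `k`, with each row one entry shorter than the one above), the function names, and the sample factors are all assumptions made for illustration.

```python
from math import comb, sqrt
from statistics import median

def z_moments(n):
    """Return (E[z_n], Var(z_n)) from formulas (4.12) and (4.13)."""
    m = (n - 1) // 2
    c_n = comb(n - 1, m) * n / 2 ** n
    mean = n / 2 - c_n
    return mean, n * (n - 1) / 4 - c_n * (n - 1) + mean - mean ** 2

def calendar_year_test(ldfs):
    """Return (Z, E[Z], Var(Z)) for a triangle of LDFs."""
    n_cols = len(ldfs[0])
    labels = {}
    for k in range(n_cols):
        column = [(i, row[k]) for i, row in enumerate(ldfs) if k < len(row)]
        med = median(v for _, v in column)
        for i, v in column:
            if v < med:
                labels[(i, k)] = "S"
            elif v > med:
                labels[(i, k)] = "L"
            # values equal to the column median are discarded
    z_total = mean_total = var_total = 0.0
    for d in range(n_cols):  # diagonal d holds the cells with i + k == d
        marks = [lab for (i, k), lab in labels.items() if i + k == d]
        n = len(marks)
        if n < 2:
            continue  # n = 1 would contribute z = 0, mean 0 and variance 0 anyway
        z_total += min(marks.count("S"), marks.count("L"))
        mean, var = z_moments(n)
        mean_total += mean
        var_total += var
    return z_total, mean_total, var_total

# Hypothetical LDF triangle (4 development columns)
ldfs = [
    [1.5, 1.2, 1.1, 1.05],
    [1.6, 1.3, 1.2],
    [1.4, 1.1],
    [1.7],
]
z, mean_z, var_z = calendar_year_test(ldfs)
low, high = mean_z - 2 * sqrt(var_z), mean_z + 2 * sqrt(var_z)
print(z, (low, high), low <= z <= high)
```

In this made-up triangle only the last two diagonals have at least 2 labelled elements, and the observed \(Z\) falls inside the interval, so there is no indication of a calendar year effect.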

4.4.4 Correlation of Adjacent LDFs

Test assumption (1) (prop. 4.1)

Measures correlation between each column and the adjacent column
We want to test if there is a correlation among columns for the triangle as a whole

\(\therefore\) We define one test statistic for the whole triangle

Use rank correlation (e.g. Spearman’s correlation coefficient \(T\)) instead of value correlation (e.g. Pearson correlation)

Because LDFs down the column for a given age \(k\) have different variance
See Venter for his method too
Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranked variables

We are testing for independence

Which is more strict than just testing for 0 correlation

The confidence level used is relatively low, at 50%, so a result outside the CI is an indicator that we need to investigate further

Reason to consider the correlation of a triangle as whole instead of between pairs of columns

More important to know whether correlations globally prevail than to find a small part of the triangle with correlation
At 10% significance, 10% of the pairs will show up as significant just by chance (see more on Venter)
Avoid an accumulation of error probabilities

Test Procedure

Calculate Spearman’s correlation coefficient \(T_k\) for each pair of adjacent LDFs

\[\begin{equation} S_k = \sum \limits_{i} \Big \{ rank(f_{i,k-1}) - rank(f_{i,k}) \Big \}^2 \tag{4.14} \end{equation}\] \[\begin{equation} T_k = 1 - \dfrac{S_k}{n_k(n_k^2-1)/6} \tag{4.15} \end{equation}\]

Remark.

Rank is for each column \(k\) from low to high (i.e. lowest is 1)
\(n_k =\) number of pairs
For a 10 x 10 triangle, \(k \in [2 , 8]\)
- Only 9 LDFs so 8 pairs
- And we don’t use the column with only 1 row
\(k\) starts at 2 by convention
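A minimal sketch of (4.14) and (4.15) for one pair of adjacent LDF columns. It assumes there are no tied LDFs and that ranks are taken only over the rows present in both columns, so both rankings run from 1 to \(n_k\). The function names and the sample factors are illustrative.

```python
def ranks(values):
    """Rank from low to high (lowest = 1); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_t(f_prev, f_curr):
    """T_k for adjacent LDF columns k-1 and k, using the rows present in both."""
    n = min(len(f_prev), len(f_curr))  # n_k = number of pairs
    r_prev = ranks(f_prev[:n])
    r_curr = ranks(f_curr[:n])
    s_k = sum((a - b) ** 2 for a, b in zip(r_prev, r_curr))
    return 1 - s_k / (n * (n * n - 1) / 6)

# Hypothetical LDF columns; the earlier column has one more row than the later one
f_prev = [1.5, 1.6, 1.4, 1.7]
print(spearman_t(f_prev, [1.2, 1.3, 1.1]))  # same ordering in both columns
print(spearman_t(f_prev, [1.2, 1.1, 1.3]))  # reversed ordering
```

The two calls show the extremes: identical rankings give \(T_k = 1\) and fully reversed rankings give \(T_k = -1\).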

Calculate Spearman’s correlation coefficient \(T\) for the whole triangle

\[\begin{equation} T = \dfrac{\sum_k (n_k - 1) T_k}{\sum_k (n_k-1)} = \dfrac{\sum_k (I - k -1)T_k}{\sum_k (I - k -1)} \tag{4.16} \end{equation}\]

Remark.

Formula is the weighted average of the \(T_k\)’s, weight = \(n_k - 1\)
\(I =\) size of triangle
Formula gives more weight to \(T_k\) with more data

Compare \(T\) with CI based on distribution

\[\begin{equation} \begin{array}{c} CI = \mathrm{E}[T] \pm Z \sqrt{\mathrm{Var}(T)} \\ \mathrm{E}[T] = 0 \\ \mathrm{Var}[T] = \dfrac{1}{(I-2)(I-3)/2} \\ \end{array} \tag{4.17} \end{equation}\]

Remark.

Assume \(T \sim Normal(0, \sqrt{\mathrm{Var}(T)})\)
Use \(Z_{75\%} = 0.67\) for range of [25%, 75%]

Results Interpretation

If \(T\) is inside the CI \(\Rightarrow\) Do not reject the \(H_0\) of uncorrelated LDFs
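The last two steps, (4.16) and (4.17), can be sketched as follows. The function names and the \(T_k\) values are hypothetical; `z=0.67` is the 75th percentile quoted above for the 50% interval.

```python
from math import sqrt

def overall_t(t_by_k, size):
    """Weighted average of the T_k, where t_by_k maps k -> T_k and the weight is I - k - 1."""
    num = sum((size - k - 1) * t for k, t in t_by_k.items())
    den = sum(size - k - 1 for k in t_by_k)
    return num / den

def t_interval(size, z=0.67):
    """CI for T under H0: E[T] = 0 and Var(T) = 1 / ((I - 2)(I - 3) / 2)."""
    var_t = 1 / ((size - 2) * (size - 3) / 2)
    half_width = z * sqrt(var_t)
    return -half_width, half_width

# Hypothetical T_k values for a 5 x 5 triangle (k = 2, 3)
t = overall_t({2: 0.5, 3: -0.2}, size=5)
low, high = t_interval(size=5)
print(t, (low, high), low <= t <= high)

# Interval for a 10 x 10 triangle
print(t_interval(size=10))
```

For a 10 x 10 triangle the half-width is \(0.67 / \sqrt{28} \approx 0.127\), so a fairly small overall \(T\) is already enough to flag the triangle for further investigation.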