9.3 Bootstrap Model

Benefits of the bootstrap model:

  • Allows us to estimate the distribution of possible outcomes even with very little data

  • We don’t have to make any assumptions about the underlying distribution of the data (non-parametric)

    • The “ODP” part refers to the error distribution assumed when fitting the GLM

ODP bootstrap models:

  • Incremental claims directly as the response

  • With the same linear predictor as Kremer (1982)

  • Using a GLM with a log-link function and an over-dispersed Poisson (ODP) error

  • Where a specific form of this model is identical to the volume weighted chain ladder

  • Using bootstrap (sampling residuals with replacement) to estimate the distribution of point estimates

    (Instead of simulating from a multivariate normal for a GLM)

9.3.1 GLM Parameters

Mean and variance for each \(q(w,d)\) in the triangle (per Table 9.1)

9.3.1.1 Mean and log-mean for \(q(w,d)\)

\[\begin{equation} \mathrm{E}[q(w,d)] = m_{w,d} = \exp \left [\alpha_w + \sum_{i=2}^d \beta_i \right] \:\: : \: \: w, d \in [1, n] \tag{9.1} \end{equation}\] \[\begin{equation} \ln \left( \mathrm{E}[q(w,d)] \right) = \ln(m_{w,d}) = \eta_{w,d} = \alpha_w + \sum_{i=2}^d \beta_i \:\: : \: \: w, d \in [1, n] \tag{9.2} \end{equation}\]

(The sum is empty, i.e. zero, when \(d = 1\))

Remark.

  • \(\alpha_w\)’s are the level parameters for each accident year (row)

  • \(\beta_d\)’s adjust for the development trends after the first development period

    • We don’t use \(\beta_1\), which effectively means \(\beta_1 = 0\)

  • \(\alpha_w\) and \(\beta_d\) are selected to minimize the error between \(\ln(\text{actual})\) and \(\ln(\text{forecast})\)

Equivalence to Venter’s notation:

  • \(h(w) = e^{\alpha_w}\)

  • \(f(d) = e^{\sum_{i=2}^d \beta_i}\)
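A minimal Python sketch of this structure, with made-up \(\alpha\) and \(\beta\) values for illustration, confirming the Venter equivalence:

```python
import numpy as np

# Hypothetical parameters for a 3x3 triangle; values are illustrative only
alpha = {1: 8.0, 2: 8.1, 3: 8.2}   # level parameter for each accident year w
beta = {2: -0.7, 3: -1.2}          # development trends; beta_1 = 0 by convention

def log_mean(w, d):
    """eta_{w,d} = alpha_w + sum_{i=2}^{d} beta_i (empty sum when d = 1), per (9.2)."""
    return alpha[w] + sum(beta[i] for i in range(2, d + 1))

def mean(w, d):
    """m_{w,d} = exp(eta_{w,d}), per (9.1)."""
    return np.exp(log_mean(w, d))

# Venter equivalence: m_{w,d} = h(w) * f(d)
h = lambda w: np.exp(alpha[w])
f = lambda d: np.exp(sum(beta[i] for i in range(2, d + 1)))
assert np.isclose(mean(2, 3), h(2) * f(3))
```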

9.3.1.2 Variance for \(q(w,d)\)

\[\begin{equation} \mathrm{Var}[q(w,d)] = \phi m_{w,d}^z \tag{9.3} \end{equation}\]
  • \(\phi\): Dispersion factor

    • Scale factor estimated as part of the fitting procedure while setting the variance proportional to the mean

    • Estimated from the residuals

  • \(z\): Power that determines the error distribution

    • The paper focuses on \(z = 1\), the over-dispersed Poisson (ODP)

    • Only the mean-variance relationship (the first 2 moments) needs to be specified, not the whole distribution

Table 9.2: Distribution with corresponding \(z\)

| \(z\) | Distribution     |
|:-----:|------------------|
| 0     | Normal           |
| 1     | Poisson          |
| 2     | Gamma            |
| 3     | Inverse Gaussian |
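Equation (9.3) and Table 9.2 as a one-function helper (inputs are illustrative):

```python
def variance(m, phi, z):
    """Var[q(w,d)] = phi * m^z, per (9.3); z = 0, 1, 2, 3 gives the
    normal, ODP, gamma, and inverse Gaussian mean-variance relationships."""
    return phi * m ** z

print(variance(m=1000.0, phi=1.5, z=1))  # ODP: variance proportional to the mean
```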

9.3.2 Fitted Triangle

We can fit the \(\alpha\)’s and \(\beta\)’s defined above using the GLM framework, or the simplified GLM method

9.3.2.1 Parameterize with GLM Framework

Start with a \(3 \times 3\) incremental triangle

Table 9.3: \(3\times 3\) incremental triangle

| w/d | 1          | 2          | 3          |
|-----|------------|------------|------------|
| 1   | \(q(1,1)\) | \(q(1,2)\) | \(q(1,3)\) |
| 2   | \(q(2,1)\) | \(q(2,2)\) |            |
| 3   | \(q(3,1)\) |            |            |

Log transform of the triangle

Table 9.4: \(3\times 3\) log incremental triangle

| w/d | 1                | 2                | 3                |
|-----|------------------|------------------|------------------|
| 1   | \(\ln[q(1,1)]\)  | \(\ln[q(1,2)]\)  | \(\ln[q(1,3)]\)  |
| 2   | \(\ln[q(2,1)]\)  | \(\ln[q(2,2)]\)  |                  |
| 3   | \(\ln[q(3,1)]\)  |                  |                  |

Create a system of equations based on equation (9.2)

\[\begin{equation} \begin{split} \ln[q(1,1)] &= 1\alpha_1 + 0\alpha_2 + 0\alpha_3 + 0\beta_2 + 0\beta_3 \\ \ln[q(2,1)] &= 0\alpha_1 + 1\alpha_2 + 0\alpha_3 + 0\beta_2 + 0\beta_3 \\ \ln[q(3,1)] &= 0\alpha_1 + 0\alpha_2 + 1\alpha_3 + 0\beta_2 + 0\beta_3 \\ \ln[q(1,2)] &= 1\alpha_1 + 0\alpha_2 + 0\alpha_3 + 1\beta_2 + 0\beta_3 \\ \ln[q(2,2)] &= 0\alpha_1 + 1\alpha_2 + 0\alpha_3 + 1\beta_2 + 0\beta_3 \\ \ln[q(1,3)] &= 1\alpha_1 + 0\alpha_2 + 0\alpha_3 + 1\beta_2 + 1\beta_3 \\ \end{split} \tag{9.4} \end{equation}\]

Express the above in matrix form

\[\begin{equation} \begin{array}{ccccc} \mathbf{Y} & = & \mathbf{X} &\times & \mathbf{A} \\ & & \alpha_1 \:\:\: \alpha_2 \:\:\: \alpha_3 \:\:\: \beta_2 \:\:\: \beta_3 & &\\ \begin{bmatrix} \ln[q(1,1)] \\ \ln[q(2,1)] \\ \ln[q(3,1)] \\ \ln[q(1,2)] \\ \ln[q(2,2)] \\ \ln[q(1,3)] \\ \end{bmatrix} & = & \begin{bmatrix} 1 & - & - & - & - \\ - & 1 & - & - & - \\ - & - & 1 & - & - \\ 1 & - & - & 1 & - \\ - & 1 & - & 1 & - \\ 1 & - & - & 1 & 1 \\ \end{bmatrix} & \times & \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta_2 \\ \beta_3 \\ \end{bmatrix} \end{array} \tag{9.5} \end{equation}\]
Remark. \(\mathbf{X}\) is the design matrix that defines the parameters used to estimate the losses in each cell
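A sketch of how the design matrix generalizes to an \(n \times n\) triangle (the column-by-column row ordering matches \(\mathbf{Y}\) above):

```python
import numpy as np

def design_matrix(n):
    """Design matrix X of (9.5) for an n x n triangle.

    Rows follow the column-by-column order of Y: (1,1), ..., (n,1), (1,2), ...
    Columns are [alpha_1, ..., alpha_n, beta_2, ..., beta_n].
    """
    cells = [(w, d) for d in range(1, n + 1) for w in range(1, n - d + 2)]
    X = np.zeros((len(cells), 2 * n - 1), dtype=int)
    for row, (w, d) in enumerate(cells):
        X[row, w - 1] = 1             # indicator for alpha_w
        for i in range(2, d + 1):
            X[row, n + i - 2] = 1     # indicators for beta_2 .. beta_d
    return X

print(design_matrix(3))  # reproduces the 6 x 5 matrix in (9.5)
```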

Use iteratively reweighted least squares or MLE¹ to solve for the parameters in \(\mathbf{A}\) that minimize the squared difference between \(\mathbf{Y}\) and \(\mathbf{S}\), the solution matrix

\[\begin{equation} \mathbf{S} = \begin{bmatrix} \ln[m_{1,1}] \\ \ln[m_{2,1}] \\ \ln[m_{3,1}] \\ \ln[m_{1,2}] \\ \ln[m_{2,2}] \\ \ln[m_{1,3}] \\ \end{bmatrix} \tag{9.6} \end{equation}\]

After solving the system of equations we will have:

\[\begin{equation} \begin{split} \ln[m_{1,1}] &= \eta_{1,1} &= \alpha_1 \\ \ln[m_{2,1}] &= \eta_{2,1} &= \alpha_2 \\ \ln[m_{3,1}] &= \eta_{3,1} &= \alpha_3 \\ \ln[m_{1,2}] &= \eta_{1,2} &= \alpha_1 + \beta_2\\ \ln[m_{2,2}] &= \eta_{2,2} &= \alpha_2 + \beta_2\\ \ln[m_{1,3}] &= \eta_{1,3} &= \alpha_1 + \beta_2 + \beta_3\\ \end{split} \tag{9.7} \end{equation}\]

The above solution is shown as a triangle below

Table 9.5: \(3\times 3\) GLM fitted log incremental triangle

| w/d | 1                 | 2                 | 3                 |
|-----|-------------------|-------------------|-------------------|
| 1   | \(\ln[m_{1,1}]\)  | \(\ln[m_{1,2}]\)  | \(\ln[m_{1,3}]\)  |
| 2   | \(\ln[m_{2,1}]\)  | \(\ln[m_{2,2}]\)  |                   |
| 3   | \(\ln[m_{3,1}]\)  |                   |                   |

Exponentiate the triangle above to get our fitted (or expected) incremental results of the GLM model

Table 9.6: \(3\times 3\) GLM fitted incremental triangle

| w/d | 1            | 2            | 3            |
|-----|--------------|--------------|--------------|
| 1   | \(m_{1,1}\)  | \(m_{1,2}\)  | \(m_{1,3}\)  |
| 2   | \(m_{2,1}\)  | \(m_{2,2}\)  |              |
| 3   | \(m_{3,1}\)  |              |              |
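One way to carry out the fit in practice is a Poisson GLM with a log link, e.g. via statsmodels; the triangle values below are made up for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Toy incremental observations in the column-by-column order of Y in (9.5):
# q(1,1), q(2,1), q(3,1), q(1,2), q(2,2), q(1,3)
y = np.array([1000.0, 1100.0, 1200.0, 500.0, 540.0, 150.0])

# Design matrix from (9.5): columns are alpha_1..alpha_3, beta_2, beta_3
X = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
])

# Poisson family with the default log link; statsmodels solves via IRLS
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)            # the alpha's and beta's (the A vector)
print(np.round(fit.mu, 2))   # fitted means m_{w,d} (Table 9.6)
```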

9.3.2.2 Simplified GLM

GLM model = chain ladder w/ volume-weighted averages when:

  • Variance \(\propto\) Mean

  • \(\varepsilon(w,d) \sim\) Poisson

  • A parameter for each row and column (except 1st column)

Benefits:

  • Replace GLM fitting with much simpler calculation

  • LDFs are easier to explain

  • Still works even when there are negative incremental values

Procedure for fitting the incremental triangle:

  1. Select LDFs based on volume-weighted averages

  2. Start from the last cumulative diagonal and divide backwards by each age-to-age LDF to get the fitted cumulative triangle

  3. Take differences of successive cumulative columns to get the fitted incremental triangle
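A sketch of the simplified procedure on a made-up \(3 \times 3\) cumulative triangle:

```python
import numpy as np

# Toy cumulative triangle; NaN marks the unobserved lower-right cells
cum = np.array([
    [1000.0, 1500.0, 1650.0],
    [1100.0, 1640.0, np.nan],
    [1200.0, np.nan, np.nan],
])
n = cum.shape[0]

# Step 1: volume-weighted age-to-age factors
ldf = np.array([
    np.nansum(cum[: n - d - 1, d + 1]) / np.nansum(cum[: n - d - 1, d])
    for d in range(n - 1)
])

# Step 2: start from the latest observed diagonal and divide backwards
fitted_cum = np.full_like(cum, np.nan)
for w in range(n):
    d_last = n - w - 1                  # latest observed development for row w
    fitted_cum[w, d_last] = cum[w, d_last]
    for d in range(d_last - 1, -1, -1):
        fitted_cum[w, d] = fitted_cum[w, d + 1] / ldf[d]

# Step 3: difference successive columns to get the fitted incremental triangle
fitted_inc = np.column_stack([fitted_cum[:, 0], np.diff(fitted_cum, axis=1)])
print(np.round(fitted_inc, 1))
```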

9.3.3 Residuals

Unscaled Pearson residuals

\[\begin{equation} \begin{split} r_{w,d} & = & \dfrac{A - E}{\sqrt{\mathrm{Var}(E)}} &\\ & = & \dfrac{q(w,d) - m_{w,d}}{\sqrt{m^z_{w,d}}} &\\ & = & \dfrac{q(w,d) - m_{w,d}}{\sqrt{m_{w,d}}} & \:\:\:\: \text{Recall }z = 1\text{ for ODP}\\ \end{split} \tag{9.8} \end{equation}\]
  • Mean and variance as defined above

  • Residuals for the corner cells of the triangle, \(q(1,n)\) and \(q(n,1)\), are going to be 0

    Because a unique parameter is used for each of those 2 cells

  • Alternatively we can use Anscombe residuals

    We prefer Pearson residuals because their calculation is consistent with the scale parameter \(\phi\)
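The residual calculation itself is a one-liner; a sketch assuming the actual and fitted incremental triangles are NumPy arrays (NaN below the diagonal):

```python
import numpy as np

def pearson_residuals(q, m, z=1):
    """Unscaled Pearson residuals per (9.8); z = 1 for the ODP.

    The corner cells q(1,n) and q(n,1) come out to exactly 0 because
    the GLM fits them perfectly with their unique parameters.
    """
    return (q - m) / np.sqrt(m ** z)
```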

Scaled Pearson residuals (England & Verrall)

\[\begin{equation} r^S_{w,d} = r_{w,d} \times \underbrace{\sqrt{\dfrac{N}{N-p}}}_{f^{DoF}} \tag{9.9} \end{equation}\]
  • Degrees-of-freedom adjustment to effectively allow for over-dispersion of the residuals in the sampling process and add process variance, approximating a distribution of possible outcomes

  • Increases the variability of the pseudo triangles
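The corresponding adjustment as code, with \(N\) and \(p\) as defined in section 9.3.4 below:

```python
import numpy as np

def scale_residuals(r, n):
    """Scaled Pearson residuals per (9.9) for an n x n triangle."""
    N = n * (n + 1) // 2    # number of observed cells
    p = 2 * n - 1           # number of fitted parameters
    return r * np.sqrt(N / (N - p))
```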

Standardized residuals (Pinheiro et al.)

\[\begin{equation} r^H_{w,d} = r_{w,d} \times \underbrace{\sqrt{\dfrac{1}{1-H_{i,i}}}}_{f^H_{w,d}} \tag{9.10} \end{equation}\] \[\begin{equation} \mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{WX})^{-1}\mathbf{X}^T\mathbf{W} \tag{9.11} \end{equation}\] \[\begin{equation} \mathbf{W} = \begin{bmatrix} m_{1,1} & 0 & \cdots & 0 \\ 0 & m_{2,1} & 0 & 0 \\ \vdots & 0 & \ddots & \vdots\\ 0 & 0 & \cdots & m_{1,n}\\ \end{bmatrix} \tag{9.12} \end{equation}\]
  • Hat matrix adjustment factor \(f^H_{w,d}\) is based on the diagonal of the hat matrix \(\mathbf{H}\)

    (Going down the columns of the triangle from left to right)

  • \(\mathbf{W}\) is an \(N \times N\) diagonal weight matrix, with one fitted mean per observation (\(6 \times 6\) in the \(3 \times 3\) example)

  • \(\mathbf{X}\) is the design matrix from (9.5)

  • Benefits:

    1. \(f^H_{w,d}\) accounts for the exclusion of the zero-value residuals

      • The zero-value residuals do have some variance, but we just don’t know what it is yet, so we should sample from the remaining residuals and not the zeros

    2. \(f^H_{w,d}\) is an improvement on \(f^{DoF}\)
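A sketch of the hat matrix calculation for the \(3 \times 3\) example; the fitted means below are illustrative placeholders:

```python
import numpy as np

# Design matrix from (9.5)
X = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
])

# W per (9.12): diagonal of fitted means in the same order as Y (made up here)
m = np.array([1003.0, 1101.0, 1200.0, 1497.0, 1643.0, 150.0])
W = np.diag(m)

H = X @ np.linalg.inv(X.T @ W @ X) @ X.T @ W   # hat matrix, per (9.11)

# Corner cells q(3,1) and q(1,3) have unique parameters, so H_ii = 1 there
# and f^H is undefined; their zero residuals are excluded from sampling
with np.errstate(divide="ignore"):
    f_H = np.sqrt(1.0 / (1.0 - np.diag(H)))
print(np.round(f_H, 3))
```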

9.3.4 Dispersion Factor

Dispersion factor

\[\begin{equation} \phi = \dfrac{\sum r_{wd}^2}{N-p} \tag{9.13} \end{equation}\]

\[N = \dfrac{n (n+1)}{2}\]

\[p = 2n-1\]

  • \(N =\) # of data points (including the first column, unlike Venter)

    • \(N\) can be less than indicated above if the tail incremental developments are all 0’s

  • \(p =\) # of parameters

    • One \(\alpha\) for each row, plus one \(\beta\) for each column except the first

    • \(p\) can be less than \(2n-1\) if the later incremental values are all 0’s and therefore not needed for fitting

  • This calculation is similar to Clark’s \(\sigma^2\) (6.4)

Alternate method for \(\phi\)

\[\phi \approx \phi^H = \dfrac{\sum (r^H_{w,d})^2}{N}\]

  • We can still use the same dispersion factor even with the scaled and standardized residuals; this just gives us another method to estimate \(\phi\)
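Both estimators of \(\phi\) as code, assuming flattened arrays of the observed-cell residuals:

```python
import numpy as np

def dispersion_factor(r, n):
    """phi = sum(r^2) / (N - p), per (9.13), from unscaled Pearson residuals."""
    N = n * (n + 1) // 2    # data points, first column included
    p = 2 * n - 1           # one alpha per row + one beta per column after the first
    return np.sum(np.asarray(r) ** 2) / (N - p)

def dispersion_factor_hat(r_H):
    """Alternate estimate phi^H = sum((r^H)^2) / N from standardized residuals."""
    r_H = np.asarray(r_H)
    return np.sum(r_H ** 2) / r_H.size
```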

¹ You can also use other methods, such as orthogonal decomposition or Newton-Raphson, to solve for the parameters