Estimating Parameters

In this vignette we elaborate on the abilities and workings of the estimate method.

The default argument to the estimate method are:

model$estimate(
  data, 
  method = "ekf",
  ode.solver = "euler",
  ode.timestep = diff(data$t),
  loss = "quadratic",
  loss_c = NULL,
  control = list(trace=1,iter.max=1e5,eval.max=1e5),
  use.hessian = FALSE,
  laplace.residuals = FALSE,
  unconstrained.optim = FALSE,
  estimate.initial.state = FALSE,
  silent = FALSE
)

Arguments

Argument: method


The method argument decides which estimation/filtering algorithm is used. Currently, the following methods are available:

  1. method='lkf': The Linear Kalman Filter.

  2. method='ekf': The Extended Kalman Filter.

  3. method='laplace': The Laplace Approximation.

There is a difference between the Kalman filtering methods, and the Laplace method in the way they approach likelihood computations, and thus what information the filters produce:

  1. The Kalman filters produce prior and posterior state estimates. The prior estimates are conditioned on observations from the starting time \(t_0\) up til the previous time-point \(t_{k-1}\). The posterior estimates are conditioned also on the observations available at the “current” time-point \(t_{k}\). These are denoted respectively by \[ \text{Prior:} \quad \mathrm{E}\left[ x_{t_k} \mid y_{t_{k-1}}, y_{t_{k-2}},...,y_{t_0} \right] \] \[ \text{Posterior:} \quad \mathrm{E}\left[ x_{t_k} \mid y_{t_{k}}, y_{t_{k-1}},...,y_{t_0} \right] \] The Laplace filter produces smoothed state estimates. This is a state estimate based observations at all time-points \(t_{0}\) to \(t_{N}\) where \(N\) is the last index in the time-series. These are denoted by \[ \text{Smoothed:} \quad \mathrm{E}\left[ x_{t_k} \mid y_{N}, y_{N-1},...,y_{t_{k}},...,y_{t_1}, y_{t_0} \right] \]
  2. The likelihood contributions from the Kalman filters are based on the prior estimates, which are one-step-ahead predictions. This gives rise to independent one-step-ahead residuals, ideal for residual analysis and goodness-of-fit model validation. The Laplace filter does not produce such residuals inherently, and must instead compute these on the side. Residual calculations are disabled by default and determined with the laplace.residuals argument. The computations are very costly and slow. The user is referred to the documentation in TMB::oneStepPredict for further information.

Argument: ode.solver


This argument is used for the Kalman filtering methods only to determine the ODE integrator used to solving the moment differential equations i.e. \[ \dfrac{d\mathrm{E}\left[x_t\right]}{dt} = f(\mathrm{E}\left[x_t\right]) \] \[ \dfrac{d\mathrm{V}\left[x_t\right]}{dt} = \dfrac{df}{dx}\bigg\vert_{(\mathrm{E}\left[x_t\right])}\mathrm{V}\left[x_t\right] + \mathrm{V}\left[x_t\right]\left[\dfrac{df}{dx}\bigg\vert_{(\mathrm{E}\left[x_t\right])}\right]^{T} + g(\mathrm{E}\left[x_t\right])g(\mathrm{E}\left[x_t\right])^{T} \] The ctsmTMB package implements the Explicit Forward-Euler euler and the Explicit 4th Order Runge-Kutta rk4 methods. In addition all of the methods available for deSolve::ode are also available through the use of the RTMBode package, but they are significantly slower than other two solvers, and they are primarily there to add the possibility for implicit and adaptive solvers.

We recommend using the fast simple solvers to begin with when modelling.

Argument: ode.timestep


This argument has two different implications depending on whether Kalman filtering or Laplace filtering is carried out. The method accepts either a single scalar value to be used as the global time-step, or a vector of length diff(data$t) specifying all individual time-steps. The input values are interpolated linearly between time-points if more than one step is taken.

  1. Kalman filters: In this case the argument controls the time-step used for the euler and rk4 ODE solvers.

  2. Laplace filter: In this case the argument controls the number of added intermediate time-points between observations each of which represents an additional state (random effect) at that particular time-point.

If a provided time-step \(\Delta t_{i}\) does not divide the correspond time-difference \(t_{i+1}-t_{i}\) in the data then it is rounded down such that it does. Consider the following example where the time-difference between two observations in the data is 3 seconds, but a time-step of 0.7. This produces a non-integer number of steps i.e.: \[ N_{i} = \dfrac{t_{i+1}-t_{i}}{\Delta t_i} = \dfrac{3}{0.7} = 4.28... \] Thus the number of steps taken is rounded up to \(N^{*}_i = \left\lceil N_i \right\rceil = \left\lceil 4.28... \right\rceil = 5\) and the corrected time-step then becomes \[ \Delta t^{*}_{i} = \dfrac{3}{N^{*}_{i}} = \dfrac{3}{5} = 0.6 \]

Note: The exception to this rule is when the remainder is less than \(\epsilon = 10^{-3}\) i.e. if for instance \(N_i = 4.0001\) then the time-step is accepted, and the number of steps rounded down to \(N^{*}_{i}= \left\lfloor N_i \right\rfloor = 4\).

Argument: loss and loss_c


The following losses are currently available:

  1. loss='quadratic' (default)

  2. loss='huber'

  3. loss='tukey'

This argument only affects the Kalman filtering methods, and is used to regularize the likelihood contributions, removing the influence of large outliers.

The ith likelihood contributions is given by: \[ -\log L_{i}(\theta) \propto f(r_i) \] where \(r_i = \sqrt{e_{i}^{T} \Sigma_{i}^{-1} e_{i}}\) is a normalized residual, and where \(e_{i}\) is the ith residual vector, and \(\Sigma_{i}^{-1}\) the ith residual precision matrix.

The loss argument changes the function \(f\) as follows:

  1. If loss='quadratic' then \(f\) is quadratic in the residuals i.e. \[ f(r) = r^2 \] and the likelihood contributions are exactly those from a Gaussian.

  2. If loss='huber' then \(f\) is the Huber’s \(\psi\) function given by \[ \psi_{c}(r) = \left\{ \begin{array}{l} r^2 & \text{for} \,\, r \leq c \\ c(2r-c) & \text{otherwise} \end{array} \right\} \] which is quadratic/linear in the residuals below/above the threshold determined by \(c\).

  3. If loss='tukey' then \(f\) is Tukey’s byweight function given by \[ l_{c}(r) = \left\{ \begin{array}{l} r^2 & \text{for} \,\, r \leq c \\ c^2 & \text{otherwise} \end{array} \right\} \] which is quadratic/constant in the residuals below/above the threshold determined by \(c\).

In practice a smooth approximation to both Huber and Tukey are implemented in practice using the construction \[ \tilde{\psi}_{c}(r) = r^2 (1-\sigma_{c}(r)) + c^2 \sigma_{c}(r) \] \[ \tilde{l}_{c}(r) = r^2 (1-\sigma_{c}(r)) + c(2r-c) \sigma_{c}(r) \] where \(\sigma(r)\) is the sigmoid function \[ \sigma_{c}(r) = \dfrac{1}{1+\exp(-5(r-c))} \] The plot below shows the actual and implemented loss functions (almost indistinguishable) for \(c=5\). The threshold value is marked by with dashed line for the line \(x = c\).

Loss Functions

Loss Functions

The loss_c argument is used to determine the value of \(c\). The default values are chosen based on the fact that under the assumed (multivariate) normal distribution of the residuals, the squared (normalized) residuals follows a \(\chi^{2}_{m}\) distribution with degrees of freedom equal to the number of elements of \(e_{i}\) i.e: \[ r_{i}^2 = e_{i}^{T} \Sigma_{i}^{-1} e_{i} \sim \chi^{2}_{m} \] It is therefore reasonable to choose the threshold level (value of \(c\)) which determines whether or not \(r_{i}\) is an outlier, as the level at which the null-hypothesis \[ H_{0}: r_{i}^2 \sim \chi^{2}_{m} \] is rejected for some critical value \(1 - \alpha\). Choosing the significance level \(\alpha = 0.05\) the appropriate \(c\) threshold value becomes

m <- 1
qchisq(0.95,df=m)
## [1] 3.841459

where \(m\) is the number of observation equations.

Note: The significance level will be higher than the chosen if there are missing observations in some indices of \(i\) for systems with multiple observation equations.

Argument: use_hessian


This argument is a boolean which determines whether or not the likelihood hessian constructed by automatic differentiation from TMB is used during the optimization procedure by providing it to the hessian argument of stats::nlminb. The default is use.hessian=FALSE. The argument has no effect if method=laplace due to restrictions in TMB/RTMB.

The effect of providing the hessian is typically that an optimum is found in fewer iterations, but the cost of computing the hessian is relatively large, so it is often faster to optimize using only the gradient.

Argument: laplace.residuals


This boolean controls whether or not model residuals are calculated with TMB::oneStepPredict when the Laplace filtering is used (method=laplace). This takes a considerable amount of time - typically much longer than the estimation itself.

Argument: unconstrainted.optim


This boolean allows for quick unconstrained estimation removing the parameter boundaries specified by setParameter. This may be useful sometimes, to quickly check whether estimation issues occur because of ill boundaries.

Argument: estimate.initial.state


This boolean determines whether or not the initial-time state distribution (mean and covariance) should be estimated or not. By default estimate.initial.state=FALSE and it is not estimated but simply taken as the values provided by setInitialState. When estimate.initial.state=TRUE the mean and covariance are estimated as the stationary solution of the moment differential equations using the input values are the initial time-point.

The stationary solution is obtained first by solving \[ \dfrac{d\mathrm{E}\left[x_{\infty}\right]}{dt} = f(\mathrm{E}\left[x_\infty\right]) = 0 \] for \(\mathrm{E}\left[x_\infty\right]\) using Newton’s method. This stationary mean is then used to solve \[ \dfrac{d\mathrm{V}\left[x_\infty\right]}{dt} = \dfrac{df}{dx}\bigg\vert_{(\mathrm{E}\left[x_\infty\right])}\mathrm{V}\left[x_\infty\right] + \mathrm{V}\left[x_\infty\right]\left[\dfrac{df}{dx}\bigg\vert_{(\mathrm{E}\left[x_\infty\right])}\right]^{T} + g(\mathrm{E}\left[x_\infty\right])g(\mathrm{E}\left[x_\infty\right])^{T} = 0 \] for the stationary covariance \(\mathrm{V}\left[x_\infty\right]\) by calling a linear solver on the vectorized system of equations (using kronecker products).

Argument: control


This argument is a list which controls various settings of the stats::nlminb optimizer. See the documentation on ?stats::nlminb for more information.

The default is list(trace=1,iter.max=1e5,eval.max=1e5) which prints the iteration steps, and increases the default number of iterations and function calls allowed before the optimization procedure terminates.

Note:: The user should remember that disabling tracing by passing control = list(trace=0) will remove the ìter.maxand eval.max arguments, so they should be provided as well if needed.

Argument: silent


This boolean argument controls whether or not various information messages are printed by ctsmTMB during model building, compilation and estimation.

Example

Insert Example