Wednesday, May 30, 2018

State Space Models with Fat-Tailed Errors and the sspacetdist add-in

Author and guest post by Eren Ocakverdi.


Linear State Space Models (LSSM) provide a very useful framework for the analysis of a wide range of time series problems. For instance, linear regression, trend-cycle decomposition, smoothing and ARIMA models can all be handled practically and dynamically within this flexible system.
One of the assumptions behind LSSM is that the errors of the measurement/signal equation are normally distributed. In practice, however, there are situations where this may not be the case and errors follow a fat-tailed distribution. Ignoring this fact may result in wider confidence intervals for the estimated parameters or may cause outliers to bias parameter estimates.

Treatments for heavy-tailed distributions are covered in detail in Durbin and Koopman (2012), who use mode estimates. Consider the following signal plus noise model:
$$y_t = \omega_t + \epsilon_t$$
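Here, $\omega_t$ is linear Gaussian, and $\epsilon_t$ follows a Student's t-distribution with $v$ degrees of freedom and variance $\sigma_\epsilon^2$. Given a trial value $\tilde{\epsilon}_t$ of the disturbance, the observation variance of the approximating Gaussian model is given by: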
$$A_t = \frac{(v-2)\sigma_\epsilon^2 + \tilde{\epsilon}_t^2}{v+1}$$
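To make the formula concrete, here is a minimal Python sketch that evaluates $A_t$ for a small and a large trial disturbance. The function name and the numerical values ($\sigma_\epsilon^2 = 1$, $v = 5$) are assumptions chosen for illustration; they do not come from the add-in.

```python
def t_observation_variance(eps_tilde, sigma2_eps, nu):
    """Observation variance A_t of the approximating Gaussian model."""
    if nu <= 2:
        raise ValueError("nu must exceed 2 so that the t variance exists")
    return ((nu - 2.0) * sigma2_eps + eps_tilde ** 2) / (nu + 1.0)


# Illustrative values only: sigma2_eps = 1 and nu = 5 are assumptions.
a_small = t_observation_variance(0.1, 1.0, 5.0)  # small trial disturbance
a_large = t_observation_variance(4.0, 1.0, 5.0)  # outlier-sized disturbance
print(a_small, a_large)
```

A large trial disturbance produces a large $A_t$, so the corresponding observation receives less weight in the next pass of the filter and smoother.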
The Kalman filter and smoother can then be applied iteratively, treating $A_t$ as the observation variance, to obtain a new smoothed estimate of the signal $\omega_t$. The implied disturbance estimates $\tilde{\epsilon}_t$ are used to compute new values for $A_t$, and the steps are repeated until the estimates converge.
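The iteration can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the add-in's implementation: the signal is taken to be a local level (random walk), and $\sigma_\epsilon^2$, $v$ and the state variance are treated as known. All function names and the toy data are hypothetical.

```python
def smooth_local_level(y, obs_var, state_var, a1=0.0, p1=1e7):
    """Kalman filter and state smoother for y_t = a_t + eps_t, a_{t+1} = a_t + eta_t.

    obs_var is a list, so the observation variance may change over time.
    A large p1 approximates a diffuse initial state.
    """
    n = len(y)
    a, p = a1, p1
    a_pred, p_pred, v, f, k = [], [], [], [], []
    for t in range(n):                       # forward pass (filter)
        a_pred.append(a)
        p_pred.append(p)
        v.append(y[t] - a)                   # prediction error
        f.append(p + obs_var[t])             # prediction error variance
        k.append(p / f[t])                   # Kalman gain
        a = a + k[t] * v[t]
        p = p * (1.0 - k[t]) + state_var
    r = 0.0
    smoothed = [0.0] * n
    for t in range(n - 1, -1, -1):           # backward pass (smoother)
        r = v[t] / f[t] + (1.0 - k[t]) * r
        smoothed[t] = a_pred[t] + p_pred[t] * r
    return smoothed


def fit_t_signal(y, sigma2_eps, nu, state_var, max_iter=50, tol=1e-8):
    """Iterate between smoothing and updating A_t, starting from the Gaussian fit."""
    signal = smooth_local_level(y, [sigma2_eps] * len(y), state_var)
    obs_var = [sigma2_eps] * len(y)
    for _ in range(max_iter):
        eps = [yt - st for yt, st in zip(y, signal)]
        obs_var = [((nu - 2.0) * sigma2_eps + e * e) / (nu + 1.0) for e in eps]
        new_signal = smooth_local_level(y, obs_var, state_var)
        change = max(abs(a - b) for a, b in zip(new_signal, signal))
        signal = new_signal
        if change < tol:
            break
    return signal, obs_var


# Toy data (hypothetical): a flat series with one outlier at position 10.
y = [0.0] * 21
y[10] = 10.0
gauss_signal = smooth_local_level(y, [1.0] * len(y), 0.1)
t_signal, a_t = fit_t_signal(y, sigma2_eps=1.0, nu=3.0, state_var=0.1)
print(round(gauss_signal[10], 3), round(t_signal[10], 3))
```

In this toy series the outlier ends up with the largest $A_t$, so the smoothed signal is pulled towards it less than in the Gaussian fit.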
This iterative procedure is not built into EViews, but there is now an add-in, sspacetdist, that implements it. The add-in uses the Mean Absolute Percentage Error (MAPE) as its performance metric for assessing convergence.
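As an illustration of how a MAPE-based stopping rule might look, the sketch below compares the estimates from two successive iterations. The exact quantities the add-in compares and its tolerance are not documented here, so the function names, the handling of zero reference values and the default tolerance are all assumptions.

```python
def mape(new_values, old_values):
    """Mean absolute percentage change between two iterations, in percent."""
    # Zero reference values are skipped to avoid division by zero (an assumption).
    terms = [abs((n - o) / o) for n, o in zip(new_values, old_values) if o != 0.0]
    if not terms:
        raise ValueError("no nonzero reference values to compare against")
    return 100.0 * sum(terms) / len(terms)


def has_converged(new_values, old_values, tol_percent=0.01):
    """Stop iterating once successive estimates differ by less than tol_percent."""
    return mape(new_values, old_values) < tol_percent


print(mape([1.1, 2.2], [1.0, 2.0]))
print(has_converged([1.000001, 2.0], [1.0, 2.0]))
```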
As an example, Durbin and Koopman (2012) analyze the logged quarterly demand for gas in the UK from 1960 to 1986 (gas_data.wf1). They use a structural time series model of the basic form:
$$y_t = \mu_t + \gamma_t + \epsilon_t$$
Here, $\mu_t$ is the local linear trend, $\gamma_t$ is the seasonal component and $\epsilon_t$ is the observation disturbance. We can use the SSpace object of EViews to build this framework and then estimate the model via the sspacetdist add-in (sspacet_example1.prg).
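For reference, the components of a basic structural model for quarterly data are commonly written as follows. This is the standard textbook form, given here as an assumption; the specification in the example program may differ, for instance in which disturbance variances are restricted to zero. The symbols $\beta_t$, $\xi_t$, $\zeta_t$ and $\kappa_t$ are introduced only for this sketch:
$$\mu_{t+1} = \mu_t + \beta_t + \xi_t, \qquad \beta_{t+1} = \beta_t + \zeta_t$$
$$\gamma_{t+1} = -\gamma_t - \gamma_{t-1} - \gamma_{t-2} + \kappa_t$$
Here, $\beta_t$ is the slope of the trend, and the seasonal equation makes any four consecutive quarterly effects sum to zero in expectation.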
The example program file also reproduces Fig. 14.4 on page 318 of Durbin and Koopman (2012). The upper left and right panels show the estimated seasonal components from the Gaussian and Student’s t models, respectively. The lower left and right panels show the estimated irregular components of these models.


Please note that this is an approximating model, but it can still be very useful in practice. As another example, let’s simulate a regression model with two independent variables and t-distributed errors:
$$y_t = 0.6 x_{1t} + 0.3 x_{2t} + \epsilon_t\text{, where } \epsilon_t \sim t(v=3)$$
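A data set of this form could be simulated along the following lines. This Python sketch is not the contents of the example program: the sample size, the seed and the choice of standard normal regressors are assumptions, and the function names are hypothetical.

```python
import math
import random


def student_t(rng, nu):
    """Draw one Student's t variate as Z / sqrt(chi2 / nu)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    return z / math.sqrt(chi2 / nu)


def simulate(n, nu=3.0, seed=12345):
    """Simulate y_t = 0.6 x1_t + 0.3 x2_t + eps_t with t(nu) errors."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # assumed regressor distribution
    x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps = [student_t(rng, nu) for _ in range(n)]
    y = [0.6 * a + 0.3 * b + e for a, b, e in zip(x1, x2, eps)]
    return x1, x2, eps, y


x1, x2, eps, y = simulate(500)
print(len(y), min(eps), max(eps))
```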
Next we estimate the parameters with both maximum likelihood and this iterative state space scheme (sspacet_example2.prg).



Maximum likelihood estimation can be specified within a LogL object. The estimated parameters are close to their theoretical (simulated) values, which all lie within the associated confidence intervals.

To see how the approximating state space model performs, the parameters are also estimated via the add-in:



Note that the state space model must first be estimated in Gaussian form. The smoothed state values correspond to the coefficients of the independent variables, and they are very close to those estimated by maximum likelihood, which is the exact approach for this problem.

As for the degrees-of-freedom parameter, a separate distribution fitting exercise on the smoothed disturbances is required. Again, the two values are very close (both can be rounded to 3.32).
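One way to carry out such a distribution fitting exercise is a profile likelihood search over the degrees of freedom. The Python sketch below is an assumption about how this could be done, not the procedure used in the post: it fits a zero-mean scaled t-distribution, updates the scale with a simple EM-type fixed-point step for each candidate $v$, and uses simulated stand-in residuals. All names, the grid and the data are hypothetical.

```python
import math
import random


def scale_given_nu(resid, nu, n_iter=50):
    """EM-type fixed-point update for the squared scale of a zero-mean t."""
    s2 = sum(e * e for e in resid) / len(resid)
    for _ in range(n_iter):
        s2 = sum((nu + 1.0) * e * e / (nu + e * e / s2) for e in resid) / len(resid)
    return s2


def t_loglik(resid, nu, s2):
    """Log-likelihood of a zero-mean t with nu degrees of freedom and squared scale s2."""
    const = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi * s2))
    return sum(const - 0.5 * (nu + 1.0) * math.log1p(e * e / (nu * s2)) for e in resid)


def fit_dof(resid, grid):
    """Return the (nu, s2) pair on the grid with the highest profile likelihood."""
    best = None
    for nu in grid:
        s2 = scale_given_nu(resid, nu)
        ll = t_loglik(resid, nu, s2)
        if best is None or ll > best[0]:
            best = (ll, nu, s2)
    return best[1], best[2]


# Stand-in residuals (hypothetical): 2000 draws from t(3), not the post's data.
rng = random.Random(2018)
resid = [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3.0)
         for _ in range(2000)]
grid = [2.5 + 0.5 * i for i in range(16)]  # candidate values 2.5, 3.0, ..., 10.0
nu_hat, s2_hat = fit_dof(resid, grid)
print(nu_hat, s2_hat)
```

A finer grid or a numerical optimizer would give a more precise estimate; the coarse grid is used only to keep the sketch short.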

Note: The interested reader can estimate these models assuming normally distributed errors and see how the confidence intervals of the parameters change.



Reference:
Durbin, J. and Koopman, S. J. (2012). Time Series Analysis by State Space Methods, 2nd ed., Oxford University Press.
