Wednesday, June 26, 2019

Bayesian VAR Prior Comparison

EViews 11 introduces a completely new Bayesian VAR engine that replaces one from previous versions of EViews. The new engine offers two new major priors; the Independent Normal-Wishart and the Giannone, Lenza and Primiceri, that compliment the previously implemented Minnesota/Litterman, Normal-Flat, Normal-Wishart and Sims-Zha priors. The new priors were enhanced with new options for forming the underlying covariance matrices that make up essential components of the prior.

The covariance matrices that form the prior specification are generally formed by specifying a matrix alongside a number of hyper-parameters which define any non-zero elements of the matrix. The hyper-parameters themselves are either selected by the researcher, or taken from an initial error covariance estimate. Sensitivity of the posterior distribution to the choice of hyper-parameter is a well researched topic, with practitioners often selecting many different hyper-parameter values to check their analysis does not change based solely on (an often arbitrary) choice of parameter. However, this sensitivity analysis is restricted to the parameters selected by the researcher, with often only passing thought given to those estimated by an initial covariance estimate.

Since EViews 11 offers a number of choices for estimating the initial covariance, we thought it would be interesting to perform a comparison of forecast accuracy both across prior types, and across choices of initial covariance estimate.

Table of Contents

  1. Prior Technical Details
  2. Estimating a Bayesian VAR in EViews
  3. Data and Models
  4. Results
  5. Conclusions

Prior Technical Details

We will not provide in-depth details of each prior type here, leaving such details to the EViews documentation and its references. However we will provide a summary with enough details to demonstrate how an initial covariance matrix influences each prior type. We will also, for sake of notational convenience, ignore exogenous variables and the constant from our discussion.

First we write the VAR as: $$y_t = \sum_{j=1}^p\Pi_jy_{t-j}+\epsilon_t$$ where
  • $y_t = (y_{1t},y_{2t}, ..., y_{Mt})'$ is an M vector of endogenous variables
  • $\Pi_j$ are $M\times M$ matrices of lag coefficients
  • $\epsilon_t$ is an $M$ vector of errors where we assume $\epsilon_t\sim N(0,\Sigma)$

If we define $x_t=(y_{t-1}', ..., y_{t-p})$ stack variables to form, for example, $Y = (y_1, ...., y_T)'$, and let $y=vec(Y')$, the multivariate normal assumption on $\epsilon_t$ gives us: $$(y\mid \beta)\sim N((X\otimes I_M)\beta, I_T\otimes \Sigma)$$ Bayesian estimation of VAR models then centers around the derivation of posterior distributions of $\beta$ and $\Sigma$ based upon the above multivariate distribution, and prior distributional assumptions on $\beta$ and $\Sigma$.

To demonstrate how each prior relies on an initial estimate of $\Sigma$, for the priors other than Litterman, we only need to consider the component of each prior relating to the distribution $\beta$, and in particular its covariance.
  1. Litterman/Minnesota Prior

    $$\beta \sim N\left(\undrln{\beta}{Mn}{2.25pt}{}, \undrln{V}{Mn}{2.25pt}{}\right)$$ $\undrln{V}{Mn}{2.25pt}{}$ is assumed to be a diagonal matrix. The diagonal elements corresponding to endogenous variables, $i,j$ at lag $l$ are specified by: $$\undrln{V}{Mn, i,j}{-4.5pt}{l} = \begin{cases} \left(\frac{\lambda_1}{l^{\lambda_3}}\right)^2 &\text{for } i = j\\ \left(\frac{\lambda_1 \lambda_2 \sigma_i}{l^{\lambda_3} \sigma_j}\right)^2 &\text{for } i \neq j \end{cases} $$ where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are hyper-parameters chosen by the researcher, and $\sigma_i$ is the square root of the corresponding $(i,i)^{\text{th}}$ element of an initial estimate of $\Sigma$.

    The Litterman/Minnesota prior also assumes that $\Sigma$ is fixed, forming no prior on $\Sigma$, just using the initial estimate as given.

  2. Normal-Flat and Normal-Wishart

    $$\beta\mid\Sigma\sim N\left(\undrln{\beta}{N}{2.25pt}{}, \undrln{H}{N}{0pt}{}\otimes\Sigma\right)$$ where $\undrln{H}{N}{0pt}{} = c_3I_M$ and $c_3$ is a chosen hyper-parameter. As such, the Normal-Flat and Normal-Wishart priors do not rely on an initial estimate of the error covariance at all.

  3. Independent Normal-Wishart

    $$\beta\sim N\left(\undrln{\beta}{INW}{2.25pt}{}, \undrln{H}{INW}{0pt}{}\otimes\Sigma\right)$$ where, again, $\undrln{H}{INW}{0pt}{} = c_3I_M$ and $c_3$ is a chosen hyper-parameter. Thus, like the Normal-Flat and Normal-Wishart priors the prior matrices do not depend upon an initial $\Sigma$ estimate. However, the Independent Normal-Wishart requires an MCMC chain to derive the posterior distributions, and the MCMC chain does require an initial estimate for $\Sigma$ to start the chain (although, hopefully, the impact of this starting estimate should be minimal).

  4. Sims-Zha

    $$\beta\mid\beta_0\sim N\left(\undrln{\beta}{SZ}{2.25pt}{}, \undrln{H}{SZ}{0pt}{}\otimes\Sigma\right)$$ $\undrln{H}{SZ}{0pt}{}$ is assumed to be a diagonal matrix. The diagonal elements corresponding to endogenous variables, $i,j$ at lag $l$ are specified by: $$\undrln{H}{SZ, i, j}{-4.5pt}{l} = \left(\frac{\lambda_0\lambda_1}{\sigma_j l^{\lambda_3}}\right)^2 \text{for } i = j$$ where $\lambda_0$, $\lambda_1$ and $\lambda_3$ are hyper-parameters chosen by the researcher, and $\sigma_i$ is the square root of the corresponding $(i,i)^{\text{th}}$ element of an initial estimate of $\Sigma$.

  5. Giannone, Lenza and Primiceri

    $$\beta\mid\beta_0\sim N(\undrln{\beta}{GLP}{2.25pt}{}, \undrln{H}{GLP}{0pt}{}\otimes\Sigma)$$ $\undrln{H}{GLP}{0pt}{}$ is assumed to be a diagonal matrix. The diagonal elements corresponding to endogenous variables, $i,j$ at lag $l$ are specified by: $$\undrln{H}{GLP,i,j}{-4.5pt}{l} = \left(\frac{\lambda_1}{\phi_j l^{\lambda_3}}\right)^2 \text{for } i = j$$ where $\lambda_1$, $\lambda_3$ and $\phi_j$ are hyper-parameters of the prior.

    GLP's method revolves around using optimization techniques to select the optimal hyper-parameter values. However, it is possible to optimize only a subset of the hyper-parameters and select others. $\phi_j$ is often set, rather than optimized, as $\phi_j = \sigma_j$ and is the square root of the corresponding $(j,j)^{\text{th}}$ element of an initial estimate of $\Sigma$. Even when $\phi_j$ is optimized rather than set, an inititial estimate is used as the starting point of the optimizer.

Of these priors, only the normal-flat and normal-Wishart priors do not rely on an initial estimate of $\Sigma$ at all. Consequently the method used for that initial estimate might have a large impact on the final results.

Different implementations of Bayesian VAR estimations use different methods to calculate the initial $\Sigma$. Some of these methods are:

  • A classical VAR model.
  • A classical VAR model with the off-diagonal elements replaced with zero.
  • A univariate AR(p) model for each endogenous variable (forcing $\Sigma$ to be diagonal).
  • A univariate AR(1) model for each endogenous variable (forcing $\Sigma$ to be diagonal).

With each of these methods, there is also the decision as to whether to degree-of-freedom adjust the final estimate (and if so, by what factor), and whether to include any exogenous variables from the Bayesian VAR in the calculation of the classical VAR or univariate AR models.

Bayesian VAR priors can be complimented with the addition of dummy-observation priors to increase the predictive power of the model. There are two specific priors - the sum-of-coefficients prior that adds additional observations to the start of the data to account for any unit root issues, and the dummy-initial-observation prior which adds additional observations to account for cointegration.

With the addition of extra observations to the data used in the Bayesian prior, there is also a choice to be made as whether those additional observations are also included in any initial covariance estimation.

Estimating a Bayesian VAR in EViews

Estimating VARs in EViews is straight forward, you simply select the variables you want in your VAR, right click, select Open As VAR and then fill in the details of the VAR, including the estimation sample and the number of lags. For Bayesian VARs the only additional steps that need to be taken are changing the VAR type to Bayesian, and then filling in the details of the prior you want to use and any hyper-parameter specification.

For full details on how to estimate a Bayesian VAR in EViews, refer to the documentation, and examples.

However we’ve also provided a simple video demonstration of both importing the data used in this blog post, and estimating and forecasting the normal-Wishart prior.

Data and Models

To evaluate the forecasting performance of the priors under different initial covariance estimation methods, we'll perform an experiment closely following that performed in Giannone, Lenza and Primiceri (GLP). Notably, we use the Stock and Watson (2008) data set which includes data on 149 quarterly US macroeconomic variables between 1959Q1 and 2008Q4.

Following GLP we produce forecasts from the BVARs recursively for two forecast lengths (1 quarter and 1 year), starting with data from 1959 to 1974, then increasing the estimation sample by one quarter at a time, to give 128 different estimations.

We perform two sets of experiments, each representing a different sized VAR:

  • SMALL containing just three variables - GDP, the GDP deflator and the federal funds rate.
  • MEDIUM containing seven variables - adding consumption, investment, hours and wages.

Each of these VARs is estimated at five lags using a classical VAR and 39 different combinations of prior and initial covariance options:

After each BVAR estimation, Bayesian sampling of the forecast period is performed - drawing from the full posterior distributions for the Litterman, Normal-flat, Normal-Wishart and Sims-Zha priors, and running MCMC draws for the Independent normal-Wishart and GLP priors. The mean of the draws is used as a point estimate, and the root mean square error (RMSE) is calculated. Each forecast draw uses 100,000 iterations. With 39*128=4,992 forecasts and two sizes of VARs, that is a total of 1 billion draws!


The following tables show the average root-mean square of each of the four sets of forecasts. Click on a table to enlarge the image.


For the three variable one-quarter ahead experiment, it is clear that the GLP prior is more effective than the other prior types, although the Litterman prior is relatively close in accuracy. In terms of which covariance method performs best, there is no clear winner, with the differences between covariance choice only having a large impact on the Litterman and GLP priors.

The choice of whether to include dummy observation priors, and if so whether to include them in the covariance calculation, choice appears to only impact the GLP prior severely.

The overall winner, at least in terms of RMSE, was the GLP prior with a diagonal VAR used for initial covariance choice without dummy observations.

A similar story is told for the three variable one-year ahead experiment, however this time the Litterman prior is the clear winner. Again there is not much difference between covariance choices and dummy observation choices. Notably, although Litterman does best across the options, the overall most accurate was the Normal-flat.

Expanding to the five variable VARs, the one-quarter ahead experiment is not as clear-cut as the three variable equivalent. Across covariance options is a toss-up between Litterman and GLP. The effect of covariance has a bigger impact, with the Univariate AR(5) option looking best.

For the first time, optimizing $\phi$ in the GLP prior has a positive impact, with the version including dummy observations being the overall most accurate option combination.

The final experiment is similar, no clear-cut winner in terms of prior choice, although Litterman might just edge GLP. Choice of covariance again has an impact, with again a univariate AR(5) looking best.

Across all the experiments it is difficult to give an overall winner. The original Litterman and GLP priors are ahead of the others, but knowing which covariance choice to select or whether to include dummy observations is more ambiguous.

One absolutely clear result is, however, that no matter which combination of prior and options are selected, the Bayesian VAR will vastly outperform a classical VAR.

Finally, it is worth mentioning that these results are, with the obvious exception of the GLP prior, for a fixed set of hyper-parameters, and the conclusions may differ if attention is given to simultaneously finding the best set of hyper-parameters and covariance choice.

No comments:

Post a Comment