tag:blogger.com,1999:blog-68832474046785494892023-12-07T10:51:15.660-08:00EViewsIHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.comBlogger63125tag:blogger.com,1999:blog-6883247404678549489.post-43206253901672197542023-11-29T09:38:00.000-08:002023-11-29T09:38:04.706-08:00From Bańbura et al. (2010) to Cascaldi-Garcia’s (2022) Pandemic Priors<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 1px solid black;
}
.classic_table {
border: 1px solid black;
border-collapse: collapse;
border-spacing: 0px;
}
.classic_table tr {
border-bottom: 1px solid black;
border-top: 1px solid black;
}
.classic_table tr:first-child {
border-top: none;
}
.classic_table tr:last-child {
border-bottom: none;
}
.classic_table td {
border-left: 1px solid black;
border-right: 1px solid black;
padding-right: 10px;
padding-left: 10px;
}
.classic_table td:first-child {
border-left: none;
}
.classic_table td:last-child {
border-right: none;
}
.break_row {
border-bottom: 3px solid #fa5e5e !important
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.wfvar {
font-weight: bold;
text-transform: uppercase;
}
.wf {
font-weight: bold;
text-transform: uppercase;
}
.subseccol {
color: #fa5e5e
}
.bold {
font-weight: 400;
}
.col_blue {
color: rgba(41, 61, 92, 1)
}
.col_red {
color: rgba(250, 94, 94, 1)
}
.col_green {
color: rgba(0, 200, 125, 1)
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["HTML.js", "AMSmath.js"],
Macros: {
bm: ["{\\boldsymbol #1}",1],
lb: ['{\\left(}'],
rb: ['{\\right)}'],
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}'],
min: ['{\\operatorname\{min\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<i>A guest post by Ole Rummel and Davaajargal Luvsannyam</i></br></br>
This is the second in a series of blog posts presenting the EViews add-in <b>lbvar</b>, aimed at estimating and forecasting large Bayesian VAR models following Bańbura, Giannone and Reichlin (2010), henceforth BGR. In this post we discuss and replicate Cascaldi-Garcia (2022).
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction</a>
<li><a href="#sec2">Why we should be using Cascaldi-Garcia’s (2022) Pandemic Priors</a>
<li><a href="#sec3">Implementing Cascaldi-Garcia’s (2022) Pandemic Priors in EViews</a>
<li><a href="#sec4">Using the lbvar EViews add-in for Pandemic Priors</a>
<li><a href="#sec5">Concluding Remarks</a>
<!-- <li><a href="#sec6">Files</a> -->
<li><a href="#sec7">References</a>
</ol><br />
<h1 class="seccol", id="sec1">Introduction</h1>
Cascaldi-Garcia (2022) proposes an easy and straightforward solution for dealing with the extreme COVID-19 episode in Bayesian VAR (BVAR) models, which have become workhorse models in many central banks. More specifically, he illustrates how to augment the dummy observations employed in the Minnesota or Litterman prior with time dummies. These Pandemic Priors are time dummies with uninformative priors, which are able to correctly adjust the historical relationships among the variables for the extreme values observed in specific sample periods. While designed for the COVID-19 pandemic, the approach can deal with any extreme episode: it recovers historical relationships and allows for the proper identification and propagation of structural shocks.</br></br>
Following the notation we use throughout this exercise, assume a VAR model with $ n $ variables and $ p $ lags:
\begin{align}
Y_{t} = c + \bm{1}_{t=a} d_{a} + \bm{1}_{t=a+1} d_{a+1} + \ldots + \bm{1}_{t=a + h} d_{a + h} + A_{1}Y_{t - 1} + \ldots + A_{p} Y_{t - p} + u_t
\end{align}
where $ u_{t} $ are the innovations with $ E [u_{t} u_{t}^{\top}] = \Sigma $, $ c $ is a vector of $ n $ intercepts, $ d_{a} $ through $ d_{a + h} $ are $ n $-vectors of time-dummy coefficients for a pre-defined window of periods from $ a $ through $ a + h $, which can be taken to be the COVID-19 period, and $ \bm{1}_{t = i} $ is an indicator function that equals unity for period $ i = a, a + 1, \ldots, a + h $, and zero otherwise.</br></br>
What do these dummies look like? Our starting point is the possibility that each variable in our model can potentially experience a different shift and persistence during the COVID-19 period, but that the individual time dummies are able to capture these heterogeneous responses. Jumping ahead a bit, the empirical illustration below will be using six pandemic dummy variables (<code>dum1, dum2, dum3, dum4, dum5 and dum6</code>), which are shown below in Figure 1 over part of the sample period. We can see that <code>dum1</code> takes on a value of 1 in <code>2020m3</code>, i.e., March 2020, which is generally taken as the "official" start of the COVID-19 pandemic. The value of 1 then moves to April 2020, which is captured by <code>dum2</code>. We continue to move through the remaining months of 2020 Q2 and the first month of 2020 Q3: <code>dum3</code> is set to 1 in May 2020, <code>dum4</code> is set to 1 in June 2020, <code>dum5</code> is set to 1 in July 2020 and <code>dum6</code> is set to 1 in August 2020.</br></br>
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td class='nb'>
<center>
<a href="http://www.eviews.com/blog/lbvar/images/image1.png">
<img height="auto" src="http://www.eviews.com/blog/lbvar/images/image1.png" title="Image 1" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class='nb'>
<center>
<small>Figure 1: Pandemic dummies in Cascaldi-Garcia’s (2022) empirical illustration</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
We generate the dummies in EViews as follows. First, we create a new variable consisting of zeroes called <code>dum1</code> by clicking on <b>Genr</b> at the top of the workfile window. This opens a dialog box to <b>Generate Series by Equation</b>. We subsequently enter <code>dum1 = 0</code> into the <b>Enter equation</b> window.</br></br>
We then access the new variable <code>dum1</code> by double-clicking, which opens it. In the next step, we click on <b>Edit+/–</b>, which unlocks the spreadsheet. This allows us to make changes; more specifically, we select the entry for March 2020. At the moment, the value for that date is 0. We manually change this to 1 and press <b>Edit+/–</b> again to lock the spreadsheet. This concludes the creation of a dummy for March 2020. We then repeat this operation five times to create five additional dummies for April 2020 (<code>dum2</code>), May 2020 (<code>dum3</code>), June 2020 (<code>dum4</code>), July 2020 (<code>dum5</code>) and August 2020 (<code>dum6</code>).</br></br>
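For readers who prefer scripting their data work, the point-and-click steps above can also be expressed programmatically. The following pandas sketch is purely illustrative (it sits outside EViews; the sample range simply mirrors the workfile used below) and builds the same six dummies, each containing a single 1:

```python
# Illustrative sketch in pandas (outside EViews): build the six pandemic
# dummies, dum1..dum6, marking March-August 2020 with a single 1 each.
import pandas as pd

# Monthly sample mirroring the workfile range used below (1975m1-2022m3).
idx = pd.period_range("1975-01", "2022-03", freq="M")
covid_months = pd.period_range("2020-03", "2020-08", freq="M")

dummies = pd.DataFrame(0, index=idx,
                       columns=[f"dum{i}" for i in range(1, 7)])
for i, month in enumerate(covid_months, start=1):
    dummies.loc[month, f"dum{i}"] = 1   # dum1 -> 2020m3, ..., dum6 -> 2020m8
```

Each column is zero everywhere except in its own pandemic month, exactly the pattern shown in Figure 1.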
As in Litterman (1986) and BGR, Cascaldi-Garcia (2022) imposes the prior that the variables are centred around the random walk with drift, but now extending the concept to the idea that the COVID-19 pandemic is an abnormal period where the relationship between the variables may diverge from history.</br></br>
In other words, the Pandemic Priors can be represented as:</br></br>
\begin{align}
Y_{t} = c + \bm{1}_{t=a} d_{a} + \bm{1}_{t=a+1} d_{a+1} + \ldots + \bm{1}_{t=a + h} d_{a + h} + Y_{t - 1} + u_t
\end{align}
which is equivalent to shrinking the coefficient matrix $ A_{1} $ to $ I_{n} $ and the matrices $ A_{2}, \ldots, A_{p} $ to zero matrices.</br></br>
The moments for the prior distribution of the coefficients are set as:
\begin{align}
E \left[ \left( A_{l} \right)_{i,j} \right] &=
\begin{cases}
\rho_{i} \quad (i = j, l = 1)\\
0 \quad \text{otherwise}
\end{cases}\\
Var \left[ \left( A_{l} \right)_{i,j} \right] &=
\begin{cases}
\frac{\lambda_{1}^{2}}{l^{2}} \quad (i = j)\\
\lambda_{2} \frac{\lambda_{1}^{2}\sigma_{i}^{2}}{l^{2}\sigma_{j}^{2}} \quad \text{otherwise}
\end{cases}
\end{align}
The coefficients in $ A_{1}, \ldots , A_{p} $ are assumed to be independent and normally distributed, the covariance matrix of the residuals is assumed to be diagonal, such that $ \Sigma = diag(\sigma_{1}^{2}, \ldots , \sigma_{n}^{2}) $, and the prior on the intercept is diffuse. The same diffuse prior is taken for the time dummies. The choices for $ \sigma_{i} $, the overall prior tightness $ \lambda_{1} $, the factor $ \frac{1}{l^{2}} $ and the coefficient $ \lambda_{2} $ follow the standard practice described in BGR. In addition, these choices are flexible enough to accommodate beliefs about persistence, shrinkage toward the prior, the decrease in variance over lags and the importance of variable $ i $'s own lags.</br></br>
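To fix ideas, these moments can be sketched in a few lines of Python. This is purely illustrative (numpy, with made-up values for $ \rho $, the $ \sigma_{i} $ and the hyperparameters; it is not part of the <b>lbvar</b> add-in): the prior mean shrinks $ A_{1} $ towards $ \rho_{i} $ on its diagonal, and the prior variance decays with the lag $ l $:

```python
# Minnesota/Litterman-style prior moments for the lag-l coefficient matrix A_l
# (illustrative numpy sketch; rho, sigma and the lambdas are made-up values).
import numpy as np

def prior_moments(l, rho, sigma, lam1=0.2, lam2=1.0):
    """Return the prior mean and variance of the entries of A_l."""
    n = len(rho)
    mean = np.zeros((n, n))
    if l == 1:                          # shrink A_1 towards diag(rho) ...
        np.fill_diagonal(mean, rho)     # ... and A_2, ..., A_p towards zero
    var = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                var[i, j] = lam1**2 / l**2
            else:
                var[i, j] = lam2 * lam1**2 * sigma[i]**2 / (l**2 * sigma[j]**2)
    return mean, var

rho = np.ones(3)                        # random-walk prior means
sigma = np.array([1.0, 2.0, 0.5])       # residual s.d.s from univariate ARs
m1, v1 = prior_moments(1, rho, sigma)   # own-lag variance: lam1^2
m2, v2 = prior_moments(2, rho, sigma)   # variance shrinks as 1/l^2
```

Note how the cross-variable variance is rescaled by $ \sigma_{i}^{2} / \sigma_{j}^{2} $, so that the shrinkage is comparable across equations with differently scaled residuals.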
By setting $ \lambda_{2} = 1$, it is possible to impose a normal-inverse Wishart prior of the form:
\begin{align}
vec\left( B \right ) | \Psi &\sim N\left( vec\left( B_{0} \right), \Sigma_{u} \otimes \Omega_{0} \right) \\
\Psi &\sim IW \left( S_{0}, \alpha_{0} \right)
\end{align}
where $ B $ is the matrix that collects the reduced-form coefficients of the $ Y = XB + U $ VAR system, $ B_{0}, \Omega_{0}, S_{0} $ and $ \alpha_{0} $ are prior expectations and $ E \left[ \Psi \right] = \Sigma $.</br></br>
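To make the prior concrete, the following numpy/scipy sketch draws once from a normal-inverse Wishart of this form. All prior values ($ B_{0} $, $ \Omega_{0} $, $ S_{0} $, $ \alpha_{0} $) are hypothetical placeholders, not those used by the add-in:

```python
# Illustrative draw from a normal-inverse Wishart prior (hypothetical values).
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)
n, k = 3, 7                        # variables and coefficients per equation
B0 = np.zeros((k, n))              # prior mean of the reduced-form coefficients
Omega0 = np.eye(k)                 # prior scale for the coefficient covariance
S0 = np.eye(n)                     # inverse-Wishart scale matrix
alpha0 = n + 2                     # inverse-Wishart degrees of freedom

# Sigma ~ IW(S0, alpha0), then vec(B) | Sigma ~ N(vec(B0), Sigma kron Omega0)
Sigma = invwishart.rvs(df=alpha0, scale=S0, random_state=rng)
vecB = rng.multivariate_normal(B0.flatten(order="F"),
                               np.kron(Sigma, Omega0))
B = vecB.reshape((k, n), order="F")
```

The Kronecker structure $ \Sigma \otimes \Omega_{0} $ is what makes this prior conjugate for the VAR and keeps the posterior computations tractable.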
In practice, these priors can be easily implemented through a series of dummy observations. Cascaldi-Garcia (2022) extends the BGR approach to allow for priors for the $ h $ time dummies described in equation (1). Formally, the left- and right-hand side dummy observations ($ Y_{d} $ and $ X_{d} $ respectively) are defined as:
\begin{align}
Y_{d} &=
\begin{bmatrix}
\frac{\text{diag}\left(\rho_{1} \sigma_{1}, \ldots, \rho_{n} \sigma_{n} \right)}{\lambda} \\
\bm{0}_{n(p - 1) \times n} \\
\frac{\text{diag}\left(\rho_{1} \mu_{1}, \ldots, \rho_{n} \mu_{n} \right)}{\tau} \\
\text{diag}\left(\sigma_{1}, \ldots, \sigma_{n} \right) \\
\bm{0}_{1 \times n}
\end{bmatrix} \\
X_{d} &=
\begin{bmatrix}
J_{p} \otimes \frac{\text{diag}\left(\sigma_{1}, \ldots, \sigma_{n} \right)}{\lambda} & \bm{0}_{np \times 1} & \bm{0}_{np \times h} \\
\bm{1}_{1 \times p} \otimes \frac{\text{diag}\left(\rho_{1} \mu_{1}, \ldots, \rho_{n} \mu_{n} \right)}{\tau} & \bm{0}_{n \times 1} & \bm{0}_{n \times h} \\
\bm{0}_{n \times np} & \bm{0}_{n \times 1} & \bm{0}_{n \times h} \\
\bm{0}_{1 \times np} & \epsilon & \phi \, \bm{1}_{1 \times h} \\
\end{bmatrix}
\end{align}
where $ J_{p} = \text{diag}\left(1, 2, \ldots , p\right) $, $ \mu_{i} $ is the sample mean of variable $ i $ (as in BGR), $ \epsilon $ imposes an uninformative prior on the intercept and $ \phi $ does the same for the time dummies. As before, the first block of dummies (at the top of the respective matrices) imposes prior beliefs on the autoregressive coefficients, the second block constrains the sum of coefficients, the third block implements the prior for the variance-covariance matrix and the fourth block of dummies (at the bottom of the respective matrices) reflects the uninformative prior for the intercept.</br></br>
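As a quick dimension check, this block structure can be assembled mechanically. The numpy sketch below (hypothetical values for $ n $, $ p $, $ h $ and the hyperparameters; following the BGR block layout with the extra $ h $ columns for the time dummies) confirms that $ Y_{d} $ and $ X_{d} $ have matching row counts:

```python
# Dimension check (numpy sketch, hypothetical values): assemble the dummy
# observation matrices Y_d and X_d, including the extra h columns of X_d
# that carry the priors for the pandemic time dummies.
import numpy as np

n, p, h = 3, 2, 6                   # variables, lags, pandemic dummies
lam, tau, eps, phi = 0.2, 2.0, 1e-5, 1e-3
rho = np.ones(n)                    # random-walk prior means
sigma = np.array([1.0, 2.0, 0.5])   # residual s.d.s from univariate ARs
mu = np.array([0.5, 1.0, 2.0])      # sample means (sum-of-coefficients prior)

Yd = np.vstack([
    np.diag(rho * sigma) / lam,     # beliefs on the AR coefficients
    np.zeros((n * (p - 1), n)),
    np.diag(rho * mu) / tau,        # sum-of-coefficients block
    np.diag(sigma),                 # variance-covariance block
    np.zeros((1, n)),               # uninformative intercept and dummies
])
Jp = np.diag(np.arange(1, p + 1))
Xd = np.vstack([
    np.hstack([np.kron(Jp, np.diag(sigma)) / lam,
               np.zeros((n * p, 1)), np.zeros((n * p, h))]),
    np.hstack([np.tile(np.diag(rho * mu) / tau, p),
               np.zeros((n, 1)), np.zeros((n, h))]),
    np.zeros((n, n * p + 1 + h)),
    np.hstack([np.zeros((1, n * p)), [[eps]], np.full((1, h), phi)]),
])
```

With these values, both matrices have $ np + 2n + 1 $ rows, and $ X_{d} $ has $ np + 1 + h $ columns: the $ h $ extra columns are the only structural difference from the BGR setup.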
Comparing equation (4) of the first blog post with the expression for $ X_{d} $ above, the innovation due to Cascaldi-Garcia (2022) occurs in the final $ h $ columns of the $ X_{d} $ matrix, which impose priors on the time dummies through $ \phi $, ordered last in $ X_{d} $. Following common practice, $ \sigma_{i} $ can be calibrated from the variances of the residuals of univariate AR models with $ p $ lags for each of the $ n $ variables in the information set. Setting $ \epsilon $ to a very small number makes the prior on the intercept fairly uninformative, and the same uninformative approach is followed for $ \phi $. In short, the final matrices for $ Y_{d} $ and $ X_{d} $ retain the five rows from equation $ (X) $, but the $ X_{d} $ matrix now has $ h $ additional columns.</br></br></br></br>
<h1 class="seccol", id="sec2">Why we should be using Cascaldi-Garcia’s (2022) Pandemic Priors</h1>
Lenza and Primiceri's (2022) methodology for estimating a VAR after March 2020 conjectures that the shocks observed at the onset of the COVID-19 pandemic translate into substantially larger volatility in the underlying macroeconomic and financial time series. More specifically, if the volatility of all shocks were scaled up by exactly the same amount, with exactly the same persistence thereafter (which is referred to as the commonality assumption), it is possible to establish priors and estimate these parameters. As noted by the authors themselves, the commonality assumption is an approximation that works well in a period in which all series experience the same (excessive) variation.</br></br>
Statistical and empirical approaches to dealing with structural breaks in estimation have gone through two phases. In the first phase (Phase 1), econometricians as well as applied economists spent a lot of time monitoring and identifying structural breaks in the data and adjusting for them by using robust estimation methods. In most cases, it was possible – and even advisable – to model the structural break itself.</br></br>
In the approaches of the second – arguably still ongoing – phase (Phase 2), we generally take the existence of a structural break in the data as given and, rather than model the break itself, we seek robust estimation/forecasting methods that are less susceptible to the effects of the structural break in the data on estimation. Which of the two approaches would be most suitable for the structural break induced by the COVID-19 pandemic?</br></br>
Dealing with structural breaks is somewhat easier if the location of the structural break is known. This turns the analysis into one of modelling a structural break rather than first estimating and then modelling the break, which compounds possible errors on top of each other. We note that this refers to the first generation of structural break models.</br></br>
In particular, if the structural break is of limited duration, we can try and capture the aberrant observations with the help of dummy variables. This is nothing else than the standard approach of modelling outliers in the data – equal to the pandemic period – with the help of dummy variables.</br></br>
Predicting the macroeconomic impact of the COVID-19 pandemic with reduced-form time-series models is challenging because a shock of this scale was never directly observed in the available data. More specifically, the pandemic caused macroeconomic variables to display complex patterns that do not follow any historical behaviour. Moreover, the COVID-19 shock falls between two stools:</br></br>
<ul>
<li>we know it happened (no need for break identification); but</li>
<li>it is too big to ignore</li>
</br>
</ul>
In short, the COVID-19 pandemic is a structural break that really cannot be ignored. But earlier approaches, such as break monitoring with robust modelling and data-dependent downweighting of historical data, may fall short in dealing with the effects of the COVID-19 pandemic shock, so where do we go from here? What is needed is an empirical approach that can deal with such unusual behaviour and retain historical relationships, generate reliable forecasts and provide correct interpretations of economic shocks.</br></br>
The unprecedented nature of the COVID-19 pandemic and its impact on macroeconomic variables has led to several very clever new ideas and techniques for modelling and forecasting in the presence of structural breaks. Academics and researchers such as Lenza and Primiceri (2022), Primiceri and Tambalotti (2020) and Cascaldi-Garcia (2022) have returned to modelling the break process explicitly. This does not mean that the older robust approaches of Phase 2 no longer work, but we should take advantage of the fact that we know exactly when the break occurred – this is a luxury we do not often have. All three approaches are distinguished by their use of vector autoregressions (VARs) estimated using Bayesian techniques. Alternative approaches to dealing with the COVID-19 induced break were proposed by Schorfheide and Song (2021), who suggested discarding the extreme observations, and the complex settings in Carriero et al. (2022), involving modelling extreme observations as random shocks in the stochastic volatility of the VAR model.</br></br>
As we have seen, the structural breaks induced by the COVID-19 pandemic invalidate historical relationships in the data, produce unreliable forecasts and lead to incorrect interpretations of structural (or primitive) economic shocks. They do this by generating intercept shifts in the macroeconomic variables in the model in the selected periods. This is very much in line with the V-shaped recovery, which seems to have been the case in many economies after the COVID-19 period in 2020 H1.</br></br>
Cascaldi-Garcia (2022) considers eight monthly US macroeconomic and financial variables: the excess bond premium of Gilchrist and Zakrajšek (2012), Standard and Poor’s (S&P) 500 stock market index, the Wu and Xia (2016) federal funds shadow rate, real personal consumption expenditure, the personal consumption expenditure price index, total non-farm payrolls, real industrial production and the number of unemployed as a percentage of the labour force.</br></br></br></br>
<h1 class="seccol", id="sec3">Implementing Cascaldi-Garcia’s (2022) Pandemic Priors in EViews</h1>
We note some differences between Cascaldi-Garcia’s (2022) approach and that put forward by Lenza and Primiceri (2022). The latter estimate the VAR by modelling a common shift and persistence in the volatility of the shocks during the extreme period of the COVID-19 pandemic, assuming that the volatility of all shocks is scaled up (and decays) by exactly the same amount. This allows them to formulate a prior and estimate these scale parameters.</br></br>
By allowing for direct individual intercept shifts during the pandemic period rather than common volatility scale shifters and persistence, Cascaldi-Garcia’s (2022) approach is much simpler. In fact, under the Pandemic Priors, the individual time dummies will capture each variable’s different shift and persistence.</br></br>
As mentioned above, the Pandemic Priors build upon BGR by extending the dummy observation approach to encompass time dummies during the extreme period.</br></br>
<h3>Data</h3>
The <a href="https://drive.google.com/file/d/1c5foZv3g2nzgiN-yegV0dejCSJwHRyfu/view">data</a> we are using are taken from Cascaldi-Garcia’s <a href="https://sites.google.com/site/cascaldigarcia/research">website</a>. The EViews workfile <b class="wf">data_pprior.wf1</b> contains the data that will be used in the estimation. To open the EViews workfile from within EViews, choose <b>File, Open, EViews Workfile…</b>, select <b>data_pprior.wf1</b> from the appropriate Data folder and click on <b>Open</b>. Alternatively, you can double-click on the workfile icon from outside of EViews, which will open the workfile in EViews automatically.</br></br>
The eight monthly US macroeconomic and financial variables are all in levels and consist of the excess bond premium (<code>ebp</code>) of Gilchrist and Zakrajšek (2012), the log of Standard & Poor’s (S&P) 500 stock market index (<code>sp500l</code>), the Wu and Xia (2016) federal funds shadow rate (<code>fedfunds</code>), the log of real personal consumption expenditure (<code>pcel</code>), the log of the personal consumption expenditure price index (<code>pcepil</code>), the log of total non-farm payrolls (<code>payemsl</code>), the log of real industrial production (<code>indprol</code>) and the number of unemployed as a percentage of the labour force (<code>unrate</code>). Note that Cascaldi-Garcia (2022) takes logs of five of the eight variables (<code>sp500, pce, pcepi, payems and indpro</code>) – these variables are indicated by an "l" at the end of their name. In essence, these eight variables capture a modern-day monetary model in the spirit of Christiano et al. (1999). The sample period runs from January 1975 to March 2022, for a total of 567 monthly observations.</br></br>
Whenever we begin working with a new data set, it is always a good idea to take some time to simply examine the data, so the first thing we will do is to plot the data to make sure that it looks fine. This will help ensure that there were no mistakes in the data itself or in the process of reading in the data. It also provides us with a chance to observe the general (time-series) behaviour of the series we will be working with. A plot of our data is shown in Figure 2.</br></br>
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td class='nb'>
<center>
<a href="http://www.eviews.com/blog/lbvar/images/image2.png">
<img height="auto" src="http://www.eviews.com/blog/lbvar/images/image2.png" title="Image 2" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class='nb'>
<center>
<small>Figure 2: Time series of the eight underlying monthly US macroeconomic and financial time series (January 1975 – March 2022)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
The excess bond premium (<code>ebp</code>) looks pretty stationary, and the log of the PCE price index (<code>pcepil</code>) does not show any dislocation during the pandemic period. The same is true, to some extent, for the log of the S&P 500 index (<code>sp500l</code>) and the shadow rate (<code>fedfunds</code>). On the other hand, the log of private consumption expenditure (<code>pcel</code>) shows a notable dip during the onset of the COVID-19 pandemic, as does the log of industrial production (<code>indprol</code>). Most obvious is the sharp spike in the unemployment rate (<code>unrate</code>), which is mirrored by the sharp fall in the log of employment (<code>payemsl</code>).</br></br>
In short, our (subjective) visual inspection reveals evidence of structural breaks in four of the eight variables. But when should the COVID-19 period end? These aberrant observations are only in the data for a few months (some four months of double-digit entries in the case of <code>unrate</code>, more or less four months below 140,000 in the case of <code>payemsl</code>, roughly five months in the case of <code>pcel</code> and about seven months below 4.57 in the case of <code>indprol</code>), and yet their influence on the VAR is sizeable, as we will see below in a BVAR model that does not account for the structural break.</br></br></br></br>
<h1 class="seccol", id="sec4">Using the lbvar EViews add-in for Pandemic Priors</h1>
Cascaldi-Garcia (2022) produces two graphical pieces of evidence (his Figures 4 and 5) that highlight the pitfalls of not accounting for the pandemic period and the benefits of using the Pandemic Priors. Our task at hand will be to replicate both figures using EViews’ (updated) <b>lbvar</b> add-in for the estimation of (very) large Bayesian VARs as described by BGR.</br></br>
In order to replicate these figures, we use EViews’ <b>lbvar</b> add-in, which can be run either interactively, via a dialog box opened from the <b>Add-ins</b> menu at the top of the EViews screen, or programmatically, using a few lines of EViews code. Personally, we have found working with a short EViews program to communicate the settings to the <b>lbvar</b> add-in more user-friendly than the dialog window, although both obviously serve the same purpose. We will have a look at both approaches in what follows.</br></br>
One thing we must do before estimation is to specify the prior means of our endogenous variables, which are all set to one. With EViews and the <b class="wf">data_pprior.wf1</b> workfile open, this is most easily accomplished by typing the following command into the command line:</br></br>
<center>
<code>
vector irw_s = @ones(8)
</code>
</br></br>
</center>
This is a shortcut, but there is – as always with EViews – an equivalent way using the dialog menus. The alternative way of creating the (8 × 1) vector of ones called <code>irw_s</code> is to go to <b>Object</b> in the top bar of the workfile window, select <b>New Object…, Matrix-Vector-Coef</b> and give it the name <code>irw_s</code>.</br></br>
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td class='nb'>
<center>
<a href="http://www.eviews.com/blog/lbvar/images/image3.png">
<img height="auto" src="http://www.eviews.com/blog/lbvar/images/image3.png" title="Image 3" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class='nb'>
<center>
<small>Figure 3: New Matrix Object</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
As we have eight endogenous variables, we must assign a prior mean to each, which is why we create an (8 × 1) vector called <code>irw_s</code>. At the moment, the eight values are all zero, but we need to change them to ones. We click on <b>Edit+/–</b> to ‘unlock’ the spreadsheet and change all the entries from 0 to 1. We then click <b>Edit+/–</b> again to lock the spreadsheet and close the vector object.</br></br>
We are now ready to use EViews’ <b>lbvar</b> add-in. The documentation that comes with the add-in presents a list of possible (text) commands that need to be specified. All these options are also reflected in the dialog box that we will look at shortly. The optional settings are almost equivalent between the BGR approach and the Pandemic Priors, except for the fact that the Pandemic Priors have an additional five, which can be found at the bottom of Table 1. The options are included either via the boxes, drop-down menus and checkboxes in the dialog associated with the <b>lbvar</b> add-in or via the EViews command language, which we will get to shortly. We note that not all the possible options need to be included in every single application. Table 1 below shows all the available options, the respective commands, default settings (in parentheses) and a short description of what they do.</br></br>
<center>
<table class='classic_table'>
<thead></thead>
<tbody>
<tr style="background-color: #fa5e5e">
<td><b>Object Name</b></td>
<td><b>Description</b></td>
</tr>
<tr>
<td><code>lambda</code></td>
<td>Prior parameter lambda (default setting is $ \lambda = 0.1 $)</td>
</tr>
<tr>
<td><code>sum</code></td>
<td>Include sum-of-coefficients dummy observation prior (default setting is <code>sum = 1</code>, i.e., prior is switched on)</td>
</tr>
<tr>
<td><code>tau</code></td>
<td>Prior parameter tau (default setting is $ \tau = 10 \lambda $, i.e., $ \tau = 1 $ at the default $ \lambda = 0.1 $)</td>
</tr>
<tr>
<td><code>estimate</code></td>
<td>
Estimation:
<ol>
<li>impulse response functions (IRFs, default setting) (<code>estimate = 1</code>)</li>
<li>forecasting (<code>estimate = 2</code>)</li>
</ol>
</td>
</tr>
<tr>
<td><code>horizon</code></td>
<td>Number of horizons for IRFs (default setting is 48)</td>
</tr>
<tr>
<td><code>mcdraw</code></td>
<td>Number of Monte Carlo draws (default setting is 100)</td>
</tr>
<tr>
<td><code>cband</code></td>
<td>Overall percentage of confidence band (fraction less than 1; default setting is 0.68, i.e., one standard deviation)</td>
</tr>
<tr>
<td><code>grid</code></td>
<td>Grid search for optimal lambda and tau</td>
</tr>
<tr>
<td><code>fit</code></td>
<td>Fit evaluation variables</td>
</tr>
<tr>
<td><code>tsample</code></td>
<td>Training sample size (enter as <code>tsample = "first_period last_period"</code>)</td>
</tr>
<tr>
<td><code>suffix</code></td>
<td>Forecast output suffix (default is <code>_f</code>)</td>
</tr>
<tr>
<td><code>fhorizon</code></td>
<td>Forecast horizons (default is 12 periods)</td>
</tr>
<tr>
<td><code>sample</code></td>
<td>Sample size (default setting is the current workfile sample size)</td>
</tr>
<tr>
<td><code>vd</code></td>
<td>Variance decomposition (default setting is <code>vd = 0</code>, i.e., this option is switched off; option is switched on with <code>vd = 1</code>)</td>
</tr>
<tr>
<td><code>hd</code></td>
<td>Historical decomposition (default setting is <code>hd = 0</code>, i.e., this option is switched off; option is switched on with <code>hd = 1</code>)</td>
</tr>
<tr>
<td><code>save</code></td>
<td>Save IRFs to matrix (<code>save = matrix_name</code>)</td>
</tr>
<tr>
<td><code>ident</code></td>
<td>Identification of shocks (1 = Cholesky or recursive decomposition (default), 2 = Generalised decomposition)</td>
</tr>
<tr class='break_row'>
<td><code>pand</code></td>
<td>Include Pandemic Priors (<code>pand = 1</code>, Pandemic Priors switched on; <code>pand = 0</code>, Pandemic Priors switched off)</td>
</tr>
<tr>
<td><code>covper</code></td>
<td>Number of COVID-19 periods (default is zero)</td>
</tr>
<tr>
<td><code>dummy</code></td>
<td>List of dummy variables</td>
</tr>
<tr>
<td><code>phi</code></td>
<td>$ \phi $ for dummy observations (default setting is $ \phi = 0.001 $)</td>
</tr>
<tr>
<td><code>eps</code></td>
<td>$ \epsilon $ prior parameter for intercept (default setting is $ \epsilon = 0.00001 $)</td>
</tr>
</tbody>
<tfoot>
<tr class='nb'>
<td colspan='2'>
</br>
Note: The first 17 options and their settings apply to BGR, while the additional bottom five options need to be activated for Cascaldi-Garcia’s (2022) Pandemic Priors.
</td>
</tr>
</tfoot>
</table>
<br />
</center>
The <b>lbvar</b> add-in is run by selecting it from the <b>Add-ins</b> menu in the Command bar, after which the following dialog window will appear.</br></br>
<!-- :::::::::: FIGURE 4 :::::::::: -->
<center>
<table>
<tr>
<td class='nb'>
<center>
<a href="http://www.eviews.com/blog/lbvar/images/image4.png">
<img height="auto" src="http://www.eviews.com/blog/lbvar/images/image4.png" title="Image 4" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class='nb'>
<center>
<small>Figure 4: lbvar Dialog</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 4 :::::::::: -->
The add-in has several prepopulated default settings, some of which may not be appropriate in all applications. In particular, these are the settings for:</br></br>
<ul>
<li>the prior parameter <code>lambda</code>, which regulates the importance given to the priors, is set to 0.1</li>
<li>the checkbox for the sum-of-coefficients dummy observations prior, such that the prior is switched on (see Section 3.3.2 for a short discussion of this prior)</li>
<li>the prior parameter <code>tau</code>, which controls the shrinkage of the sum-of-coefficients prior, is set to 1</li>
<li>the <b>Estimation</b> option, which has been set to impulse response functions (IRFs)</li>
<li>the recursive (Cholesky) <b>Identification of shocks</b> (rather than the generalised option)</li>
<li>the <b>Number of horizons</b> for the IRFs, set to 48 months (or four years)</li>
<li>the <b>Number of Monte Carlo (MC) draws</b>, equal to 100</li>
<li>the percentage of the probability distribution covered by the confidence bands, which is 0.68 or 68 per cent</li>
<li>the automatic suffix applied to any variable’s forecast name, equal to <code>_f</code></li>
<li>the number of forecast horizons, equal to 12 months (or one year)</li>
<li>the <code>phi</code> parameter for Pandemic Prior dummy observations: 0.001 (Pandemic Priors only)</li>
<li>the <code>epsilon</code> parameter for Pandemic Prior dummy observations: 0.00001 (Pandemic Priors only)</li>
</br>
</ul>
We note in passing that the <b>lbvar</b> add-in automatically imports the sample size from the current workfile. If the current settings are not appropriate, we need to change them either manually in the dialog box or by including the corresponding setting in the command window or EViews program code.</br></br>
We start by replicating the unconditional twelve-month forecasts in Figure 4 of Cascaldi-Garcia (2022). More specifically, we first estimate the BVAR using the BGR approach, which does not account for the structural break induced by the COVID-19 pandemic and assumes unchanged parameter values throughout. Following Cascaldi-Garcia (2022), we include $ p = 12 $ lags and set the fixed overall tightness $ \lambda = 0.2 $ and $ \tau = 10 \lambda = 10 \times 0.2 = 2 $. The completed dialog box for this scenario should be as follows.</br></br>
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<table>
<tr>
<td class='nb'>
<center>
<a href="http://www.eviews.com/blog/lbvar/images/image5.png">
<img height="auto" src="http://www.eviews.com/blog/lbvar/images/image5.png" title="Image 5" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class='nb'>
<center>
<small>Figure 5: BGR options</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
After clicking <b>OK</b>, we should find eight new series in our workfile: the original series with an <code>_fbgr</code> suffix appended to their names, where <code>_fbgr</code> denotes the forecast using the BGR approach, i.e., without the Pandemic Priors. This is, unfortunately, the only forecasting output that the <b>lbvar</b> add-in will give us. In other words, we do not get any uncertainty bands around the forecasts as in Figure 4 of Cascaldi-Garcia (2022).</br></br>
Instead of the dialog box, we can convert the above commands into a few lines of EViews code to do the same thing. Taking our initial dialog box above as our starting point, we only need to change a few things. The general form of the <b>lbvar</b> EViews command code is:</br></br>
<center>
<code>
lbvar(options) lags rw_prior impulse_variable @ endogenous_variables
</code>
</br></br>
</center>
where the options included in the brackets associated with the command, i.e., <code>lbvar(options)</code>, correspond to one or more of those listed in Table 1; <code>lags</code> denotes the number of lags, $ p $, in the $ \text{BVAR}(p) $ model; <code>rw_prior</code> is the name of the matrix (vector) holding the prior means for the endogenous variables; <code>impulse_variable</code> names the variable whose structural shock is traced out by the impulse response functions; and <code>endogenous_variables</code> is the list of $ n $ endogenous variables in the BVAR. More specifically, to replicate the above setting from the dialog box, the command line will be:</br></br>
<center>
<code>
lbvar(estimate=2, fhorizon=12, sum=1, lambda=0.2, tau=2, sample="1975m1 2022m3", pand=0, suffix=_fbgr) 12 irw_s ebp @ ebp sp500l fedfunds pcel pcepil payemsl indprol unrate
</code>
</br></br>
</center>
We can see several options from Table 1 appearing in parentheses after <code>lbvar</code>: <code>estimate=2</code> selects the forecasting rather than the IRF option; <code>fhorizon=12</code> sets the forecast horizon of twelve months; <code>sum=1</code> activates the sum-of-coefficients dummy observations prior from equation (77); <code>lambda=0.2</code> and <code>tau=2</code> define those two parameters; <code>sample="1975m1 2022m3"</code> defines the sample size and corresponds to the Sample size window in the lbvar dialog window; <code>pand=0</code> switches off the Pandemic Priors; and <code>suffix=_fbgr</code> adds the suffix <code>_fbgr</code> to the forecast variables. After the parentheses, <code>12</code> defines the lag length, $ p $, of the $ \text{BVAR}(p) $ (equivalent to the Number of lags box in the dialog window); <code>irw_s</code> denotes the matrix (vector) holding the prior means for the endogenous variables; <code>ebp</code> is the impulse variable for the impulse response functions; and the entries after <code>@</code> are the eight monthly endogenous variables in the BVAR. Note that we have asked EViews to store the forecasts coming from BGR as <code>variable_name_fbgr</code>.</br></br>
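Both the dialog and the command assume that the prior-mean vector <code>irw_s</code> already exists in the workfile. As a minimal sketch, one way to create it (the values chosen here are an illustration, not the add-in's requirement: a prior mean of one corresponds to a random-walk prior and is typically assigned to persistent series, zero to stationary ones):</br></br>
<center>
<code>
' random-walk prior mean of 1 on the first own lag of each of the 8 variables<br />
vector(8) irw_s = 1<br />
' a stationary series could instead receive a white-noise prior mean of 0, e.g.:<br />
' irw_s(3) = 0
</code>
</br></br>
</center>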
Either of these two commands will generate the forecasts from the BGR model shown in brown in Figure 6.</br></br>
<!-- :::::::::: FIGURE 6 :::::::::: -->
<center>
<table>
<tr>
<td class='nb'>
<center>
<a href="http://www.eviews.com/blog/lbvar/images/image6.png">
<img height="auto" src="http://www.eviews.com/blog/lbvar/images/image6.png" title="Image 6" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class='nb'>
<center>
<small>Figure 6: Unconditional twelve-month-ahead forecasts as of March 2022 using the BGR and Pandemic Prior approaches
</small>
</center>
</td>
</tr>
<tr>
<td class='nb'>
<center>
<small>Notes: The green lines denote one year of historical data for the respective endogenous variables in the BVAR. The blue line denotes the twelve-month-ahead unconditional forecasts accounting for the COVID-19 pandemic with the Pandemic Priors, while the brown line shows the twelve-month-ahead unconditional forecasts assuming that the coefficient estimates remain unchanged over the estimation sample.
</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 6 :::::::::: -->
We will defer a discussion of the results until we have generated all the relevant output, that is, both sets of forecasts. The next step is therefore to generate the unconditional twelve-month forecasts using the Pandemic Priors. Again, we can use either the <b>lbvar</b> dialog box or a few lines of EViews code for this. Let us start with the dialog box, as we can retain all the settings employed above for the BGR approach, including the lag length of $ p = 12 $ and the fixed overall tightness $ \lambda = 0.2 $ with $ \tau = 10 \times \lambda = 2 $. The main difference is that we now check the <b>Include Pandemic Priors</b> checkbox.</br></br>
<!-- :::::::::: FIGURE 7 :::::::::: -->
<center>
<table>
<tr>
<td class='nb'>
<center>
<a href="http://www.eviews.com/blog/lbvar/images/image7.png">
<img height="auto" src="http://www.eviews.com/blog/lbvar/images/image7.png" title="Image 7" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class='nb'>
<center>
<small>Figure 7: LBVAR Dialog options with pandemic priors.
</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 7 :::::::::: -->
Note that we have asked EViews to store the forecasts coming from Cascaldi-Garcia’s (2022) model as <code>variable_name_fpp</code>. Selecting the option to <b>Include Pandemic Priors</b> activates an additional dialog box for the dummies and the two hyper-parameters ($ \phi $ and $ \epsilon $) associated with them. This box appears after you click on <b>OK</b>. Some of the fields are preset, such as the number of COVID periods (0) and the phi ($ \phi $) and epsilon ($ \epsilon $) priors, which are equal to 0.001 and 0.00001, respectively. The equivalent settings in Cascaldi-Garcia’s (2022) MATLAB code are, however, 0.001 for both. To match his settings, change the entry for the Epsilon prior in the dialog window from 0.00001 to 0.001, so that both $ \phi $ and $ \epsilon $ in equation (6) are set to 0.001; see Figure 8 below. </br></br>
<!-- :::::::::: FIGURE 8 :::::::::: -->
<center>
<table>
<tr>
<td class='nb'>
<center>
<a href="http://www.eviews.com/blog/lbvar/images/image8.png">
<img height="auto" src="http://www.eviews.com/blog/lbvar/images/image8.png" title="Image 8" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class='nb'>
<center>
<small>
Figure 8: LBVAR Dialog options for COVID dummies
</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 8 :::::::::: -->
Cascaldi-Garcia (2022, p. 6) reports that the COVID-19 pandemic period is modelled by applying the Pandemic Priors from March 2020 to August 2020, such that we include $ h = 6 $ individual dummies. We therefore enter the number of COVID periods to dummy out, equal to six, and the associated six dummy variables, which we have called <code>dum1, dum2, dum3, dum4, dum5</code> and <code>dum6</code>. As highlighted above, these six time dummies build on the assumption that the COVID-19 shock is akin to intercept shifts for the macroeconomic variables in the selected period (from March to August 2020).</br></br>
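The six time dummies must exist in the workfile before estimation. As a sketch, one way to create them (assuming a monthly workfile; EViews' <code>@during</code> function returns one inside the quoted date range and zero elsewhere):</br></br>
<center>
<code>
' one dummy per pandemic month, March 2020 to August 2020<br />
smpl @all<br />
series dum1 = @during("2020m3 2020m3")<br />
series dum2 = @during("2020m4 2020m4")<br />
series dum3 = @during("2020m5 2020m5")<br />
series dum4 = @during("2020m6 2020m6")<br />
series dum5 = @during("2020m7 2020m7")<br />
series dum6 = @during("2020m8 2020m8")
</code>
</br></br>
</center>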
After clicking <b>OK</b> in the second dialog box, the workfile will include a second set of forecasts for the eight endogenous variables, all of which can be identified by the suffix <code>_fpp</code>. These forecasts appear as the blue lines in Figure 6 above.</br></br>
As before, instead of the dialog box, we can convert the above commands into a few lines of EViews code to do the same thing. Taking our initial dialog box above as our starting point, we only need to change a few things.</br></br>
<center>
<code>
lbvar(estimate=1, sum=1, lambda=0.2, tau=2, sample="1975m1 2022m3", pand=1, dummy="dum1 dum2 dum3 dum4 dum5 dum6", covper=6, eps=0.001) 12 irw_s ebp @ ebp sp500l fedfunds pcel pcepil payemsl indprol unrate
</code>
</br></br>
</center>
The next few options (<code>estimate=1, sum=1, lambda=0.2, tau=2, sample=“1975m1 2022m3”</code>) are equivalent to the BGR case. The first difference occurs with the <code>pand</code> command. Remember that setting <code>pand=0</code> switches the Pandemic Priors off and results in the BGR estimation, while setting <code>pand=1</code> switches them on. We now have <code>pand=1</code>, which corresponds to including the Pandemic Priors. As mentioned above, the equivalent in the dialog window is the <b>Include Pandemic Priors</b> checkbox. As we can see, the Pandemic Priors have a few more options, starting with <code>dummy = “dum1 dum2 dum3 dum4 dum5 dum6”</code>. This informs the <b>lbvar</b> add-in that we will be using six time dummies. This command also tells EViews what the names of the time dummy variables in the workfile are. In our case, they are <code>dum1, dum2, dum3, dum4, dum5, dum6</code>. You can give the time dummies any name, as long as they are consistent across the workfile and the <code>dummy = ""</code> command. Cascaldi-Garcia (2022, p. 6) reports that the COVID-19 pandemic period is modelled by applying the Pandemic Priors from March 2020 to August 2020, such that we include <code>h=6</code> individual dummies, which translates into <code>covid_periods=6</code> statement, defining the number of COVID-19 periods to dummy out. The next optional command, <code>covper=6</code>, therefore specifies that the COVID period is set to six periods, i.e., the extraordinary period of extreme observations lasts for six months. This command is equivalent to the Number of Covid periods box in the second dialog window that is specific to the Pandemic Priors. The final command, <code>eps = 0.001</code>, specifies the value of epsilon ($ \epsilon $) in equation (6) to be the same as $ \phi $ in equation (6).</br></br>
Now that we have generated two sets of forecasts, one incorporating the Pandemic Priors and one that does not, we are in a position to inspect the results in more detail. Looking at Figure 4 of Cascaldi-Garcia (2022) in isolation, we find that for most of the variables the two forecasts are rather similar, if not very similar. But there are notable deviations, namely for the two labour variables (employment and the unemployment rate) and, to a lesser degree, PCE as well as industrial production. On the other hand, variables with unchanged autoregressive coefficients, such as EBP, the S&P 500 stock market index and the PCE price index, display very similar unconditional twelve-month-ahead forecasts across the two estimation approaches. This is apparent in both Figure 4 of Cascaldi-Garcia (2022) and Figure 6 above. In contrast, variables that are markedly affected by aberrant observations, such as employment and the unemployment rate, present substantially different unconditional forecasts, implying different economic interpretations. Forecasts using the BGR model in Figure 4 of Cascaldi-Garcia (2022) and Figure 6 indicate that employment is expected to increase for a couple of months, after which it levels off for four months and starts to decrease thereafter. Similarly, the unemployment forecast using the BGR approach falls slightly for two months before increasing over the remainder of the forecast horizon. Using the Pandemic Priors results in very different forecasts: employment increases steadily for ten months before levelling off, and the unemployment rate falls for seven months before increasing again, albeit to a lower level at the end of the forecast horizon than under the BGR scenario.</br></br>
Comparing Figure 6 with Figure 4 of Cascaldi-Garcia (2022) gives us an indication of how well the <b>lbvar</b> add-in replicates the forecasting results in Cascaldi-Garcia (2022). Overall, the correspondence is very satisfactory, including the kinks in the BGR forecasts for employment, industrial production and the unemployment rate.</br></br>
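For a closer look at any single variable, the two sets of forecasts can also be plotted against the history directly in EViews. A sketch for (log) employment, assuming the forecast suffixes used above and a forecast horizon ending in March 2023:</br></br>
<center>
<code>
' last year of history plus the twelve-month forecasts for log employment<br />
smpl 2021m4 2023m3<br />
graph g_payems.line payemsl payemsl_fbgr payemsl_fpp<br />
show g_payems
</code>
</br></br>
</center>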
Finally, we can look at the impact and propagation of a one-off structural shock in the model. The next task therefore involves generating the IRFs for the BGR and Pandemic Prior approaches and replicating Figure 5 of Cascaldi-Garcia (2022). That figure presents the impulse response functions (IRFs) of a one-off, one standard deviation EBP shock in March 2020 on the remaining variables under the BGR and Pandemic Prior approaches. In it, solid black lines indicate the (posterior mean) responses using the Pandemic Priors and solid red lines those calculated using the BGR approach.</br></br>
For the structural decomposition and the identification of the structural EBP shocks, Cascaldi-Garcia (2022) orders the EBP variable first in the BVAR and performs a standard Cholesky (or recursive) decomposition. We should note, though, that the approach is flexible enough to accommodate other conventional as well as state-of-the-art identification procedures, such as proxy VARs, sign restrictions, external instruments or maximisation of the variance decomposition.</br></br>
Again, we have the choice between doing so via the dialog box or a few lines of EViews code.</br></br>
<center>
<code>
lbvar(estimate=1, horizon=12, sum=1, lambda=0.2, tau=2, sample="1975m1 2022m3", ident=1, pand=0) 12 irw_s ebp @ ebp sp500l fedfunds pcel pcepil payemsl indprol unrate
</code>
</br></br>
</center>
After running the code, EViews produces the impulse response functions of the one standard deviation shock in EBP (the impulse variable) on the variables in the BVAR (including EBP itself), reproduced below as Figure 9. We find a dark blue line, which is the posterior mean of the 100 draws, with confidence bands covering 68 per cent of possible outcomes. Once again, we will defer a discussion of the results until we have generated both sets of IRFs.</br></br>
<!-- :::::::::: FIGURE 9 :::::::::: -->
<center>
<table>
<tr>
<td class='nb'>
<center>
<a href="http://www.eviews.com/blog/lbvar/images/image9.png">
<img height="auto" src="http://www.eviews.com/blog/lbvar/images/image9.png" title="Image 9" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class='nb'>
<center>
<small>
Figure 9: Impulse response functions to a one standard deviation EBP shock (BGR approach)
</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 9 :::::::::: -->
The <code>impulse_variable</code> comes into play when we are interested in first identifying and then estimating a VAR model subject to a structural shock and its associated output, such as impulse response functions (IRFs), forecast error variance decompositions (FEVDs, or variance decompositions for short) and historical decompositions. In Cascaldi-Garcia (2022, Figure 5), the author shows estimated impulse responses to a one standard deviation EBP shock. This makes <code>ebp</code> the <code>impulse_variable</code>, which is reflected in the above commands. The equivalent entry in the <b>lbvar</b> add-in dialog window is <b>Impulse variable</b> at the bottom of the right-hand side. If you wanted the structural shock to come from another variable, simply replace <code>ebp</code> by the variable of your choice. Finally, the endogenous variables of the model appear after the <code>@</code> in the command line. As already mentioned, we have eight variables in our underlying model: <code>ebp, sp500l, fedfunds, pcel, pcepil, payemsl, indprol, unrate</code>. These variables are specified in the first box, called <b>Endogenous variables</b>, in the <b>lbvar</b> add-in dialog window.</br></br>
The analogous exercise with Pandemic Priors should also be straightforward to set up. We only need to change the <b>Estimation</b> drop-down menu to <b>Impulse response</b>. Alternatively, we can rely on the code below.</br></br>
<center>
<code>
lbvar(estimate=1, horizon=12, sum=1, lambda=0.2, tau=2, sample="1975m1 2022m3", ident=1, pand=1, dummy="dum1 dum2 dum3 dum4 dum5 dum6", covper=6, eps=0.001) 12 irw_s ebp @ ebp sp500l fedfunds pcel pcepil payemsl indprol unrate
</code>
</br></br>
</center>
After running the code, EViews produces the impulse response functions of the one standard deviation shock in EBP (the impulse variable) on the variables in the BVAR (including EBP itself), reproduced below as Figure 10. We find a dark blue line, which is the posterior mean of the 100 draws, with confidence bands covering 68 per cent of possible outcomes.</br></br>
<!-- :::::::::: FIGURE 10 :::::::::: -->
<center>
<table>
<tr>
<td class='nb'>
<center>
<a href="http://www.eviews.com/blog/lbvar/images/image10.png">
<img height="auto" src="http://www.eviews.com/blog/lbvar/images/image10.png" title="Image 10" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class='nb'>
<center>
<small>
Figure 10: Impulse response functions to a one standard deviation EBP shock (Pandemic Prior approach)
</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 10 :::::::::: -->
The blue shaded areas in Figure 9 should be compared to the grey shaded areas centred around the black lines in Figure 5 of Cascaldi-Garcia (2022). Similarly, the blue shaded areas in Figure 10 should be compared to the bands traced out by the red dotted lines around the red solid lines in that same figure. As before, we find that the IRFs to a one standard deviation structural EBP shock more or less coincide across the two estimation approaches for half of the variables. The four variables that display the highest degree of aberrant observations once again show the greatest degree of divergence between the BGR approach, which keeps the coefficients constant over the pandemic period, and the Pandemic Priors, which do not: these are PCE, industrial production, employment and the unemployment rate. In fact, the IRFs for the latter two under the BGR methodology show notable kinks after two periods which are quite distinct from the equivalent IRFs using the Pandemic Priors.</br></br>
Figures 9 and 10 show sizeable differences in both size and propagation. Using the BGR approach, we would expect both quicker and larger falls in PCE, industrial production and employment and a much quicker and larger jump in the unemployment rate. Accounting for the intercept shifts results in a much smoother and more delayed impact of the structural EBP shock. For the remaining four variables, the economic effects of an EBP shock are similar across the two different estimation approaches.</br></br></br></br>
<h1 class="seccol", id="sec5">Concluding Remarks</h1>
Looking at Figures 9 and 10, we wonder whether the <b>lbvar</b> add-in really applies a one standard deviation shock to the impulse variable, as the EBP IRF in both figures starts at one. With a one standard deviation shock, we would have expected a starting point closer to the unconditional standard deviation of the <code>ebp</code> variable, which is 0.55. In fact, the equivalent on-impact value of the IRF in Figure 10 is closer to 0.23. This may indicate that the <b>lbvar</b> add-in applies a unit shock instead. That is not the end of the world, as both normalisations are widely employed in the literature. But it does mean that we can only compare the shapes of the IRFs across figures, and not their magnitudes.</br></br></br></br>
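The unconditional standard deviation quoted above is easy to verify in the workfile; a quick sketch over the estimation sample:</br></br>
<center>
<code>
' unconditional standard deviation of EBP, 1975m1 to 2022m3<br />
smpl 1975m1 2022m3<br />
scalar sd_ebp = @stdev(ebp)
</code>
</br></br>
</center>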
<hr />
<h3 class="seccol", id="sec7">References</h3>
<ol class="bib2xhtml">
<li id="banbura_2010">
Bańbura, M., Giannone, D. and Reichlin, L. (2010). Large Bayesian vector autoregressions. <cite>Journal of Applied Econometrics</cite>, 25(1): 71–92.
</li>
<li id="cascaldi-garcia_2022">
Cascaldi-Garcia, D. (2022). Pandemic priors. <cite>Board of Governors of the Federal Reserve System, International Finance Discussion Papers</cite>, 1352.
</li>
<li id="carriero_2022">
Carriero, A., Clark, T. E., Marcellino, M. and Mertens, E. (2022). Addressing COVID-19 outliers in BVARs with stochastic volatility. <cite>Review of Economics and Statistics</cite>, 1–38.
</li>
<li id="christiano_1996">
Christiano, L. J., Eichenbaum, M. and Evans, C. (1996). The effects of monetary policy shocks: evidence from the Flow of Funds. <cite>Review of Economics and Statistics</cite>, 78(1): 16–34.
</li>
<li id="gilchrist_2012">
Gilchrist, S. and Zakrajšek, E. (2012). Credit spreads and business cycle fluctuations. <cite>American Economic Review</cite>, 102(4): 1692–1720.
</li>
<li id="lenza_2022">
Lenza, M. and Primiceri, G. E. (2022). How to estimate a VAR after March 2020. <cite>Journal of Applied Econometrics</cite>, 37(4): 688–699.
</li>
<li id="litterman_1986">
Litterman, R. B. (1986). Forecasting with Bayesian vector autoregressions – five years of experience. <cite>Journal of Business & Economic Statistics</cite>, 4(1): 25–38.
</li>
<li id="primiceri_2020">
Primiceri, G. E. and Tambalotti, A. (2020). Macroeconomic forecasting in the time of COVID-19. <cite>Mimeo</cite>.
</li>
<li id="schorfheide_2021">
Schorfheide, F. and Song, D. (2021). Real-time forecasting with a (standard) mixed-frequency VAR during a pandemic. <cite>NBER Working Paper</cite>, 29535.
</li>
<li id="wu_2016">
Wu, J. C. and Xia, F. D. (2016). Measuring the macroeconomic impact of monetary policy at the zero lower bound. <cite>Journal of Money, Credit and Banking</cite>, 48(2-3): 253–291.
</li>
</ol>
</span>
<hr />
<h1 class="seccol">Principal Component Analysis for Nonstationary Series</h1>
<i>Posted 2023-09-27</i><br /><br />
<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 1px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.seccol {
}
.subseccol {
color: #fa5e5e
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: ['{\\left(}'],
rb: ['{\\right)}'],
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}'],
min: ['{\\operatorname\{min\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: "verdana" sans-serif">
<i>Author and guest post by Eren Ocakverdi</i><br /><br />
This blog piece introduces a new add-in (<a href='https://www.eviews.com/Addins/hxprincomp.aipz'>HXPRINCOMP</a>) that implements the procedure developed by Hamilton and Xi (2022).
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction</a>
<li><a href="#sec2">Principal components analysis on cyclical component</a>
<li><a href="#sec3">Application to U.S. Treasury Yields</a>
<li><a href="#sec4">Application to large macroeconomic data sets</a>
<li><a href="#sec5">Files</a>
<li><a href="#sec6">References</a>
</ol><br />
<h3 class="seccol", id="sec1">Introduction</h3>
In their paper, Hamilton and Xi (2022) propose a novel methodology when the goal is to extract the common factors behind the cyclical components of each of the series studied. They argue that focusing on the cyclical component of a time series offers a practical advantage; namely, it can be consistently estimated using an OLS regression while remaining agnostic about the stationarity of the underlying series.<br/><br/><br/><br/>
<h3 class="seccol", id="sec2">Principal components analysis on cyclical component</h3>
The procedure starts with estimating the following OLS regression for every variable:
$$
y_{it} = \alpha_{i0} + \alpha_{i1} \cdot y_{i,t-h} + \alpha_{i2} \cdot y_{i,t-h-1} + \cdots + \alpha_{ip} \cdot y_{i,t-h-p+1} + c_{it}
$$
Here, $h = 8$ and $p = 4$ for quarterly data, and $h = 24$ and $p = 12$ for monthly data. The authors postulate that the true cyclical components, $ C_t = (c_{1t}, c_{2t}, \ldots, c_{Nt})^\top $, are characterized by a factor structure with $ r \ll N $ factors of the form:
$$
\underbrace{\mathbf{C}_t}_{(N \times 1)} = \underbrace{\Lambda}_{(N \times r)} \cdot \underbrace{\mathbf{F}_t}_{(r \times 1)} + \underbrace{\mathbf{e}_t}_{(N \times 1)}
$$
The authors also show that even if the cyclical components are not observed but estimated, $ \hat{c}_{it} = c_{it} + \upsilon_{it} $, the true factors can still be consistently estimated under certain conditions.<br/><br/><br/><br/>
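For a single monthly series, the first step can be sketched directly in EViews (the series name <code>y</code> is a placeholder; with $ h = 24 $ and $ p = 12 $, the regressors are lags 24 through 35, and the residual is the estimated cyclical component):<br/><br/>
<center>
<code>
' OLS regression of y(t) on a constant and y(t-24), ..., y(t-35)<br />
smpl @all<br />
equation eq_cyc.ls y c y(-24 to -35)<br />
' the residual series holds the estimated cyclical component<br />
eq_cyc.makeresids y_cyc
</code>
<br/><br/>
</center>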
<h3 class="seccol", id="sec3">Application to U.S. Treasury Yields</h3>
As a first example, the authors apply their method to Treasury yields with different maturities (see Figure 1).<br /><br />
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/hxprincomp/images/image1.png"><img height="auto"
src="http://www.eviews.com/blog/hxprincomp/images/image1.png" title="Yields on different maturities."
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: Yields on different maturities.</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
The downward trend in the raw yields data is obvious, but the authors prefer not to apply any transformation to make the series stationary. To run the procedure on the yields data, we can use the add-in (see Figure 2).<br /><br />
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/hxprincomp/images/image2.png"><img height="auto"
src="http://www.eviews.com/blog/hxprincomp/images/image2.png" title="GUI of the add-in for yields example"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: GUI of the add-in for yields example</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
Input parameters are set to match those of the original study. Four principal components are extracted. However, the factor loadings are of main interest for this particular exercise, as they are the key parameters that summarize the dynamics of the yield curve (see Figure 3).<br /><br />
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/hxprincomp/images/image3.png"><img height="auto"
src="http://www.eviews.com/blog/hxprincomp/images/image3.png" title="Factor loadings for the cyclical data of yields"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3. Factor loadings for the cyclical data of yields</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
The coefficient relating yields to the first factor is called the level factor and is more or less the same for all maturities. The loading on the second factor is called the slope and is positive for long rates but negative for short rates. The third factor is called curvature and has a negative weight for bonds with very short or very long maturities.<br /><br /><br /><br />
<h3 class="seccol", id="sec4">Application to large macroeconomic data sets</h3>
When using principal components analysis on large macroeconomic data sets, one may need to transform each of the variables to ensure stationarity. Since this is done variable by variable, it can be a tedious task. Extracting the cyclical component of each series solves this problem by design.</br></br>
As a second example, authors apply their methodology to a large macroeconomic data set (2022-4 vintage of FRED-MD database), which covers 127 variables. To run the procedure on macroeconomic data, once again we can use the add-in (see Figure 4).</br></br>
<!-- :::::::::: FIGURE 4 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/hxprincomp/images/image4.png"><img height="auto"
src="http://www.eviews.com/blog/hxprincomp/images/image4.png" title="GUI of the add-in for FRED example"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4. GUI of the add-in for FRED example</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 4 :::::::::: -->
In order to deal with missing values in the data set, a balanced sample is used. Eight principal components are extracted and the first two are depicted in Figure 5 below.<br /><br />
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/hxprincomp/images/image5.png"><img height="auto"
src="http://www.eviews.com/blog/hxprincomp/images/image5.png" title="First and second PC of cyclical components of FRED-MD variables."
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5. First and second PC of cyclical components of FRED-MD variables.</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
The authors argue that their series correctly summarize cyclical movements, not only in the early periods but especially during 2020. They find that while the first factor captures real economic conditions, the second factor is mainly related to nominal prices and interest rates. Please note that the procedure requires neither stationarity corrections for the series nor any special treatment of outliers!</br></br></br></br>
<hr />
<h3 class="seccol", id="sec5">Files</h3>
<ul>
<li><a href="http://www.eviews.com/blog/hxprincomp/workfiles/hxprincomp_example_yield.prg"><b class="wf">HXPRINCOMP_EXAMPLE_YIELD.PRG</b></a></li>
<li><a href="http://www.eviews.com/blog/hxprincomp/workfiles/hxprincomp_example_fred.prg"><b class="wf">HXPRINCOMP_EXAMPLE_FRED.PRG</b></a></li>
<li><a href="http://www.eviews.com/blog/hxprincomp/workfiles/Yield_2022.xlsx"><b class="wf">YIELD_2022.XLSX</b></a></li>
<li><a href="http://www.eviews.com/blog/hxprincomp/workfiles/2022-04.csv"><b class="wf">2022-04.CSV</b></a></li>
</ul><br /><br />
<hr />
<h3 class="seccol", id="sec6">References</h3>
<ol class="bib2xhtml">
<li id="enders-2004">
Hamilton, J. D. and Xi, J. (2022). Principal component analysis for nonstationary series. <cite>Working Paper</cite>, UC San Diego.
</li>
</ol>
</span>
<hr />
<h1 class="seccol">Nowcasting US GDP During Covid-19 using Factor Augmented MIDAS</h1>
<i>Posted 2023-09-11</i><br /><br />
<style>
/* table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
} */
/* td {
border: 1px solid black;
} */
td.bold {
font-weight: bold;
}
td.btop {border-top: 1px solid black}
td.bbot {border-bottom: 1px solid black}
td.bleft {border-left: 1px solid black}
td.bright {border-right: 1px solid black}
td.center {text-align:center}
td.left {text-align:left}
td.right {text-align:right}
td.bottom {vertical-align:bottom}
td.underline {text-decoration:underline}
td.strikeout {text-decoration:line-through}
td.indent1 {text-indent:1}
hr.width300 {width: 100%}
hr.black {color:#000000}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.wfvar {
font-weight: bold;
text-transform: uppercase;
}
.wf {
font-weight: bold;
text-transform: uppercase;
}
.subseccol {
color: #fa5e5e
}
.bold {
font-weight: 400;
}
.col_blue {
color: rgba(41, 61, 92, 1)
}
.col_red {
color: rgba(250, 94, 94, 1)
}
.col_green {
color: rgba(0, 200, 125, 1)
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["HTML.js", "AMSmath.js"],
Macros: {
lb: ['{\\left(}'],
rb: ['{\\right)}'],
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}'],
min: ['{\\operatorname\{min\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: verdana, sans-serif">
The COVID-19 pandemic sent shockwaves through the global economy, triggering a macroeconomic shock and creating unprecedented challenges for economists trying to assess the current state of economies.</br></br>
In the quest for a more timely and accurate assessment of economic conditions during the COVID-19 era, economists and researchers turned to innovative solutions. One of the most promising techniques to emerge is MIDAS (Mixed-Data Sampling) estimation.<a name='more'></a><br /><br />
MIDAS, originally developed in the early 2000s, has gained attention as a powerful tool for nowcasting GDP using higher-frequency data, enabling more informed and timely decision-making.</br></br>
We have covered nowcasting and MIDAS with EViews before on this blog. We’ve <a href="https://blog.eviews.com/2020/12/nowcasting-gdp-with-pmi-using-midas-gets.html">demonstrated</a> how the novel MIDAS-GETS approach can be used in conjunction with PMI data to accurately nowcast Eurozone GDP, and we’ve <a href="https://blog.eviews.com/2018/12/nowcasting-gdp-on-daily-basis.html">shown</a> how many daily series can be reduced to a small set of variables using principal components, which then feed a MIDAS model to nowcast Australian GDP.</br></br>
This blog post is similar to the latter post above – we will use a large number of high frequency variables to nowcast GDP through a combination of variable reduction and MIDAS estimation. Specifically, we will use the <a href="https://research.stlouisfed.org/econ/mccracken/fred-databases/">FRED-MD</a> monthly data bank of US macroeconomic variables to forecast US GDP, using a Factor Augmented MIDAS model.</br></br>
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction</a>
<li><a href="#sec2">Data</a>
<li><a href="#sec3">Nowcasting 2020Q2 GDP</a>
<li><a href="#sec4">Longer Term Nowcast Evaluation</a>
<li><a href="#sec5">Files</a>
<li><a href="#sec6">References</a>
</ol><br />
<h3 class="seccol", id="sec1">Introduction</h3>
An introduction to the background of MIDAS and its benefits can be found in our <a href="https://blog.eviews.com/2020/12/nowcasting-gdp-with-pmi-using-midas-gets.html#sec1">previous blog post</a>. In this post we’ll be performing Factor Augmented MIDAS (FA-MIDAS), which is an extension to the standard MIDAS technique. FA-MIDAS was introduced in <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1468-0084.2010.00591.x">Marcellino and Schumacher (2010)</a>, and has been used in a number of different studies, including <a href="https://www.tcmb.gov.tr/wps/wcm/connect/ced9aff9-63a5-4192-b1a9-4aee30f252b6/wp2111.pdf?MOD=AJPERES&CACHEID=ROOTWORKSPACE-ced9aff9-63a5-4192-b1a9-4aee30f252b6-nFSUhEj">Gül and Kazdal (2021)</a>, and <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/twec.12708">Ferrara and Marsilli 2018</a>.</br></br>
One of the downsides of traditional MIDAS is that it cannot handle large numbers of high-frequency regressors; indeed, it is often recommended that only a single high-frequency regressor be used. With today’s abundance of data, economists face a large set of high-frequency regressors to choose from, and reducing them to a single variable, or a small number of variables, is a daunting task.</br></br>
Factor analysis can reduce the dimensionality of the regressors by identifying correlations amongst them and using those correlations to create a small set of latent factors that contain similar information to the set of original variables.</br></br>
The FA-MIDAS approach is then to first use factor analysis to reduce the large number of high-frequency variables to a handful of latent factors, and then use those high-frequency factors as regressors in a MIDAS regression to model a lower frequency variable.</br></br></br></br>
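In equation form, the two FA-MIDAS steps can be sketched as follows (notation loosely adapted from Marcellino and Schumacher (2010); $X_t$ collects the standardized high-frequency indicators):

$$X_t = \Lambda F_t + e_t$$

The estimated factors $\hat{F}_t$ then serve as high-frequency regressors in a MIDAS equation for the low-frequency variable $y_q$:

$$y_q = \beta_0 + \beta_1 y_{q-1} + \sum_{k=0}^{K-1} w_k(\theta)\, \hat{F}_{m(q)-k} + \varepsilon_q$$

where $m(q)$ is the anchor month of quarter $q$ and $w_k(\theta)$ are the parametric MIDAS lag weights.</br></br>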
<h3 class="seccol", id="sec2">Data</h3>
The Saint Louis Federal Reserve maintains a large database of monthly US macroeconomic variables, <a href="https://research.stlouisfed.org/econ/mccracken/fred-databases/">FRED-MD</a>. The database contains 127 variables that are updated each month and made available in a single .CSV file. Archival versions of the database are also made available, meaning you can download the data as released during a specific month (i.e. not containing any revisions made since that date). The database also contains an appendix that specifies a suitable transformation that should be performed on each series prior to use in analysis. The transformations include first and second differences, logs, and first and second log differences, as well as simply no transformation.</br></br>
As well as using this database, we will access FRED’s quarterly US GDP data, which can also be retrieved on an archival basis – using the values that were available on a certain date in history.</br></br></br></br>
<h3 class="seccol", id="sec3">Nowcasting 2020Q2 GDP</h3>
We will imagine we are in June 2020, a few months after the initial surge in COVID-19 cases in the United States. This is the last month of the second quarter of 2020, so we would not yet have any official data on GDP for that quarter. However, we would have US data from the FRED-MD database up until May 2020. That means we have two months of macroeconomic data following the COVID-19 shutdowns that started in March 2020, yet no data for GDP itself during COVID.</br></br>
We’ll walk through the steps taken to produce a nowcast of GDP based upon the monthly data.</br></br>
To begin, we need the FRED-MD database vintage for June 2020. We could download this file manually from the <a href="https://research.stlouisfed.org/econ/mccracken/fred-databases/">FRED-MD</a> website:</br></br>
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image1.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image1.jpg" title="Summary of FRED-MD Data" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: Summary of FRED-MD Data</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
However, since the database is a simple .CSV file, we can instruct EViews to open the file directly from the internet with a <b>wfopen</b> command:</br></br>
<pre>wfopen https://files.stlouisfed.org/files/htdocs/fred-md/monthly/2020-06.csv colhead=2 namepos=firstatt</pre>
We follow <b>wfopen</b> with the URL of the file, and then add two arguments to describe the data: <b>colhead=2</b> tells EViews that there are two rows of headers at the top of the file (the name of the series, and the transformation), and <b>namepos=firstatt</b> tells it that, of those two header rows, the name is in the first row, followed by series attributes (in our case the transformation).</br></br>
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image2.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image2.jpg" title="Workfile (Summary)" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: Workfile (Summary)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
The one issue with this import is that although the first column in the CSV file, sasdate, contains dates, EViews did not recognize the file as being dated. This is because the CSV file ends with a blank row, which prevents EViews from detecting the date structure. The issue is easily rectified by clicking on <b>Proc->Structure/Resize Current Page</b>, and then changing the <b>Workfile structure type</b> to <b>Dated – specified by date series</b> and entering <b>sasdate</b> as the <b>Date series</b>:</br></br>
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image3.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image3.jpg" title="Workfile (Open)" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: Workfile (Open)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
EViews will then warn us about removing one observation from the workfile (the blank row), but after confirming that’s what we want to do, we end up with a nicely structured monthly workfile containing all 127 variables between 1959 and May 2020.</br></br>
<!-- :::::::::: FIGURE 4 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image4.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image4.jpg" title="Workfile (Monthly)" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4: Workfile (Monthly)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 4 :::::::::: -->
Since we imported the Transformation row of the CSV file as an attribute (with the <b>namepos=firstatt</b> argument to the <b>wfopen</b> command), each series also contains metadata on the type of transformation recommended. We can view these by using the <b>Details +/-</b> button on the workfile, and then adding the transformation column by right clicking on any column header and selecting <b>Edit columns</b>.</br></br>
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image5.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image5.jpg" title="Workfile (Transform)" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5: Workfile (Transform)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
There is no point-and-click method in EViews to automatically apply the transformations to all the series at once. However, we can make a simple program that loops through each series, pulling the transformation type from its attributes, and then applying the transformation to itself:</br></br>
<pre>
'perform transformations
%serlist = @wlookup("*", "series")
for %j {%serlist}
%tform = {%j}.@attr("Transform:")
if @len(%tform) then
if %tform="1" then
series temp = {%j} 'no transform
endif
if %tform="2" then
series temp = d({%j}) 'first difference
endif
if %tform="3" then
series temp = d({%j},2) 'second difference
endif
if %tform="4" then
series temp = log({%j}) 'log
endif
if %tform= "5" then
series temp = dlog({%j}) 'log difference
endif
if %tform= "6" then
series temp = dlog({%j},2) 'log second difference
endif
if %tform= "7" then
series temp = d({%j}/{%j}(-1) -1) 'other
endif
{%j} = temp
d temp
endif
next
</pre>
</br></br>
Some of the series are missing data over the last year. We will want to drop these from our analysis, since we’d prefer to use only series with completely up-to-date data. We’ll write another quick loop that adds each series to a group only if it has an observation for every month of the last year.</br></br>
<pre>
%serlist = @wlookup("*", "series") 'get list of series
smpl @last-11 @last 'set sample to last year of observations
group g 'declare a group
for %j {%serlist} 'loop through series
if @obs({%j})=12 then 'if series has values for every observation in last year
g.add {%j} 'add it to group
endif
next
smpl @all 'reset sample to everything
</pre>
</br></br>
Now we are ready to perform the factor analysis on our group. We can do this by opening the group we created, <b>G</b>, and then clicking on <b>Proc->Make Factor</b> to bring up the Factor Specification dialog. There are many options that can be specified when performing factor analysis in EViews, but we’ll keep most of them at their default values. The only change we will make is setting the <b>Number of factors</b> option to the <b>Ahn and Horenstein</b> method (this method tends to select fewer factors than other methods, which is useful when performing MIDAS estimation).</br></br>
<!-- :::::::::: FIGURE 6 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image6.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image6.jpg" title="Dialog (Factor Specification)" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 6: Dialog (Factor Specification)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 6 :::::::::: -->
In this case, the analysis resulted in a single factor being created. We can output this factor as a series into the workfile by clicking on <b>Proc->Make Scores</b>, and then clicking <b>OK</b>.</br></br>
This produces a new series, <b>F1</b> in our workfile, which is the series we will use as the high-frequency regressor in the MIDAS estimation.</br></br>
Before we move on to working with the low-frequency data, we’ll quickly give our monthly page a more descriptive name than the default “Untitled”, by right clicking on the page tab, selecting <b>Rename Workfile Page…</b> and then entering <b>Monthly</b> as the new name.</br></br>
To set up our quarterly GDP data, we click on the <b>New Page</b> tab and select <b>Specify by Frequency/Range…</b> We’ll then select a <b>Quarterly</b> frequency, and change the start date to 1992 (although we have data for our monthly variables before this date, we’ll cut down the amount of data actually used in estimation). We’ll call the page “Quarterly”.</br></br>
<!-- :::::::::: FIGURE 7 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image7.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image7.jpg" title="Dialog (Workfile Create)" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 7: Dialog (Workfile Create)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 7 :::::::::: -->
Once the page has been created, we’ll open the FRED database (<b>File->Open->Database->FRED</b>), browse and search for GDP, change the <b>As Of</b>: date to 2020-06-01, and drag the Real GDP series into our workfile. This series contains Real US GDP data as it was available in June 2020 (not as it is available today). EViews will ask if we want to change the name (since the source name is illegal in EViews). We’ll change it to <b>GDP</b>.</br></br>
<!-- :::::::::: FIGURE 8 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image8.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image8.jpg" title="Workfile (Quarterly)" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 8: Workfile (Quarterly)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 8 :::::::::: -->
If we open the GDP series we can see the final value, for 2020Q2, is an NA – that value of GDP had not yet been released on June 1st, 2020.</br></br>
<!-- :::::::::: FIGURE 9 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image9.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image9.jpg" title="Series (GDP)" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 9: Series (GDP)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 9 :::::::::: -->
We’re now ready to perform our MIDAS estimation, which we do by clicking on <b>Quick->Estimate Equation</b> and then changing the Method dropdown to <b>MIDAS</b>. It is common for models of GDP to use the percent change of GDP as the dependent variable, with a constant and a single lag of the percent change of GDP as quarterly regressors. We can compute the percent change using the <b>@pch</b> function in EViews.</br></br>
The specification of the high-frequency regressor requires a little thought. We wish to use the monthly series <b>F1</b> (which was the factor series we created earlier) as our high-frequency regressor. We have data on F1 until May 2020, which is the second month of Q2 2020. When converting between high frequency and lower frequency data during MIDAS estimation, EViews will, by default, use the last observation in the quarter, and work backwards in time from there. In our case that would be June 2020, but in the monthly page we created, this month doesn’t exist (the monthly page ends in May 2020). The easiest way to fix this is to change the <b>Frequency conversion</b> setting of the MIDAS estimation on the <b>Options</b> tab of the estimation dialog to <b>First</b>. Now EViews will use data from the first month in the quarter instead of the last.</br></br>
This will enable us to produce an estimate. However, we would actually be losing some information – we would use data from April 2020 (April is the first month of the quarter) and earlier, and drop the information in May 2020. We can alleviate this by entering our monthly regressor as <b>monthly\F1(1)</b>, where the (1) tells EViews to shift the data forward one month from the first month of the quarter. We’ll select 12 monthly lags (a full year) of the F1 series.</br></br>
This means, for example, that the GDP data for 2019Q3 would be explained by GDP data for 2019Q2 (the one period lag in GDP), a constant, and monthly data for F1 from September 2018 through August 2019 (the second month of 2019Q3).</br></br>
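Schematically, the equation being estimated is then (here $m(q)$ denotes the second month of quarter $q$, the latest month for which F1 is available):

$$\%\Delta GDP_q = \beta_0 + \beta_1\, \%\Delta GDP_{q-1} + \sum_{k=0}^{11} w_k(\theta)\, \mathit{F1}_{m(q)-k} + \varepsilon_q$$

where the lag weights $w_k(\theta)$ are determined by the MIDAS weighting scheme estimated from the data.</br></br>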
<!-- :::::::::: FIGURE 10 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image10.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image10.jpg" title="Dialog (Equation Estimation)" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 10: Dialog (Equation Estimation)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 10 :::::::::: -->
<!-- :::::::::: FIGURE 11 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image11.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image11.jpg" title="Dialog (MIDAS Frequency Conversion)" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 11: Dialog (MIDAS Frequency Conversion)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 11 :::::::::: -->
The results of the estimation are:</br></br>
<!-- :::::::::: FIGURE 12 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image12.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image12.jpg" title="Estimation Results" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 12: Estimation Results</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 12 :::::::::: -->
We can see that the three MIDAS PDL coefficients are all statistically significant. Also note that EViews automatically adjusted the estimation sample to end in 2020Q1 (which is the last quarter for which we have GDP data).</br></br>
To nowcast 2020Q2 GDP, all we now have to do is click the <b>Forecast</b> button and set the forecast sample to 2020Q2 2020Q2 (identical start and end dates mean a single-period forecast).</br></br>
<!-- :::::::::: FIGURE 13 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image13.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image13.jpg" title="Dialog (Forecast)" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 13: Dialog (Forecast)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 13 :::::::::: -->
Note that although the equation is specified in terms of percent-change of GDP, we will forecast the raw values of GDP, not the percent change, and the forecast values will be put into the series <b>GDPF</b>. Since we have the <b>Insert actuals for…</b> checkbox checked, the series will contain the actual values of GDP for the non-forecast periods (i.e. every quarter other than 2020Q2). After clicking <b>OK</b>, we can open the <b>GDPF</b> series as a graph:</br></br>
<!-- :::::::::: FIGURE 14 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image14.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image14.jpg" title="Nowcast GDP" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 14: Nowcast GDP</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 14 :::::::::: -->
We can see that the forecast (shaded) region of the graph shows a sharp decline in GDP, in line with economic expectations.</br></br>
We can go further and actually gauge how good this nowcast of 2020Q2 GDP is by retrieving the actual values of GDP for that period. We do so by again opening the FRED database, and changing the <b>As of</b>: date to be August 2020, and then drag the GDP series back into EViews. We’ll keep the name as that suggested by EViews, to recognize that the data are as-of August 2020. We can open this series alongside the nowcasted series in a group and view the graph, using the graph slider to zoom into the last few quarters of the data:</br></br>
<!-- :::::::::: FIGURE 15 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image15.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image15.jpg" title="Nowcast GDP vs Actual" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 15: Nowcast GDP vs Actual</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 15 :::::::::: -->
We can see that the nowcast value (blue line) very closely matches the actual value (orange) for 2020Q2!</br></br></br></br>
<h3 class="seccol", id="sec4">Longer Term Nowcast Evaluation</h3>
In the previous section we walked through how we could nowcast a single quarter of GDP, and showed that the nowcasted value was very close to the first release of the actual data. As a single result, this doesn’t tell us conclusively that the nowcasting model is always an accurate predictor. For that we would need to perform a series of nowcasts over a longer period of time and compare the results from the series of nowcasts to the actual data.</br></br>
Performing such a study in EViews is relatively straightforward through the EViews programming language. We’ve written such a script that nowcasts GDP between January 2017 and July 2023. We won’t go through each step of the script, but will describe its functionality.</br></br>
<h3 class="subseccol">Data Retrieval</h3>
The program loops through each month between 2017 and today’s date. For each of those months, it downloads the FRED-MD file for that month into a new page in the workfile. Thus each month will have its own page containing FRED-MD data from 1991 until the month prior (since, for example, the FRED-MD file for 2020-06 contains data from 1991 until 2020-05).</br></br>
For each month in the loop, the program will also download quarterly GDP from FRED to a quarterly page, where the GDP data are as of the first of the month (and so will contain data up until the quarter prior to the quarter of the current month, or perhaps even the quarter before that, depending on the publication lag of the official GDP release).</br></br>
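The vintage-retrieval step can be sketched in EViews code along the following lines. This is a rough sketch rather than the actual program: the URL pattern follows the FRED-MD site, the import options mirror the single-vintage import above, and the page naming and <b>pageload</b> option are assumptions.</br></br>
<pre>
' Sketch: load each monthly FRED-MD vintage into its own workfile page
' (page name and pageload option are illustrative assumptions)
for !year = 2017 to 2023
	for !month = 1 to 12
		%m = @str(!month)
		if !month < 10 then
			%m = "0" + %m	' zero-pad the month
		endif
		%url = "https://files.stlouisfed.org/files/htdocs/fred-md/monthly/" + @str(!year) + "-" + %m + ".csv"
		pageload(page=md_{!year}_{%m}) {%url} colhead=2 namepos=firstatt
	next
next
</pre>
</br></br>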
<h3 class="subseccol">Estimation</h3>
With the data retrieved, for each month in the loop a factor model is estimated on the monthly FRED-MD dataset (having removed any series that do not contain data for the previous two years), and the estimated factors are output to the monthly page. Then, for the corresponding quarterly GDP as of that month, a MIDAS model is estimated, with percent-change GDP as the dependent variable, a constant and a lag of percent-change GDP as regressors, and an Almon/PDL weighted MIDAS term using 12 lags of the factor series and a polynomial degree of 3. For each of the months, the frequency conversion is set to “first”, and the factor series are shifted forwards to capture the most data (as was the case for the single estimation we performed earlier).</br></br>
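The Almon/PDL weighting restricts the 12 lag coefficients to lie on a low-order polynomial in the lag index, so only a few parameters need to be estimated. A standard formulation (EViews’ exact parameterization and normalization may differ slightly) is

$$w_k(\theta) = \sum_{j=0}^{p} \theta_j\, k^{j}, \qquad k = 0, 1, \dots, 11,$$

with polynomial degree $p = 3$, so the 12 lag weights are governed by the small set of $\theta_j$ coefficients.</br></br>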
At the same time a baseline comparison model of a simple AR(1) model for GDP is also estimated (i.e. simply percent-change GDP regressed against a constant and a lag).</br></br>
<h3 class="subseccol">Nowcasting</h3>
For both the MIDAS estimation and the baseline AR model, a one-period ahead nowcast is made, and the two values for that single quarter are stored.</br></br>
After the program has looped through every month, there will be three nowcasts for each quarter, for each of the two models. The first nowcast will correspond to data as-of the first month of the quarter, the second nowcast will correspond to data as-of the second month of the quarter, and the third nowcast will correspond to data as-of the third month of the quarter.</br></br>
These nowcasts are stored in a monthly page, giving a month-by-month updated nowcast of quarterly GDP through time.</br></br>
We will also copy those nowcasts over to the quarterly page, taking the average of the three months forecast for each quarter.</br></br>
<h3 class="subseccol">Results</h3>
The graph below shows a time-series of the monthly nowcasts generated by the FA-MIDAS model, the AR(1) model, alongside actual GDP data as-of first release:</br></br>
<!-- :::::::::: FIGURE 16 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image16.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image16.jpg" title="Nowcast GDP Comparison (Monthly)" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 16: Nowcast GDP Comparison (Monthly)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 16 :::::::::: -->
We can see that the MIDAS nowcast has a large degree of fluctuation during the COVID period, which is undoubtedly due to the instabilities in the economy at the time. In comparison to the AR(1) model, though, it does correctly time the sharp decrease in GDP at the start of 2020, even if it does overshoot dramatically.</br></br>
Looking at the quarterly version of the same graph, we can see that the MIDAS approach matches actual GDP very closely for the first year of COVID, but again fluctuates a little too much.</br></br>
<!-- :::::::::: FIGURE 17 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image17.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image17.jpg" title="Nowcast GDP Comparison (Quarterly)" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 17: Nowcast GDP Comparison (Quarterly)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 17 :::::::::: -->
The script also produces a forecast evaluation table of the two forecasts, and a simple average of the two:</br></br>
<!-- :::::::::: FIGURE 18 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image18.jpg">
<img height="auto" src="http://www.eviews.com/blog/nowcasting_using_fa_midas/images/image18.jpg" title="Nowcast GDP (Forecast Evaluation)" width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 18: Nowcast GDP (Forecast Evaluation)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 18 :::::::::: -->
The average of the two forecasts performs best, but out of the two, the FA-MIDAS model produces the more accurate forecasts.
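For reference, the combined forecast is the equal-weight average of the two models’ nowcasts, and the evaluation table reports standard loss statistics such as the root mean squared error over the $N$ evaluated quarters:

$$\hat{y}^{\,\text{avg}}_q = \tfrac{1}{2}\left( \hat{y}^{\,\text{MIDAS}}_q + \hat{y}^{\,\text{AR}}_q \right), \qquad \text{RMSE} = \sqrt{ \frac{1}{N} \sum_{q=1}^{N} \left( \hat{y}_q - y_q \right)^2 }$$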
<hr />
<h3 class="seccol", id="sec5">Files</h3>
<ul>
<li><a href="http://www.eviews.com/blog/nowcasting_using_fa_midas/workfiles/nowcasting_using_fa_midas.prg"><b class="wf">NOWCASTING_USING_FA_MIDAS.PRG</b></a></li>
</ul><br /><br />
<hr />
<h3 class="seccol", id="sec6">References</h3>
<ol class="bib2xhtml">
<li id="ferrara_marsilli_2018">
Ferrara, L., & Marsilli, C., (2018). Nowcasting global economic growth: A factor‐augmented mixed‐frequency approach. <cite>The World Economy</cite>.
</li>
<li id="gul_kazdal_2021">
Gül, E., & Kazdal, T., (2021). COVID-19 pandemic, vaccination and household expenditures: regional evidence from Turkish credit card data. <cite>Applied Economics Letters</cite>, 1-4.
</li>
<li id="marcellino_schumacher_2010">
Marcellino, M., & Schumacher, C., (2010). Factor MIDAS for nowcasting and forecasting with ragged‐edge data: A model comparison for German GDP. <cite>Oxford Bulletin of Economics and Statistics</cite>, 72(4), 518-550.
</li>
</ol>
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com0tag:blogger.com,1999:blog-6883247404678549489.post-46544384210116521672023-05-22T10:43:00.001-07:002023-05-22T10:46:01.888-07:00State Space Models with GARCH Errors<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 1px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.seccol {
}
.subseccol {
color: #fa5e5e
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: ['{\\left(}'],
rb: ['{\\right)}'],
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}'],
min: ['{\\operatorname\{min\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: 'Verdana', sans-serif;">
<i>Author and guest post by Eren Ocakverdi</i><br /><br />
This blog piece introduces a new add-in (<b>SSPACEGARCH</b>) that extends EViews' built-in capabilities for estimating univariate state space models.
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction</a>
<li><a href="#sec2">A workaround to control for the changing variance problem</a>
<li><a href="#sec3">Application to a CAPM-type specification</a>
<li><a href="#sec4">Code</a>
<li><a href="#sec5">Discretion</a>
</ol><br />
<h3 class="seccol" id="sec1">Introduction</h3>
Linear State Space Models (LSSM) assume that the error variance of the measurement/signal equation is constant. In practice, however, there are situations where this may not be the case and the variance of errors is time-varying. Ignoring this fact may bias parameter estimates.<br/><br/><br/><br/>
<h3 class="seccol" id="sec2">A workaround to control for the changing variance problem</h3>
Suppose that we have a full time-varying parameter model:<br/><br/>
\begin{align*}
y_t &= b_{0t} + b_{1t}x_t + e_t\\
b_{it} &= b_{it-1} + \epsilon_{it}, \quad \text{where} \quad \epsilon_{it} \sim IID(0, \theta_i)\\
e_t &= \eta_t \sigma_t\\
\sigma^2_t &= \omega + \alpha_1 e^2_{t - 1} + \beta_1 \sigma^2_{t - 1}, \quad \text{where} \quad \eta_t \sim IID(0,1)
\end{align*}
Although the dynamic system above can be put into state space form, it cannot be solved via the default algorithms, since the variance equation is not linear in the state variables.
The Kalman filter and smoother can instead be applied iteratively to obtain new smoothed estimates of the state variables, $b_{it}$. At each pass, the smoothed signal residuals $\tilde{e}_t$ are modelled as a GARCH process and used to compute new values for $\sigma^2_t$, repeating until $\hat{e}_t$, $\hat{b}_{it}$, or the log-likelihood converges.<br/><br/><br/><br/>
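To make the iteration concrete, here is a minimal Python/NumPy sketch of the idea. This is purely illustrative and is <i>not</i> the add-in's code: it runs a Kalman filter and RTS smoother with a time-varying measurement variance, then feeds the smoothed residuals through a GARCH(1,1) recursion. For simplicity the GARCH parameters (<code>omega</code>, <code>alpha1</code>, <code>beta1</code>) are held fixed here, whereas the add-in re-estimates the GARCH model by maximum likelihood at every pass.

```python
import numpy as np

def kalman_smooth(y, X, h, Q, a0, P0):
    """Kalman filter + RTS smoother for y_t = X_t' b_t + e_t with random-walk
    states b_t = b_{t-1} + eps_t and time-varying measurement variance h_t."""
    n, k = X.shape
    a_pred = np.zeros((n, k)); P_pred = np.zeros((n, k, k))
    a_filt = np.zeros((n, k)); P_filt = np.zeros((n, k, k))
    a, P = a0.copy(), P0.copy()
    for t in range(n):
        a_p, P_p = a, P + Q                      # prediction step
        a_pred[t], P_pred[t] = a_p, P_p
        Z = X[t]
        F = Z @ P_p @ Z + h[t]                   # prediction-error variance
        K = P_p @ Z / F                          # Kalman gain
        a = a_p + K * (y[t] - Z @ a_p)           # filtered state
        P = P_p - np.outer(K, Z) @ P_p
        a_filt[t], P_filt[t] = a, P
    a_sm = a_filt.copy()                         # RTS backward recursion
    for t in range(n - 2, -1, -1):
        J = P_filt[t] @ np.linalg.inv(P_pred[t + 1])
        a_sm[t] = a_filt[t] + J @ (a_sm[t + 1] - a_pred[t + 1])
    return a_sm

def sspace_garch(y, X, Q, omega, alpha1, beta1, iters=15, tol=1e-2):
    """Iterate smoothing and a GARCH(1,1) variance recursion until the
    smoothed signal residuals stabilize."""
    n = y.shape[0]
    h = np.full(n, np.var(y))                    # start from constant variance
    a0, P0 = np.zeros(X.shape[1]), np.eye(X.shape[1]) * 1e3
    e_old = np.zeros(n)
    for _ in range(iters):
        b = kalman_smooth(y, X, h, Q, a0, P0)
        e = y - np.sum(X * b, axis=1)            # smoothed signal residuals
        h_new = np.empty(n)                      # GARCH(1,1) recursion; the
        h_new[0] = np.var(e)                     # add-in instead re-estimates
        for t in range(1, n):                    # the parameters at each pass
            h_new[t] = omega + alpha1 * e[t - 1] ** 2 + beta1 * h_new[t - 1]
        if np.max(np.abs(e - e_old)) < tol:
            break
        e_old, h = e, h_new
    return b, h
```

With small state variances <code>Q</code>, the smoothed coefficients collapse toward the constant-parameter regression estimates, which is a useful sanity check on the sketch.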
<h3 class="seccol" id="sec3">Application to a CAPM-type specification</h3>
The MSCI Emerging Markets Currency Index® is a useful benchmark for judging whether a given EM currency is outperforming or underperforming its peers. In this exercise, we will try to identify the relationship between the Turkish lira (TRY) and the MSCI index (see Figure 1).<br/><br/>
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/ssmgarch/images/mscivstry.png"><img height="auto"
src="http://www.eviews.com/blog/ssmgarch/images/mscivstry.png" title="MSCI vs TRY"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: MSCI Emerging Markets Currency Index® vs Indexed and rebased TRY (in USD)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
The divergence of the two indices is clear even by visual inspection, but we are interested in how the relationship between returns has changed over time. First, using the time-varying parameter model above, we estimate the parameters assuming fixed variance (see Figure 2).<br/><br/>
<!-- :::::::::: FIGURES 2a and 2b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 2a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ssmgarch/images/sm_alpha.png"><img height="auto"
src="http://www.eviews.com/blog/ssmgarch/images/sm_alpha.png" title="Alpha"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 2b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ssmgarch/images/sm_beta.png"><img height="auto"
src="http://www.eviews.com/blog/ssmgarch/images/sm_beta.png" title="Beta"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2a: Smoothed estimates: Alpha</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 2b: Smoothed estimates: Beta</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 2a and 2b :::::::::: -->
The alpha coefficient has significantly diverged from zero into negative territory since the global financial crisis of 2008. The beta coefficient hovered around 1 between 2008 and 2021, but declined afterwards and became statistically insignificant over the course of 2022. Also note the spike around August 2018, which leads us to suspect idiosyncratic factors/developments.<br/><br/>
To estimate the parameters along with a changing variance model, we can use the add-in (see Figure 3).<br/><br/>
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/ssmgarch/images/gui.png"><img height="auto"
src="http://www.eviews.com/blog/ssmgarch/images/gui.png" title="SSMGARCH GUI"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: SSMGARCH GUI</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
Since we are dealing with financial data at daily frequency, assuming a GARCH(1,1) structure for the errors is a reasonable way to approach the changing variance problem. We can then compare the results against those of the fixed variance model (see Figure 4).<br/><br/>
<!-- :::::::::: FIGURES 4a and 4b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 4a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ssmgarch/images/smvsfixed_alpha.png"><img height="auto"
src="http://www.eviews.com/blog/ssmgarch/images/smvsfixed_alpha.png" title="Alpha"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 4b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ssmgarch/images/smvsfixed_beta.png"><img height="auto"
src="http://www.eviews.com/blog/ssmgarch/images/smvsfixed_beta.png" title="Beta"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4a: Smoothed vs Fixed and Changed Variance Models: Alpha</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 4b: Smoothed vs Fixed and Changed Variance Models: Beta</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 4a and 4b :::::::::: -->
The alpha coefficient becomes constant over the full sample. Although the level of the smoothed beta coefficient changes over the full sample vis-à-vis the fixed variance model, the dynamics of both estimates remain more or less the same except for certain periods. A closer look at such periods, along with the behavior of volatility, might shed light on some of the differences we observe (Figure 5).<br/><br/>
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/ssmgarch/images/diff.png"><img height="auto"
src="http://www.eviews.com/blog/ssmgarch/images/diff.png" title="Difference in estimates"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5: Difference in estimates vis-à-vis conditional standard deviation</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
Not surprisingly, many of the large discrepancies in parameter estimates overlap with periods that experienced jumps in volatility, most of which were due to specific events that took place in Turkish financial markets at the time.<br/><br/><br/><br/>
<h3 class="seccol" id="sec4">Code</h3>
<pre>
<code>
<span style="color: #6aa84f;">'create a daily workfile</span>
wfcreate d5 2005 2022
<span style="color: #6aa84f;">'retrieve data from Bloomberg</span>
fetch(d=none) "tryusd curncy" <span style="color: #6aa84f;">'TRY currency in USD</span>
fetch(d=none) "mxef0cx0 index" <span style="color: #6aa84f;">'MSCI Emerging Markets Currency Index in USD</span>
<span style="color: #6aa84f;">'rename msci currency index</span>
rename mxef0cx0 msci_curncy
<span style="color: #6aa84f;">'drop missing values in data</span>
group data.add tryusd msci_curncy
pagecontract @all if @rnas(data)=0
<span style="color: #6aa84f;">'generate an index from tryusd for comparison purposes</span>
smpl @first @first
series try_curncy = msci_curncy
smpl @first+1 @last
try_curncy = try_curncy(-1)*tryusd/tryusd(-1)
smpl @all
<span style="color: #6aa84f;">'draw charts</span>
graph figure1.line try_curncy msci_curncy
<span style="color: #6aa84f;">'build a time-varying parameter CAPM model in state space</span>
sspace ssmodel
ssmodel.append @signal dlog(tryusd)*100 = alpha + beta*dlog(msci_curncy)*100 + [var=exp(c(1))]
ssmodel.append @state alpha = alpha(-1) + [var=exp(c(2))]
ssmodel.append @state beta = beta(-1) + [var=exp(c(3))]
ssmodel.append @param c(1) .0 c(2) .0 c(3) .0
<span style="color: #6aa84f;">'estimate the model</span>
ssmodel.ml
<span style="color: #6aa84f;">'display and save the smoothed estimates of time varying coefficients</span>
freeze(mode=overwrite,figure2) ssmodel.stategraphs(t=smooth) *f
figure2.align(2,1,1)
ssmodel.makestates(t=smooth) sm_*
<span style="color: #6aa84f;">'estimate the model assuming GARCH(1,1) errors</span>
ssmodel.sspacegarch(type=1,ref=1,iters=15,tol=1e-02,adjsave,garchsave)
<span style="color: #6aa84f;">'save the smoothed estimates of time varying coefficients</span>
ssmodel_new.makestates(t=smooth) sm_*_new
<span style="color: #6aa84f;">'compare the smoothed estimates of beta coefficients</span>
group gr_alpha.add sm_alpha*
freeze(mode=overwrite,figure4a) gr_alpha.line
group gr_beta.add sm_beta*
freeze(mode=overwrite,figure4b) gr_beta.line
graph figure4.merge figure4a figure4b
figure4.align(2,1,1)
<span style="color: #6aa84f;">'compare the absolute difference in estimates to GARCH errors</span>
group gr_diff.add @abs(sm_beta-sm_beta_new) @sqrt(garchvar)
freeze(mode=overwrite,figure5) gr_diff.line
figure5.axis overlap
figure5.setelem(2) axis(r)
</code>
</pre>
<h3 class="seccol" id="sec5">Discretion</h3>
Please note that, to the best of the author's knowledge, the method implemented in the add-in is not endorsed by any peer-reviewed study, and it is not the preferred way of handling the problem from an econometric point of view. It is simply an iterative process that repeatedly re-estimates the model, assuming a proper GARCH structure for the errors, to correct for a changing variance problem, if any. It can be helpful for practical purposes, but you should use it with discretion and at your own risk.
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com0tag:blogger.com,1999:blog-6883247404678549489.post-50661598906334923072022-09-06T08:21:00.019-07:002022-09-22T07:59:23.696-07:00NARDL in EViews 13: A Study of Bosnia's Tourism Sector<style>
/* table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
} */
/* td {
border: 1px solid black;
} */
td.bold {
font-weight: bold;
}
td.btop {border-top: 1px solid black}
td.bbot {border-bottom: 1px solid black}
td.bleft {border-left: 1px solid black}
td.bright {border-right: 1px solid black}
td.center {text-align:center}
td.left {text-align:left}
td.right {text-align:right}
td.bottom {vertical-align:bottom}
td.underline {text-decoration:underline}
td.strikeout {text-decoration:line-through}
td.indent1 {text-indent:1}
hr.width100 {width: 100%}
hr.black {color:#000000}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.wfvar {
font-weight: bold;
text-transform: uppercase;
}
.wf {
font-weight: bold;
text-transform: uppercase;
}
.subseccol {
color: #fa5e5e
}
.bold {
font-weight: 400;
}
.col_blue {
color: rgba(41, 61, 92, 1)
}
.col_red {
color: rgba(250, 94, 94, 1)
}
.col_green {
color: rgba(0, 200, 125, 1)
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["HTML.js", "AMSmath.js"],
Macros: {
lb: ['{\\left(}'],
rb: ['{\\right)}'],
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}'],
min: ['{\\operatorname\{min\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: 'Verdana', sans-serif;">
EViews 13 introduces several new features to extend the analysis of the well-known autoregressive distributed lag (ARDL) model (see our 3-part ARDL blog series: <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html">Part I</a>, <a href="http://blog.eviews.com/2017/05/autoregressive-distributed-lag-ardl_8.html">Part II</a>, and <a href="https://blog.eviews.com/2017/05/autoregressive-distributed-lag-ardl.html">Part III</a>). In particular, estimation of ARDL models now accommodates asymmetric distributed lag (DL) regressors, which extend traditional ARDL models to the increasingly popular nonlinear ARDL (NARDL) models. The latter allow for more complex dynamics which focus on modeling asymmetries in both the cointegrating (long-run) and the dynamic adjustment (short-run) relationships. To demonstrate these features, we will examine whether tourist arrivals and their length of stay (popular measures of tourism sector development) have asymmetric effects on overall economic development (measured as gross domestic product (GDP)) in Bosnia and Herzegovina.
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction</a>
<li><a href="#sec2">Data and Motivation</a>
<li><a href="#sec3">Estimation / Asymmetry </a>
<li><a href="#sec4">Bounds Test / Cointegration</a>
<li><a href="#sec5">Dynamic Multipliers</a>
<li><a href="#sec6">Interpretation and Policy Implications</a>
<li><a href="#sec7">Files</a>
<li><a href="#sec8">References</a>
</ol><br />
<h3 class="seccol" id="sec1">Introduction</h3>
Tourism is a crucial source of revenue for many economies. In the case of Bosnia and Herzegovina (BiH), despite the country's incredible historical and natural touristic appeal, tourism was an inconsequential contributor to GDP growth in the period preceding Bosnia's horrific period of aggression in the early 1990s. Bosnia's economy at that time was highly reliant on natural resource exploitation (particularly metal ores and forestry), hydroelectric production, and manufacturing. While Bosnia managed to resurrect some of these industries following the end of the aggression in 1996, it also struggled to reinvigorate several highly prospective ones, such as industrial production. In their stead, tourism has evolved into an increasingly significant contributor to Bosnia's GDP development.<br/><br/>
Much of the expansion in foreign tourist interest in Bosnia stems from its active marketing campaigns and encouragement from the European Union (see the EU's <a href="https://europa.eu/capacity4dev/file/10364/download?token=ZqUxjypX">2007/145-210 Project</a>). Bosnia's capital, Sarajevo, having hosted the 1984 Winter Olympics, is also home to the widely popular <a href="https://www.sff.ba/en">Sarajevo Film Festival</a>, whereas the UNESCO-protected city of Mostar in the south hosts the <a href="https://www.redbull.com/us-en/event-series/redbull-cliffdiving">Red Bull Cliff Diving World Series</a> and is often on top lists of the <a href="https://edition.cnn.com/travel/article/beautiful-towns-europe/index.html">most beautiful cities</a> to visit in Europe. Moreover, as a silver lining to its bloody history at the onset of the 90s, the country today boasts a diaspora of 2.2 million individuals (or 55% of its pre-aggression population), who live and work in major economies around the globe. These factors, in combination with the country's relatively low cost of living, make Bosnia and Herzegovina a particularly appealing destination for tourists hailing from relatively larger economies.<br/><br/>
On the other hand, Bosnia's domestic tourism sector, while lagging behind its foreign counterpart, has witnessed a similar revival. As the country benefits from both summer and winter destinations, beach activities in coastal cities along the Adriatic Sea and winter skiing activities in the mid-west both present opportunities for locals to enjoy their homeland year round. Domestic tourism is further stimulated by the nearly 200 local hiking and <a href="https://daily.jstor.org/bosnia-hiking-mountaineering-clubs/">alpine clubs</a>, which are also slowly using these platforms to offer foreign tourists multi-day guided tours of the region. Finally, it's worth noting that Bosnia has a significant middle-aged domicile generation which remained in the country during the aggression period. Economic opportunities in the recovery years for this demographic segment were greatly stunted, and today many of them cannot afford international travel, instead confining themselves to domestic excursions.<br/><br/>
There are, in fact, numerous articles which demonstrate that tourism has a positive effect on economic growth. The general idea driving this body of research is that tourism is a positive externality which stimulates infrastructural development, foreign direct investment, and the fabric of modern internet and mobile connectivity. Our objective here is to study the magnitude of the effects, and the asymmetries therein, that Bosnia's tourism sector may exert on its overall economic development.<br/><br/>
Traditionally, these dynamics can be explored through vector error-correction (VEC) models. Nevertheless, this class of models generally assumes that all system variables are integrated of at least order 1, and does not preclude the possibility of multiple equally plausible cointegrating relationships. Alternatively, classical ARDL models (see our blog series <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html">Part I</a>, <a href="http://blog.eviews.com/2017/05/autoregressive-distributed-lag-ardl_8.html">Part II</a>, and <a href="https://blog.eviews.com/2017/05/autoregressive-distributed-lag-ardl.html">Part III</a>) allow for varying degrees of integration among the system variables (provided the maximum order of integration is less than 2), and assume a single unique cointegrating relationship among the variables of interest. Nevertheless, this framework assumes that the long-run (cointegrating) relationship is a symmetric linear combination of regressors.<br/><br/>
While the classical ARDL framework is perfectly reasonable for many applications, it cannot, however, accommodate behavioral finance and economics research on nonlinearity and asymmetry, which are also often encountered in practice; see the seminal contributions by <a href="#kahneman_tversky_1979">Kahneman and Tversky (1979)</a> and <a href="#shiller_2005">Shiller (2005)</a>. To address this limitation, <a href="#shin_et_al_2014">Shin, Yu, and Greenwood-Nimmo (2014)</a> propose a nonlinear ARDL (NARDL) framework in which short-run and long-run nonlinearities are modeled as positive and negative partial sum decompositions of the distributed lag variables. Recall that any given variable $ z_t $ may be decomposed as $ z_t = z_0 + z_t^{+} + z_t^{-} $, where $ z_0 $ is the initial value and $ z_t^{+} $ and $ z_t^{-} $ are the partial sum processes of positive and negative changes in $ z_t $, respectively:
\begin{align*}
z_t^{+} &= \sum_{s = 1}^{t} \max \left( \Delta z_s, 0 \right)\\
z_t^{-} &= \sum_{s = 1}^{t} \min \left( \Delta z_s, 0 \right)
\end{align*}
Note that when $ z_t $ is a distributed lag variable and $ z_t^{+} \neq z_t^{-} $, the distributed lag variable exhibits asymmetric effects where positive changes have a different impact on the dependent variable than their negative counterparts. On the other hand, when $ z_t^{+} = z_t^{-} $, the distributed lag variable exhibits symmetric effects on the dependent variable and reduces to the classical ARDL effect.<br/><br/>
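The partial sum decomposition is straightforward to compute. The following minimal Python/NumPy illustration (EViews 13 constructs these partial sums internally when asymmetric regressors are specified, so this is purely to make the construction concrete) verifies the identity $ z_t = z_0 + z_t^{+} + z_t^{-} $ on a toy series:

```python
import numpy as np

def partial_sums(z):
    """Decompose z into cumulative sums of its positive and negative changes."""
    dz = np.diff(z)
    z_pos = np.concatenate(([0.0], np.cumsum(np.maximum(dz, 0.0))))
    z_neg = np.concatenate(([0.0], np.cumsum(np.minimum(dz, 0.0))))
    return z_pos, z_neg

z = np.array([1.0, 2.0, 1.5, 3.0])
z_pos, z_neg = partial_sums(z)
# z_pos = [0, 1, 1, 2.5], z_neg = [0, 0, -0.5, -0.5],
# so z[0] + z_pos + z_neg reproduces z elementwise
```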
The NARDL framework also provides asymmetric dynamic multipliers. These constructs, which are similar to impulse-response curves in the VAR literature, trace asymmetric paths of adjustment of each nonlinear distributed lag regressor to its long-run (cointegrating) state.<br/><br/>
Below, we will apply the NARDL framework to identify the long-run (cointegrating) and short-run (adjusting) dynamics which relate Bosnia's tourism sector to its state of the economy.<br/><br/><br/><br/>
<h3 class="seccol" id="sec2">Data and Motivation</h3>
To conduct the analysis, we will collect data directly from the <a href="https://bhas.gov.ba/?lang=en">Agency for Statistics of Bosnia and Herzegovina</a> (ASBH). In particular, we are interested in 5 different time series:
<ul>
<li><b class="wfvar">GDP</b>: gross domestic product</li>
<li><b class="wfvar">FTA</b>: foreign tourist arrivals</li>
<li><b class="wfvar">DTA</b>: domestic tourist arrivals</li>
<li><b class="wfvar">FTS</b>: foreign tourist length of stay</li>
<li><b class="wfvar">DTS</b>: domestic tourist length of stay</li>
</ul>
Note that ASBH defines a foreign tourist as any "person with permanent residence <b>outside</b> of BiH who temporarily resides in BiH and who spends at least one night in a hotel or [similar] accommodation establishment." Similarly, it defines a domestic tourist as any "person with permanent residence <b>inside</b> BiH who spends at least one night in a hotel or [similar] accommodation establishment outside their place of residence." It also defines tourist arrivals as "the number of persons (tourists) who arrived and registered their stay in an accommodation establishment", and tourist length of stay as the number of "registered overnight stays of a person (tourist) in an accommodation establishment."<br/><br/>
It's also important to mention a few caveats regarding our data. First, Bosnia's GDP, measured using the expenditure approach at previous year prices, is collected quarterly. Furthermore, while GDP data does exist from Q1 2000 to Q4 2021, ASBH has labeled data after 2020 as forecasted rather than actual. As a precaution, we've decided to ignore the forecasted data altogether and shorten the GDP series to end with the last actual measurements made in Q4 of 2020.<br/><br/>
In contrast, tourism sector variables are collected monthly and date back to January 2008. This clearly presents a challenge in terms of using all variables in the same framework simultaneously. While methods such as <a href="https://blog.eviews.com/2020/12/nowcasting-gdp-with-pmi-using-midas-gets.html">MIDAS</a> do exist to address these issues, we've opted to handle the problem with manual frequency conversion, following the official EViews <a href="https://www.eviews.com/Learning/freqconv_a.html">frequency conversion tutorial</a>. In particular, to benefit from the longer time series available among the tourism variables, we've decided to convert GDP from its low quarterly frequency into the higher monthly frequency using the Denton method. As a convenience, we have made available a pre-processed version of this data as an EViews workfile, which may be downloaded from <a href="http://www.eviews.com/blog/ev13_nardl/workfiles/bih_tourism.wf1">here</a>.<br/><br/>
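To give a feel for what Denton-type benchmarking does, here is a stylized Python/NumPy sketch. It is not EViews' implementation (EViews' Denton conversion is richer, e.g. proportional variants and indicator series): it simply spreads each quarterly flow across months by making the monthly path as smooth as possible while forcing the months of each quarter to sum to the quarterly total.

```python
import numpy as np

def denton_interpolate(q, m_per_q=3):
    """Distribute a quarterly flow series q over months by minimizing the sum
    of squared first differences of the monthly series, subject to the months
    of each quarter summing to the quarterly value (no indicator series)."""
    q = np.asarray(q, dtype=float)
    nq, n = len(q), len(q) * m_per_q
    D = np.eye(n - 1, n, 1) - np.eye(n - 1, n)   # first-difference operator
    C = np.kron(np.eye(nq), np.ones(m_per_q))    # within-quarter aggregation
    # equality-constrained least squares: min ||D x||^2  s.t.  C x = q,
    # solved through the KKT system [[2 D'D, C'], [C, 0]] [x; lam] = [0; q]
    KKT = np.block([[2 * D.T @ D, C.T], [C, np.zeros((nq, nq))]])
    rhs = np.concatenate([np.zeros(n), q])
    return np.linalg.solve(KKT, rhs)[:n]

monthly = denton_interpolate([3.0, 6.0, 9.0])
# each quarter's three months sum to the corresponding quarterly total
```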
Before engaging in any advanced analysis, it's worth taking the time to understand the data we're dealing with. This will not only give us some idea about the state of Bosnia's tourism sector, but also help us identify meaningful patterns we may try to exploit later.<br/><br/>
While Bosnia's foreign and domestic tourist arrivals have both seen considerable activity over the reporting period between 2008 and 2022, foreign tourism seems to have undergone a nearly exponential transformation. The stark contrast is illustrated in Figure 1a below. In fact (see Figure 1b), in the period preceding the COVID-19 pandemic years, the average year-on-year percent change in annual tourist arrivals was 13% and 4.12% for the foreign and domestic sectors, respectively. In 2015, the year-on-year growth in tourist arrivals hovered around 26.53% among foreigners, and 13.1% among locals.<br/><br/>
<!-- :::::::::: FIGURES 1a and 1b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 1a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_dfta.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_dfta.png" title="Domestic vs. Foreign Annual Tourist Arrivals"
width="360"/>
</a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 1b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_dfta_yoypc.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_dfta_yoypc.png" title="Domestic vs. Foreign YoY Percent Change in Tourist Arrivals"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1a: Domestic vs. Foreign Annual Tourist Arrivals</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 1b: Domestic vs. Foreign YoY Percent Change in Tourist Arrivals</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 1a and 1b :::::::::: -->
Comparing Bosnia's foreign and domestic tourism sectors, the summaries above suggest that foreign tourist arrivals have significantly more clout than their domestic counterparts. We can gain further insight by looking at the monthly time series in Figure 2.<br/><br/>
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_dfta_seas.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_dfta_seas.png" title="Domestic vs. Foreign Tourist Arrivals Time Series"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: Domestic vs. Foreign Tourist Arrivals Time Series</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
What stands out in Figure 2 is that both domestic and foreign tourist arrivals exhibit predictable seasonal effects. Both series reach troughs early in the year and crest around the start and end of the peak tourism season, respectively. More importantly, the magnitude of arrivals in peak tourism months among foreign tourists simply towers over its domestic equivalent. For further context, Figure 3 and Table 1 illustrate the distribution of Bosnia's annual tourist arrivals from abroad.<br/><br/>
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_fta_by_year_cx.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_fta_by_year_cx.png" title="Annual Total Foreign Tourist Arrivals"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: Annual Total Foreign Tourist Arrivals</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
<!-- :::::::::: TABLE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/tab_arrivals_by_year_cx.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/tab_arrivals_by_year_cx.png" title="Annual Total Foreign Tourist Arrivals"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Table 1: Annual Total Foreign Tourist Arrivals</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: TABLE 1 :::::::::: -->
As tourism is often tightly tied to discretionary time, a resource that remains steady across years for much of the working population, we expect the behavioural patterns driving the length of visits to Bosnia to have changed little across our reporting period. In particular, in the period preceding the COVID-19 pandemic, the average (median) length of stay was 2.15 (2.15) and 2.13 (2.12) days for domestic and foreign tourists, respectively, whereas standard deviations hovered around 0.32 and 0.21, respectively. This is confirmed in Figure 4a below. In Figure 4b, we seek to understand the seasonality of domestic and foreign tourist stays. As is the case with tourist arrivals, these curves also exhibit seasonality. Nevertheless, there is an interesting pattern which indicates that while domestic tourists stay longer in the summer months than in the winter months, the opposite is true for foreign tourists. In fact, in the years before 2020, the mean (median) length of stay in quarter 1 was 1.98 (1.95) and 2.33 (2.35) days for domestic and foreign tourists, respectively. On the other hand, in the same period, the mean (median) length of stay in quarter 3 for domestic and foreign tourists was 2.56 (2.52) and 2.20 (2.19) days, respectively.<br/><br/>
<!-- :::::::::: FIGURES 4a and 4b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 4a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_dfts.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_dfts.png" title="Domestic vs. Foreign Average Annual Length of Stay per Tourist"
width="360"/>
</a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 4b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_dfts_seas.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_dfts_seas.png?<php echo filemtime( $file ); ?>" title="Domestic vs. Foreign Tourist Length of Stay Time Series"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4a: Domestic vs. Foreign Average Annual Length of Stay per Tourist</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 4b: Domestic vs. Foreign Tourist Length of Stay Time Series</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 4a and 4b :::::::::: -->
Focusing exclusively on foreign tourists, Figure 5 and Table 2 summarize the distribution of the annual length of stay per tourist across countries of origin.<br/><br/>
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_fts_by_year_cx.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_fts_by_year_cx.png?<php echo filemtime( $file ); ?>" title="Average Annual Foreign Tourist Length of Stay"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5: Average Annual Foreign Tourist Length of Stay</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
<!-- :::::::::: TABLE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/tab_average_length_by_year_cx.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/tab_average_length_by_year_cx.png?<php echo filemtime( $file ); ?>" title="Average Annual Foreign Tourist Length of Stay"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Table 2: Average Annual Foreign Tourist Length of Stay</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: TABLE 2 :::::::::: -->
Next, let's glance at Bosnia's GDP. In Figure 6a below, we can identify trending growth with seasonal effects which closely mimic those of foreign tourist arrivals; namely, GDP ebbs in January and peaks in July. For further insight, Figure 6b plots the standardized monthly GDP and both domestic and foreign tourist arrivals. What this plot illustrates is that tourist arrivals fluctuate about their mean values significantly more than GDP. Furthermore, whereas domestic tourism seems to dominate these fluctuations at the start of the reported sample, deviations in foreign tourism seem to dominate the end of the reported sample. More importantly, we see a strong positive correlation between GDP and foreign tourist arrivals.<br/><br/>
<!-- :::::::::: FIGURES 6a and 6b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 6a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_gdp.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_gdp.png?<php echo filemtime( $file ); ?>" title="Monthly GDP"
width="360"/>
</a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 6b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_dfta_gdp_stdize.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_dfta_gdp_stdize.png?<php echo filemtime( $file ); ?>" title="Standardized Monthly GDP and Tourist Arrivals"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 6a: Monthly GDP</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 6b: Standardized Monthly GDP and Tourist Arrivals</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 6a and 6b :::::::::: -->
Before proceeding with estimation, it's also prudent to study the orders of integration of our series. As mentioned in the <a href="#sec1">Introduction</a>, ARDL estimation is not valid in the presence of I(2) variables, but does accommodate a mixture of I(0) and I(1) variables. To identify integration orders, it's easiest to create an EViews group with all the variables and perform a unit root test by clicking on <b>View/Unit Root Tests/Cross-Sectionally Independent</b>. The first test performs the Im, Pesaran, and Shin (IPS) test with a constant and trend on the series in levels. Since the null hypothesis of this test is a unit root, tests with $p$-values near zero reject the null and identify the series as I(0); conversely, tests with $p$-values far from zero fail to reject the null and identify the series as I(1). By extension, I(2) series can be identified by repeating the tests on the first differences of the series. Figures 7a and 7b summarize these tests.<br/><br/>
<!-- :::::::::: FIGURES 7a and 7b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 7a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/ur_test1.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/ur_test1.png?<php echo filemtime( $file ); ?>" title="Unit Root Tests in Levels"
width="360"/>
</a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 7b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/ur_test2.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/ur_test2.png?<php echo filemtime( $file ); ?>" title="Unit Root Tests in First Differences"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 7a: Unit Root Tests in Levels</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 7b: Unit Root Tests in First Differences</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 7a and 7b :::::::::: -->
The unit root tests in levels identify domestic tourist length of stay and GDP as integrated of order 1, whereas the remaining variables are identified as integrated of order 0. On the other hand, every null hypothesis for the unit root tests in first differences is rejected, so no variable is identified as I(2).<br/><br/><br/><br/>
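As a side note, the classification logic just applied can be written down as a simple decision rule. The sketch below (plain Python, with hypothetical $p$-values rather than the IPS output above) labels a series I(0), I(1), or I(2) from the unit root test $p$-values in levels and in first differences:

```python
def integration_order(p_level, p_diff, alpha=0.05):
    """Classify a series from unit root test p-values (null: unit root).

    p_level: p-value of the test on the series in levels
    p_diff:  p-value of the test on the first difference
    """
    if p_level < alpha:
        return "I(0)"            # reject unit root in levels: stationary
    if p_diff < alpha:
        return "I(1)"            # unit root in levels, none in differences
    return "I(2) or higher"      # unit root survives first differencing

# Hypothetical p-values for illustration
print(integration_order(0.001, 0.000))  # I(0)
print(integration_order(0.450, 0.000))  # I(1)
```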
<h3 class="seccol" id="sec3">Estimation / Asymmetry</h3>
Our objective in this section is to estimate a NARDL model linking Bosnia's tourism variables <span class="wfvar">DTA, DTS, FTA, FTS</span> to the state of its economy, measured by the variable <span class="wfvar">GDP</span>. Formally, the NARDL model we are interested in studying, expressed below in its conditional error correction (CEC) form (see the <a href="https://www.eviews.com/help/helpintro.html#page/content%2Fardl-Background.html%23">EViews manual</a> for details), is:
\begin{align*}
\class{bold} {
\Delta \ln(\text{GDP})
}
& \class{bold} {
=
}
\class{bold col_red}{
\phi_{\scriptsize \text{GDP}} \ln(\text{GDP})_{t - 1}
}\\
&\class{bold col_red}{
+ \phi_{\scriptsize{\text{DTA}}}^{+} \ln(\text{DTA})_{t - 1}^{+} + \phi_{\scriptsize{\text{DTA}}}^{-} \ln(\text{DTA})_{t - 1}^{-}
+ \phi_{\scriptsize{\text{FTA}}}^{+} \ln(\text{FTA})_{t - 1}^{+} + \phi_{\scriptsize{\text{FTA}}}^{-} \ln(\text{FTA})_{t - 1}^{-}
}\\
&\class{bold col_red}{
+ \phi_{\scriptsize{\text{DTS}}}^{+} \text{DTS}_{t - 1}^{+} + \phi_{\scriptsize{\text{DTS}}}^{-} \text{DTS}_{t - 1}^{-}
+ \phi_{\scriptsize{\text{FTS}}}^{+} \text{FTS}_{t - 1}^{+} + \phi_{\scriptsize{\text{FTS}}}^{-} \text{FTS}_{t - 1}^{-}
}\\
&\class{bold col_blue}{
+ \sum_{j = 1}^{p - 1} \gamma_{\scriptsize{\text{GDP}} \normalsize{, \, j}} \Delta \ln(\text{GDP})_{t - j}
}\\
&\class{bold col_blue}{
+ \sum_{k_1 = 1}^{q_1 - 1} \left(
\gamma_{\scriptsize{\text{DTA}} \normalsize{, \, k_1}}^{+} \Delta \ln(\text{DTA})_{t - k_1}^{+}
+ \gamma_{\scriptsize{\text{DTA}} \normalsize{, \, k_1}}^{-} \Delta \ln(\text{DTA})_{t - k_1}^{-}
\right)
+ \sum_{k_2 = 1}^{q_2 - 1} \left(
\gamma_{\scriptsize{\text{FTA}} \normalsize{, \, k_2}}^{+} \Delta \ln(\text{FTA})_{t - k_2}^{+}
+ \gamma_{\scriptsize{\text{FTA}} \normalsize{, \, k_2}}^{-} \Delta \ln(\text{FTA})_{t - k_2}^{-}
\right)
}\\
&\class{bold col_blue}{
+ \sum_{k_3 = 1}^{q_3 - 1} \left(
\gamma_{\scriptsize{\text{DTS}} \normalsize{, \, k_3}}^{+} \Delta \text{DTS}_{t - k_3}^{+}
+ \gamma_{\scriptsize{\text{DTS}} \normalsize{, \, k_3}}^{-} \Delta \text{DTS}_{t - k_3}^{-}
\right)
+ \sum_{k_4 = 1}^{q_4 - 1} \left(
\gamma_{\scriptsize{\text{FTS}} \normalsize{, \, k_4}}^{+} \Delta \text{FTS}_{t - k_4}^{+}
+ \gamma_{\scriptsize{\text{FTS}} \normalsize{, \, k_4}}^{-} \Delta \text{FTS}_{t - k_4}^{-}
\right)
}\\
&\class{bold col_green}{
+ \alpha_0 + \alpha_1 t + \sum_{i = 1}^{11} \delta_{i} m_i + \epsilon_t
}
\end{align*}
This describes a NARDL$ (p, q_1, q_2, q_3, q_4) $ model where <span class="wfvar">GDP</span> enters as an autoregressive process of order $p$, and <span class="wfvar">DTA, FTA, DTS, FTS</span> enter as asymmetric distributed lag variables with orders $ q_1, q_2, q_3, q_4 $, respectively. For easier identification, we have coloured portions of the CEC relation red, blue, and green; these characterize the cointegrating (levels or long-run), the adjusting (differences or short-run), and the deterministic (seasonality) dynamics, respectively. Furthermore, variables with <b>+</b> and <b>-</b> superscripts denote, respectively, the positive and negative partial sum decompositions of the underlying distributed lag variable. These partial sums explicitly model how asymmetries in Bosnia's tourism sector, in both the long and short run, reflect on its economic development.<br/><br/>
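To make the partial sum notation concrete, the decompositions $ z_t^{+} $ and $ z_t^{-} $ simply cumulate the positive and negative first differences of a series. A minimal sketch, in plain Python with a hypothetical series:

```python
def partial_sums(z):
    """Positive and negative partial sum decompositions of a series.

    z_t^+ cumulates max(dz, 0) and z_t^- cumulates min(dz, 0), so that
    z[t] = z[0] + pos[t] + neg[t] for every t.
    """
    pos, neg = [0.0], [0.0]
    for prev, curr in zip(z, z[1:]):
        dz = curr - prev
        pos.append(pos[-1] + max(dz, 0.0))
        neg.append(neg[-1] + min(dz, 0.0))
    return pos, neg

# Hypothetical series
z = [10.0, 12.0, 11.0, 14.0]
pos, neg = partial_sums(z)
# pos = [0.0, 2.0, 2.0, 5.0]; neg = [0.0, 0.0, -1.0, -1.0]
```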
The deterministic dynamics deserve a brief comment as well. In particular, $ \alpha_0 $ and $ \alpha_1 $ respectively capture the effect of the constant and linear trend. To also capture monthly seasonality, the coefficients $ \delta_{i} $ are associated with seasonal dummy variables $ m_i $ for month $ i $.<br/><br/>
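The seasonal dummy construction mirrors EViews' <span class="wfvar">@expand(@month, @droplast)</span>: with a constant in the model, only 11 of the 12 monthly dummies are included to avoid perfect collinearity. A sketch of the same construction (plain Python, hypothetical month data):

```python
def monthly_dummies(months, drop=12):
    """Build 0/1 seasonal dummies for months 1..12, dropping one category.

    With a constant in the model, only 11 dummies are kept to avoid the
    dummy variable trap (perfect collinearity with the constant).
    """
    kept = [m for m in range(1, 13) if m != drop]
    return [[1 if obs == m else 0 for m in kept] for obs in months]

# Hypothetical month observations: January, December, March
rows = monthly_dummies([1, 12, 3])
# the December row is all zeros (the dropped category)
```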
We will start the analysis by estimating the model above, treating all tourism variables as asymmetric in both the adjusting and cointegrating dynamics. To identify the autoregressive and distributed lag orders $ p, q_1, q_2, q_3, q_4 $, we will perform automatic lag selection, allowing at most 3 lags for the dependent variable and each of the regressors (the default options). This effectively states that all variables depend on values at most 3 periods (months) in the past; in other words, a single quarter.<br/><br/>
The variables we need are located in the workfile page <span class="wf">tourism_monthly</span>. To avoid the complications of the pandemic years, we will only use data in the years prior to 2020. We can do so in the <b>Command</b> window by typing
<b><pre>smpl if @year<2020</pre></b>
Next, bring up the NARDL dialog (see Figure 8a) and enter the specifications as follows:
<ol>
<li>From the main EViews menu, click on <b>Quick/Estimate Equation...</b></li>
<li>Change the Method dropdown to <b>ARDL - Auto-regressive Distributed Lag Models (including NARDL)</b></li>
<li>Under <b>Linear dynamic specification</b> specify <b><span class="wfvar">@log(gdp)</span></b></li>
<li>Under <b>Long and short-run asymmetry</b> specify <b><span class="wfvar">@log(dta) @log(fta) dts fts</span></b></li>
<li>Under <b>Fixed regressors</b> specify <b><span class="wfvar">@expand(@month, @droplast)</span></b></li>
<li>Set the <b>Trend specification</b> to <b>Constant</b></li>
<li>Set both of the <b>Max. lags</b> dropdowns to <b>3</b></li>
<li>Click on <b>OK</b></li>
</ol>
<!-- :::::::::: FIGURES 8a and 8b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 8a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/fig_est1_dialog.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/fig_est1_dialog.png?<php echo filemtime( $file ); ?>" title="Full Asymmetry NARDL Dialog"
width="360"/>
</a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 8b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/tab_est1_output.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/tab_est1_output.png?<php echo filemtime( $file ); ?>" title="Full Asymmetry NARDL(2,1,3,1,0) Output"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 8a: Full Asymmetry NARDL Dialog</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 8b: Full Asymmetry NARDL(2,1,3,1,0) Output</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 8a and 8b :::::::::: -->
Figure 8b summarizes the estimation output. The table header lists a number of important estimation parameters, the most important of which is the set of optimally selected lag orders; namely, $ 2, 1, 3, 1, 0 $. In other words, $ \class{wfvar} { \text{@log(gdp)} } $ enters the optimal model with lag 2; $ \class{wfvar} { \text{@log(dta)}^{+} } $ and $ \class{wfvar} { \text{@log(dta)}^{-} } $ each enter with lag 1; $ \class{wfvar} { \text{@log(fta)}^{+} } $ and $ \class{wfvar} { \text{@log(fta)}^{-} } $ each enter with lag 3; $ \class{wfvar} { \text{dts}^{+} } $ and $ \class{wfvar} { \text{dts}^{-} } $ each enter with lag 1; and $ \class{wfvar} { \text{fts}^{+} } $ and $ \class{wfvar} { \text{fts}^{-} } $ each enter with lag 0. Recall that the optimal lag orders are selected by identifying the model (among the $ 768 = 3 \times (3 + 1)^4 $ estimated) which achieves the optimal information criterion - in this case the minimal value of the Akaike Information Criterion (AIC). We can also visualize (see Figure 9) the lag selection criteria by clicking on <b>View/Model Selection Summary/Criteria Table</b>.<br/><br/>
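The model count is easy to verify: the autoregressive order $ p $ ranges over $ 1, \ldots, 3 $ while each of the four distributed lag orders ranges over $ 0, \ldots, 3 $. A quick sketch of the enumeration:

```python
from itertools import product

# p in {1, 2, 3} for the dependent variable; q1..q4 each in {0, 1, 2, 3}
candidates = list(product(range(1, 4), *[range(0, 4)] * 4))
print(len(candidates))  # 768 = 3 * 4**4
```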
<!-- :::::::::: FIGURE 9 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_est1_lagsel.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_est1_lagsel.png?<php echo filemtime( $file ); ?>" title="Full Asymmetry NARDL(2,1,3,1,0) Lag Selection Criteria"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 9: Full Asymmetry NARDL(2,1,3,1,0) Lag Selection Criteria</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 9 :::::::::: -->
Returning to the main estimation table, just below the header, the first 9 coefficients characterize the cointegrating (long-run) dynamics (the coefficients in red); the next 11 coefficients characterize the adjusting (short-run) dynamics (the coefficients in blue); the remaining coefficients characterize the linear trend and seasonal dynamics (the coefficients in green). The table footer rounds off the output with a number of estimation summary statistics.<br/><br/>
Following estimation, the aim of the first inferential exercise is to formally validate our assumptions on asymmetry. Although we have written the model above so that all distributed lag variables are asymmetric in both the adjusting and cointegrating dynamics, the NARDL model is flexible enough to accommodate partial asymmetry. This arises when a variable enters asymmetrically in either the adjusting or the cointegrating dynamics, but not both. For instance, consider an arbitrary variable $ z_t $ with asymmetric decompositions $ z_t^{-} $ and $ z_t^{+} $, associated asymmetric level coefficients $ \phi^{-} $ and $ \phi^{+} $, and associated asymmetric difference coefficients $ \gamma_k^{-} $ and $ \gamma_k^{+} $, for $ k = 1, \ldots, q $. Partial asymmetry in this framework is obtained by imposing one of the restrictions below:
\begin{align*}
\text{Partial Short-run asymmetry (Long-run Symmetry):}& \quad \phi = \phi^{-} = \phi^{+} \\
\text{Partial Long-run asymmetry (Short-run Symmetry):}& \quad \gamma_k = \gamma_k^{-} = \gamma_k^{+} \\
\end{align*}
As NARDL models are typically estimated using least-squares, (partial) asymmetry can be formally tested. These tests reduce to the usual Wald-like hypotheses on the equivalence of positive and negative asymmetry coefficients. Formally,
\begin{align*}
\text{Long-run symmetry only } H_0 &:\quad \phi^{-} = \phi^{+} \\
\text{Short-run symmetry only } H_0 &:\quad
\begin{cases}
\gamma_k^{-} = \gamma_k^{+} \text{ for each } k \\
\\
\text{or}\\
\\
\sum_{k = 1}^q \gamma_k^{-} = \sum_{k = 1}^q \gamma_k^{+}
\end{cases} \\
\text{Joint Short- and Long- run symmetry } H_0 &:\quad
\begin{cases}
\gamma_k^{-} = \gamma_k^{+} \text{ for each } k \text{ and } \phi^{-} = \phi^{+} \\
\\
\text{or}\\
\\
\sum_{k = 1}^q \gamma_k^{-} = \sum_{k = 1}^q \gamma_k^{+} \text{ and } \phi^{-} = \phi^{+}
\end{cases}
\end{align*}
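A single restriction such as the long-run symmetry hypothesis $ \phi^{-} = \phi^{+} $ reduces to a textbook Wald (squared $t$) test on the coefficient difference. The sketch below illustrates the arithmetic with hypothetical coefficient estimates and (co)variances; it is not a substitute for the EViews test, which handles the joint restrictions and degrees of freedom for you:

```python
from statistics import NormalDist

def symmetry_wald(phi_pos, phi_neg, var_pos, var_neg, cov=0.0):
    """Wald test of the single restriction H0: phi_pos = phi_neg.

    Returns the chi-square(1) statistic and its p-value, computed through
    the equivalent two-sided z test on the coefficient difference.
    """
    diff = phi_pos - phi_neg
    var_diff = var_pos + var_neg - 2.0 * cov  # Var(phi_pos - phi_neg)
    z = diff / var_diff ** 0.5
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z * z, p_value

# Hypothetical long-run estimates and (co)variances
stat, p = symmetry_wald(0.80, 0.35, 0.02, 0.03, cov=0.005)
# stat is about 5.06; p is about 0.024, rejecting symmetry at the 5% level
```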
In EViews, these tests are performed after estimating a NARDL model by clicking on <b>View/ARDL Diagnostics/Symmetry Test</b>. Figure 10 below summarizes the output for the regression above.<br/><br/>
<!-- :::::::::: FIGURE 10 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/tab_est1_symmtest.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/tab_est1_symmtest.png?<php echo filemtime( $file ); ?>" title="Full Asymmetry NARDL(2,1,3,1,0) NARDL Symmetry Test"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 10: Full Asymmetry NARDL(2,1,3,1,0) NARDL Symmetry Test</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 10 :::::::::: -->
The output header summarizes the null hypothesis and degrees of freedom, followed by simple tests for long- and short-run symmetry, respectively, and finally the joint test for full symmetry. As short-run and full symmetry can each be tested in one of two ways, EViews reports the second test in both cases. Note also that if a variable enters the model with zero lags, the short-run symmetry test, and by extension the joint test, are not applicable.<br/><br/>
Turning to specific insights, we reject the null hypothesis of long-run symmetry for <span class="wfvar">@LOG(FTA)</span> at all reasonable significance levels and for <span class="wfvar">FTS</span> at the 5% significance level. While we also reject the joint symmetry test for <span class="wfvar">@LOG(FTA)</span> at all reasonable significance levels, we cannot evaluate the joint test for <span class="wfvar">FTS</span> as it enters the current model with zero lags. We also fail to reject the null hypothesis for the remaining coefficients. This suggests that the model we ought to consider next assumes the form:<br/><br/>
\begin{align*}
\class{bold} {
\Delta \ln(\text{GDP})
}
& \class{bold} {
=
}
\class{bold col_red}{
\phi_{\scriptsize \text{GDP}} \ln(\text{GDP})_{t - 1} + \phi_{\scriptsize \text{DTA}} \ln(\text{DTA})_{t - 1} + \phi_{\scriptsize \text{DTS}} \text{DTS}_{t - 1}
}\\
&\class{bold col_red}{
+ \phi_{\scriptsize{\text{FTA}}}^{+} \ln(\text{FTA})_{t - 1}^{+} + \phi_{\scriptsize{\text{FTA}}}^{-} \ln(\text{FTA})_{t - 1}^{-}
}\\
&\class{bold col_red}{
+ \phi_{\scriptsize{\text{FTS}}}^{+} \text{FTS}_{t - 1}^{+} + \phi_{\scriptsize{\text{FTS}}}^{-} \text{FTS}_{t - 1}^{-}
}\\
&\class{bold col_blue}{
+ \sum_{j = 1}^{p - 1} \gamma_{\scriptsize{\text{GDP}} \normalsize{, \, j}} \Delta \ln(\text{GDP})_{t - j}
}\\
&\class{bold col_blue}{
+ \sum_{k_1 = 1}^{q_1 - 1} \gamma_{\scriptsize{\text{DTA}} \normalsize{, \, k_1}} \Delta \ln(\text{DTA})_{t - k_1}
+ \sum_{k_2 = 1}^{q_2 - 1} \gamma_{\scriptsize{\text{FTA}} \normalsize{, \, k_2}} \Delta \ln(\text{FTA})_{t - k_2}
}\\
&\class{bold col_blue}{
+ \sum_{k_3 = 1}^{q_3 - 1} \gamma_{\scriptsize{\text{DTS}} \normalsize{, \, k_3}} \Delta \text{DTS}_{t - k_3}
+ \sum_{k_4 = 1}^{q_4 - 1} \gamma_{\scriptsize{\text{FTS}} \normalsize{, \, k_4}} \Delta \text{FTS}_{t - k_4}
}\\
&\class{bold col_green}{
+ \alpha_0 + \alpha_1 t + \sum_{i = 1}^{11} \delta_{i} m_i + \epsilon_t
}
\end{align*}
We estimate this model next; see Figures 11a and 11b.<br/><br/>
<!-- :::::::::: FIGURES 11a and 11b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 11a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/fig_est2_dialog.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/fig_est2_dialog.png?<php echo filemtime( $file ); ?>" title="Partial Asymmetry ARDL Dialog"
width="360"/>
</a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 11b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/tab_est2_output.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/tab_est2_output.png?<php echo filemtime( $file ); ?>" title="Partial Asymmetry NARDL(2,1,2,3,0) Output"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 11a: Partial Asymmetry NARDL Dialog</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 11b: Partial Asymmetry NARDL(2,1,2,3,0) Output</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 11a and 11b :::::::::: -->
As in the first regression, our first inferential exercise here is to confirm the asymmetry assumptions made earlier. The results, summarized in Figure 12, reinforce our understanding that both <span class="wfvar">@LOG(FTA)</span> and <span class="wfvar">FTS</span> are partially asymmetric in the long-run, although the conclusion for <span class="wfvar">FTS</span> holds at all significance levels roughly greater than 7.2%.<br/><br/>
<!-- :::::::::: FIGURE 12 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/tab_est2_symmtest.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/tab_est2_symmtest.png?<php echo filemtime( $file ); ?>" title="Partial Asymmetry NARDL(2,1,2,3,0) Symmetry Test"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 12: Partial Asymmetry NARDL(2,1,2,3,0) Symmetry Test</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 12 :::::::::: -->
Returning to the estimation results, a comparison of summary statistics from the first full asymmetry regression with the partial asymmetry regression above indicates that the R-squared and adjusted R-squared statistics in the latter model are slightly worse, whereas its information criteria are slightly larger. While these summary statistics suggest a slight downgrade in model preference, the difference is small enough to be safely ignored. We therefore continue the remaining inference exercises using the partial asymmetry NARDL(2,1,2,3,0) model.<br/><br/><br/><br/>
<h3 class="seccol" id="sec4">Bounds Test / Cointegration</h3>
Now that we have settled on a model, we'll test for cointegration among the system variables using the famous bounds test. We proceed by clicking on <b>View/ARDL Diagnostics/Bounds Test</b>; the results are summarized in Figure 13.<br/><br/>
<!-- :::::::::: FIGURE 13 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/tab_est2_boundstest.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/tab_est2_boundstest.png?<php echo filemtime( $file ); ?>" title="Partial Asymmetry NARDL(2,1,2,3,0) Bounds Test"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 13: Partial Asymmetry NARDL(2,1,2,3,0) Bounds Test</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 13 :::::::::: -->
The output is a spool object with the first table summarizing test statistics and the second summarizing the critical values. For a practical review of bounds testing, please refer to <a href="https://blog.eviews.com/2017/05/autoregressive-distributed-lag-ardl.html">Part III</a> of our ARDL blog. In the current framework, the F-bounds test statistic is 11.60, well beyond the I(1) critical value bound, and a clear rejection of the null hypothesis of no cointegration when all variables are I(1). Recall that rejection of the bounds test null hypothesis leads to 3 possible alternative hypotheses, only one of which confirms the existence of a useful cointegrating relationship. To ascertain which of the three alternatives emerges, an additional t-bounds test on the significance of the lagged dependent variable coefficient, namely $ \phi_{\scriptsize \text{GDP}} $, must be performed. In this case, the t-bounds statistic is -8.40, also well below the I(1) critical value bound, and a clear rejection of the null hypothesis that no cointegrating relationship exists when all variables are I(1). As outlined in <a href="#pesaran_et_al_2001">Pesaran et al. (2001)</a>, rejection of the t-bounds test in this secondary stage confirms the existence of a cointegrating relationship, but does not preclude the possibility that it is degenerate. To rule out degenerate cointegration, a joint test of significance on all coefficients associated with distributed lag variables in levels ought to be performed. This can be done with a simple Wald test (see Figures 14a and 14b) by clicking on
<b>View/Coefficient Diagnostics/Wald Test - Coefficient Restrictions...</b>
and entering
<b><pre>C(2)=0, C(3)=0, C(4)=0, C(5)=0, C(6)=0, C(7)=0</pre></b>
<!-- :::::::::: FIGURES 14a and 14b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 14a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/fig_est2_wald_dialog.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/fig_est2_wald_dialog.png?<php echo filemtime( $file ); ?>" title="Wald Test Dialog"
width="360"/>
</a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 14b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/tab_est2_waldtest.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/tab_est2_waldtest.png?<php echo filemtime( $file ); ?>" title="Wald Test Output"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 14a: Wald Test Dialog</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 14b: Wald Test Output</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 14a and 14b :::::::::: -->
With a p-value of 0.00, this test rejects the null hypothesis that all tested coefficients are jointly zero and, by extension, confirms that the cointegrating relationship which emerges is in fact sensible and <b>not</b> degenerate.<br/><br/>
Given the existence of a non-degenerate cointegrating relationship, we can identify the normalized, long-run coefficients in the cointegrating space which are associated with each of the distributed lag variables. Recall that if $ \phi_{\scriptsize \text{DEP}} $ and $ \phi_{\scriptsize \text{k}} $ are the coefficients associated with the dependent variable $ y_t $ and the k$^{\text{th}}$ distributed-lag variable $ x_{k, t} $ in levels in the (N)ARDL CEC form, respectively, the normalized, long-run distributed lag coefficient in the cointegration space is defined as
\begin{align*}
\beta_{k} \equiv - \frac{\phi_k}{\phi_{DEP}}
\end{align*}
In other words, for a NARDL model with $ K $ distributed lag variables, the cointegrating relationship is formalized as:
\begin{align*}
\class{bold}{
\text{CE} = \ln(\text{GDP})_{t - 1} - \sum_{r = 1}^{K} \beta_{r} \, x_{r, t - 1}
}
\end{align*}
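The normalization itself is a one-line computation. As a sketch with hypothetical CEC level coefficients (names and values are illustrative only, not taken from the output):

```python
def long_run_coefficients(phi_dep, phi_regressors):
    """Normalize CEC level coefficients: beta_k = -phi_k / phi_dep."""
    return {name: -phi / phi_dep for name, phi in phi_regressors.items()}

# Hypothetical CEC level coefficients
betas = long_run_coefficients(-0.5, {"fta_pos": 0.10, "fta_neg": 0.04})
# betas approximately {"fta_pos": 0.2, "fta_neg": 0.08}
```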
We can estimate these values (see Figure 15) for our concrete model above by clicking on <b>View/ARDL Diagnostics/Cointegrating Relation</b>.<br/><br/>
<!-- :::::::::: FIGURE 15 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/tab_est2_cointrel.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/tab_est2_cointrel.png?<php echo filemtime( $file ); ?>" title="Partial Asymmetry NARDL(2,1,2,3,0) Cointegrating Relation"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 15: Partial Asymmetry NARDL(2,1,2,3,0) Cointegrating Relation</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 15 :::::::::: -->
The output is a spool object with two tables and a graph. The first table provides the cointegrating specification; the second table provides the derived long-run (cointegrating) coefficients for each distributed-lag regressor; the graph plots the cointegrating specification as a series. We can also run the error-correction regression in which the long-run variables in the model are replaced by the cointegrating relation series defined in the first table of Figure 15. To do this, click on <b>View/ARDL Diagnostics/Error Correction Results</b> and look at the second table of the spool object which is produced. The latter is reproduced in Figure 16 below.<br/><br/>
<!-- :::::::::: FIGURE 16 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/tab_est2_ecresults_ecreg.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/tab_est2_ecresults_ecreg.png?<php echo filemtime( $file ); ?>" title="Partial Asymmetry NARDL(2,1,2,3,0) Error Correction Regression"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 16: Partial Asymmetry NARDL(2,1,2,3,0) Error Correction Regression</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 16 :::::::::: -->
<br/><br/><br/>
<h3 class="seccol" id="sec5">Dynamic Multipliers</h3>
An important exercise in classical regression analysis is estimating the <b>causal effect</b> or <b>multiplier</b> of a regressor on the dependent variable, <i>ceteris paribus</i>; recall also that this is just the partial derivative of the dependent variable with respect to (wrt) a regressor. While (N)ARDL models can be cast into a classical regression framework, they are dynamic in that lagged values of both the dependent and distributed-lag regressors affect the current state of the dependent variable. Accordingly, (N)ARDL models lend themselves to the derivation of <b>dynamic causal effects</b> or <b>dynamic multipliers</b> - causal effects which can be traced over time.<br/><br/>
In practice, dynamic causal effects can be thought of as analogous to impulse response curves in classical VAR / VEC models. They can be derived as response curves to unitary positive shocks in distributed-lag variables. In particular, for any distributed-lag regressor $ x_{t} $, holding all other regressors unchanged, a single positive unitary shock is introduced at $ T - h $, where $ T $ is the length of the estimation sample, and the evolution of the dependent variable, $ y_t $, is measured through the period $ [T - h, T] $, where $ h \geq 0 $ is some horizon length. Equivalently, the dynamic multiplier is the difference $ \tilde{y}_t - \hat{y}_t $, where $ \tilde{y}_t $ is the in-sample dynamic forecast of $ y_t $ when $ x_t $ is perturbed at $ t = T - h $ to equal $ x_{T - h} + 1 $, and $ \hat{y}_t $ is the in-sample dynamic forecast of $ y_t $ when $ x_t $ is left unchanged.<br/><br/>
A natural extension of the dynamic multiplier is the <b>cumulative dynamic multiplier</b> (CDM): the cumulative sum of dynamic multipliers at each point in time on the interval $ [T - h, T] $. In fact, as $ h \rightarrow \infty $, the cumulative dynamic multiplier converges to the long-run (cointegrating) coefficients discussed in the previous section. In other words, we can trace out the adjustment patterns (the short-run dynamics) as they evolve towards their cointegrating (long-run) equilibrium state. See <a href="#shin_et_al_2014">Shin, Yu, and Greenwood-Nimmo (2014)</a> for details.<br/><br/>
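The convergence of the CDM to the long-run coefficient is easy to verify in a stylized case. For an ARDL(1,0) model $ y_t = a y_{t-1} + b x_t $, the dynamic multiplier at horizon $ h $ is $ b a^h $, so the cumulative multiplier converges to the long-run coefficient $ b / (1 - a) $. A sketch with hypothetical coefficients:

```python
def cumulative_multiplier(a, b, horizon):
    """Cumulative dynamic multiplier of x on y for y_t = a*y_{t-1} + b*x_t."""
    cdm, multiplier = 0.0, b   # dynamic multiplier at horizon 0 is b
    for _ in range(horizon + 1):
        cdm += multiplier
        multiplier *= a        # dynamic multiplier decays geometrically
    return cdm

a, b = 0.6, 0.8
long_run = b / (1.0 - a)                 # long-run coefficient: 2.0
print(cumulative_multiplier(a, b, 50))   # close to the long-run value 2.0
```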
Cumulative dynamic multipliers are particularly interesting for asymmetric distributed lag-variables, such as those characterizing NARDL models. They allow researchers to study the evolution of adjustment patterns following negative and positive shocks to asymmetric regressors and quantify the path of asymmetry as CDMs evolve towards their respective (cointegrating) equilibrium states. Furthermore, confidence intervals for the evolution of asymmetry can also be derived via non-parametric bootstrapping.<br/><br/>
To derive CDMs for the model we estimated earlier with a 95% confidence interval derived over 999 bootstrap replications, we can proceed as follows:
<ol>
<li>From the estimated equation object, click on <b>View/ARDL Diagnostics/Dynamic Multiplier Graph...</b></li>
<li>Change the <b>Horizon</b> to <b>50</b></li>
<li>Set the evolution type to <b>Shock</b> or <b>Dynamic multiplier</b></li>
<li>Leave the rest at their default values and click on <b>OK</b></li>
</ol>
There are several things to unpack before analyzing the output. First, note that the confidence interval options are only available for NARDL models with asymmetric regressors, as the asymmetry path for symmetric variables is by construction always zero.<br/><br/>
Next, note that there are two options for the evolution type: 1) Shock and 2) Dynamic multiplier. As noted in the <a href="https://www.eviews.com/help/helpintro.html#page/content%2Fardl-Views_and_Procs_of_ARDL.html%23ww288251">EViews manual</a>, this distinction is only relevant for NARDL models with asymmetric regressors, and only affects asymmetric negative response curves. In particular, both shock evolution and dynamic multiplier evolution plot the response to a one unit positive change in the symmetric and positive asymmetric cumulated differences. However, unlike the shock evolution framework which plots the response to a one unit negative change in cumulated differences in the negative asymmetric case, the dynamic multiplier evolution framework plots an “improvement” producing a one unit positive increase (reduction of one unit of negative change) in the negative cumulative differences. In fact, the shock evolution plot can be derived from the dynamic multiplier evolution plot by reflecting the negative response curve in the dynamic multiplier evolution plot along the x-axis.<br/><br/>
While the dynamic multiplier evolution framework is reasonable from a technical perspective, it is not ideal if we wish to study the properties of the model under parallel unit increases in the absolute amount of positive and negative asymmetry, as when determining whether an increase in positive asymmetry has the same effect as an increase in negative asymmetry. Furthermore, because the dynamic multiplier framework is better aligned with a technical analysis, it will also plot, unlike the shock evolution framework, the long-run coefficient values to which the CDMs converge as the horizon length approaches infinity.<br/><br/>
We will start with the <b>Dynamic multiplier</b> evolution and display the output in Figures 17a through 17d.<br/><br/>
<!-- :::::::::: FIGURES 17a, 17b, 17c, 17d :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 17a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_dm1.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_dm1.png" title="CDM - @LOG(DTA)"
width="360"/>
</a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 17b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_dm2.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_dm2.png" title="CDM - DTS"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 17a: CDM - @LOG(DTA)</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 17b: CDM - DTS</small>
</center>
</td>
</tr>
<tr>
<td>
<!-- :::::::::: FIGURE 17c :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_dm3.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_dm3.png" title="CDM - @LOG(FTA)"
width="360"/>
</a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 17d :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_dm4.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_dm4.png" title="CDM - FTS"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 17c: CDM - @LOG(FTA)</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 17d: CDM - FTS</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 17a, 17b, 17c, 17d :::::::::: -->
First, notice that every plot, in addition to the response curves, displays the long-run values to which the evolution converges in equilibrium. These dotted lines correspond to the long-run coefficient values outlined in the second table of Figure 15. Furthermore, notice that Figures 17a and 17b display the CDMs for the symmetric regressors. As such, they have a single response curve and no confidence interval computations. On the other hand, Figures 17c and 17d display the response curves for positive and negative changes, a response curve for the asymmetry between the two, as well as a confidence interval band around the asymmetry response. In particular, as the zero line does not lie between the lower and upper bands in either Figure 17c or 17d, the asymmetric effects of those variables are significant at the 5% level.<br/><br/>
Let's also plot the same information in terms of a shock evolution framework.<br/><br/>
<!-- :::::::::: FIGURES 18a, 18b, 18c, 18d :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 18a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_shock1.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_shock1.png" title="CDM - @LOG(DTA)"
width="360"/>
</a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 18b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_shock2.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_shock2.png" title="CDM - DTS"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 18a: CDM - @LOG(DTA)</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 18b: CDM - DTS</small>
</center>
</td>
</tr>
<tr>
<td>
<!-- :::::::::: FIGURE 18c :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_shock3.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_shock3.png" title="CDM - @LOG(FTA)"
width="360"/>
</a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 18d :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_shock4.png"><img height="auto"
src="http://www.eviews.com/blog/ev13_nardl/images/plot_est2_dynmult_shock4.png" title="CDM - FTS"
width="360"/>
</a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 18c: CDM - @LOG(FTA)</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 18d: CDM - FTS</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 18a, 18b, 18c, 18d :::::::::: -->
The curves in Figure 18 present similar information. In fact, Figures 17 and 18 are effectively identical apart from the negative response curves, which are reflected across the x-axis in Figure 18.<br/><br/><br/><br/>
<h3 class="seccol", id="sec6">Interpretation and Policy Implications</h3>
We devote this section to a brief interpretation of the results above and suggestions for policy. First, recall that the results of the bounds test summarized in Figures 13 and 14b confirm the existence of a cointegrating relationship between Bosnia's tourism variables and its gross domestic product. In fact, this cointegrating relationship is itself significant, as the <span class="wfvar">COINTEQ</span> coefficient in the error-correction regression in Figure 16 is highly significant. On the other hand, only certain components of the cointegrating relationship are themselves significant. Note that the second table in Figure 15 indicates that while domestic tourist arrivals and foreign tourist stays are insignificant in the long run, domestic tourist length of stay and foreign tourist arrivals are indeed significant in equilibrium.<br/><br/>
We can gain further insight into how each tourist variable contributes to the evolution of Bosnia's GDP by looking at the response curves plotted in Figures 17 and 18. In particular, Figures 17a and 18a illustrate that a 1% positive shock to domestic tourist arrivals increases GDP by 0.04% in roughly two years (25 months), with a considerable short-run boost in the first 6 months.<br/><br/>
Similarly, as shown in Figures 17b and 18b, a single unit positive shock to domestic tourist length of stay (in other words, a single day prolonged stay) lifts GDP by 0.10% in roughly 2 years, with another significant short-run boost in the first 5 months.<br/><br/>
On the other hand, Figures 17c and 18c indicate that a 1% increase in foreign tourist arrivals produces a 0.17% increase in GDP in roughly 25 months. This is in contrast to a 1% decrease (see Figure 18c instead of 17c) in foreign tourist arrivals, which will decrease GDP by 0.12%, again in roughly 25 months. As shown by the asymmetry curve and its associated 95% confidence interval, this asymmetry is significant.<br/><br/>
Lastly, Figures 17d and 18d show that a 1 unit positive shock to foreign tourist length of stay produces a 0.02% increase in GDP in approximately two years. Interestingly, as shown in Figure 18d, a 1 unit negative shock to foreign tourist length of stay will also increase GDP by 0.01% in roughly 25 months. This asymmetry is also significant.<br/><br/>
The analysis above paints a general picture of Bosnia's tourism sector and how it impacts GDP. This picture suggests that Bosnia's domestic tourism, while lagging behind its foreign counterpart, has a symmetric effect on Bosnia's economy. This symmetry suggests that policies which equally bolster or hinder domestic tourism will have similar, but opposite, effects on GDP. Furthermore, considering that domestic tourist length of stay seems to have a larger impact on GDP than domestic tourist arrivals, policies should focus on encouraging longer domestic tourist stays, perhaps through infrastructural changes that reduce travel times (think better intra-national highways or fast train services) or through services and deals (think hotel discounts for longer stays or hotel loyalty programs) which incentivize longer visits.<br/><br/>
In contrast, Bosnia's foreign tourist industry has a statistically significant asymmetric impact on Bosnia's GDP. In particular, GDP stands to benefit from policies which focus on increasing foreign tourist arrivals. Possible strategies here include international advertising campaigns, easier tourist visa issuance (ideally a digitized process which can be completed upon arrival), and improved international airline connections. On the other hand, negative shocks to foreign tourist arrivals, while decreasing GDP, have a smaller impact on Bosnia's economy than positive shocks. This suggests that Bosnia can insulate itself from downtrends in foreign arrivals by attempting to recoup losses with policies which bolster arrivals.<br/><br/>
In the end, it is Bosnia's foreign tourist length of stay which is the most interesting variable. It also has a statistically significant asymmetric effect on Bosnia's long-run economy, but as the long-run coefficient on the negative partial sums of changes in <span class="wfvar">FTS</span> is negative, GDP actually benefits from a decrease in foreign tourist length of stay. This suggests that Bosnia benefits from higher foreign tourist turnover, perhaps due to revenues gained on tourist visas. Another possible explanation is overcrowding, which can lead to inefficiencies in providing services, as well as crowding out of domestic tourists who avoid the tourist market at times when it is overwhelmed by foreigners. This is certainly suggested in Figure 4b, which shows that domestic and foreign tourist lengths of stay do not peak at the same time during the year.<br/><br/><br/><br/>
<hr />
<h3 class="seccol", id="sec7">Files</h3>
<ul>
<li><a href="http://www.eviews.com/blog/ev13_nardl/workfiles/bih_tourism.wf1"><b class="wf">BIH_TOURISM.WF1</b></a></li>
<li><a href="http://www.eviews.com/blog/ev13_nardl/workfiles/bih_tourism.prg"><b class="wf">BIH_TOURISM.PRG</b></a></li>
</ul><br /><br />
<hr />
<h3 class="seccol", id="sec8">References</h3>
<ol class="bib2xhtml">
<li id="kahneman_tversky_1979">
Kahneman, D. and Tversky, A., (1979). Prospect theory: an analysis of decision under risk. <cite>Econometrica</cite>, 47: 263–291.
</li>
<li id="pesaran_et_al_2001">
Pesaran, M. H., Shin, Y., and Smith, R. J, (2001). Bounds testing approaches to the analysis of level relationships. <cite>Journal of Applied Econometrics</cite>, 16(3): 289–326.
</li>
<li id="shiller_2005">
Shiller, R. J., (2005). Irrational exuberance. <cite>Princeton University Press, Princeton</cite>, 2nd edition.
</li>
<li id="shin_et_al_2014">
Shin, Y., Yu, B., and Greenwood-Nimmo, M., (2014). Modelling asymmetric cointegration and dynamic multipliers in a nonlinear ARDL framework. <cite>Festschrift in Honor of Peter Schmidt</cite>, 281–314.
</li>
</ol>
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com16tag:blogger.com,1999:blog-6883247404678549489.post-87523695254332567312022-08-22T16:13:00.004-07:002022-08-22T16:36:59.944-07:00EViews 13 is Released!<p><br /></p><p>We are pleased to announce that EViews 13 has been released! Packed with new features and enhancements, EViews 13 is available as either an <a href="https://eviews.com/general/prices/prices.html" target="_blank">upgrade</a> or a new <a href="https://eviews.com/general/prices/prices.html" target="_blank">purchase</a> for single user licenses. Volume license customers will be receiving their complimentary upgrades soon!<span></span></p><a name='more'></a><p></p><h2 style="text-align: left;">Econometrics</h2><p>EViews 13 introduces a number of new econometric features.</p><h3 style="text-align: left;">Non-linear ARDL Estimation</h3><p>EViews 13 improves the existing tools for analyzing data with Autoregressive Distributed Lag (ARDL) models, adding estimation of Nonlinear ARDL (NARDL) models, which allow for more complex dynamics in which explanatory variables have differing effects for positive and negative deviations from base values. 
Watch our <a href="https://youtu.be/ikIByGq1izQ" target="_blank">YouTube video</a> for a demonstration.</p><p><br /></p><h3 style="text-align: left;">Improved PMG Estimation</h3><p>EViews 13 extends the estimation of PMG models to support:</p><p></p><ul style="text-align: left;"><li>A greater range of deterministic trend specifications (including those with fully restricted constant and trend terms)</li><li>Specifications with asymmetric regressors.</li></ul><p></p><p><br /></p><h3 style="text-align: left;">Difference-in-difference Estimation</h3><p>Difference-in-difference (DiD) estimation is a popular method of causal inference that allows estimation of the average impact of a treatment on individuals.</p><p>EViews 13 offers tools for estimation of the DiD model using the common two-way fixed-effects (TWFE) method, as well as post-estimation diagnostics of the TWFE model, such as those by Goodman-Bacon (2021), Callaway and Sant’Anna (2021), and Borusyak, Jaravel, and Spiess (2021).</p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhq2V7jZsZXxkab5cS9mWoEe05A6AbRj3PoRfs2Ns-_SAGyUUaTylJtxoYdOisRHAUptE4doQKlrNX0670T26r82ihAf4C7qHVsmOi4asT3cwX7gRSBxIwro9Uhtbssmbexy0PnmyLnyFuHnWOEJOhZB-uuNpCRrSq5FQxQDBGea0qhg6JPia9AeMj7pw/s1036/goodbacon.png" style="margin-left: auto; margin-right: auto;"><img alt="Goodman Bacon Decomposition" border="0" data-original-height="720" data-original-width="1036" height="445" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhq2V7jZsZXxkab5cS9mWoEe05A6AbRj3PoRfs2Ns-_SAGyUUaTylJtxoYdOisRHAUptE4doQKlrNX0670T26r82ihAf4C7qHVsmOi4asT3cwX7gRSBxIwro9Uhtbssmbexy0PnmyLnyFuHnWOEJOhZB-uuNpCRrSq5FQxQDBGea0qhg6JPia9AeMj7pw/w640-h445/goodbacon.png" title="Goodman Bacon Decomposition" width="640" /></a></td></tr><tr><td class="tr-caption" 
style="text-align: center;">Goodman-Bacon Decomposition</td></tr></tbody></table><br /><p><br /></p><p><br /></p><h3 style="text-align: left;">Bayesian Time-Varying Coefficient VAR Estimation</h3><p>Standard VAR models impose the constraint that the coefficients are constant through time. This is often not true of macroeconomic relationships. Consequently, in recent years VAR estimators that allow coefficients to change have become popular.</p><p>To address this, EViews 11 introduced Switching VAR, a class of VAR that allows discrete occasional changes in the coefficients of the VAR.</p><p>EViews 13 expands this further by introducing Bayesian Time-varying coefficient VAR models, which allow continuous smooth changes in the coefficients. Watch our <a href="https://youtu.be/26bqgohPPjA" target="_blank">YouTube video</a> for a demonstration.</p><p><br /></p><h3 style="text-align: left;">Cointegration Testing Enhancements</h3><p>EViews 13 features improvements to Johansen cointegration testing, including:</p><p><br /></p><p>•<span style="white-space: pre;"> </span>New deterministic trend settings</p><p>•<span style="white-space: pre;"> </span>Specification of exogenous variables as outside or inside (or both) the cointegrating equation</p><p><br /></p><h3 style="text-align: left;"><a href="https://eviews.com/EViews13/ev13whatsnew.html" target="_blank">And many more!</a></h3><p><br /></p><h2 style="text-align: left;">Non-Econometrics</h2><p>EViews 13 also introduces new interface and programming enhancements, new data handling features, and, as always, improvements to the graphing and table engines. </p><h3 style="text-align: left;">Pane And Tab Alternative User Interface</h3><p>EViews 13 offers a new, alternative user interface mode that employs panes and tabs in place of multiple windows. 
The built-in organization properties of this interface may be ideally suited to smaller display environments.</p><p><center><a href="https://www.eviews.com/EViews13/images/Panes.gif" target="_blank"><img src="https://www.eviews.com/EViews13/images/Panessm.gif" width="469" alt="Panes" /></a></center></p><h3 style="text-align: left;">Programming Language Debugging and Dependency Tracking</h3><p>EViews 13 now offers tools for debugging an EViews program to help you identify issues or locate the source of problems. The debugging tools allow you to set breakpoints on specific lines, run the program until it hits a breakpoint, and then examine the state of your workfile or variables at that point in the program execution.</p><p>Further, EViews 13 also provides a new feature to automatically log a program’s external dependencies (e.g. workfiles, databases, and other programs), allowing you to track which files are required and used by a program.</p><p><center><a href="https://www.eviews.com/EViews13/images/Debug.gif" target="_blank"><img src="https://www.eviews.com/EViews13/images/debugsm.gif" width="469" alt="Debugging" /></a></center></p><h3 style="text-align: left;">Jupyter Notebook Support</h3><p>Jupyter is a web-based interactive development environment that allows users to create notebooks for documenting computational workflow. EViews 13 Enterprise can now be used as a Jupyter kernel. This means you can use Jupyter Notebook to run and organize an EViews program and display results from within the Jupyter Notebook.</p><p>View our <a href="https://youtu.be/YPQFi8xTe1Y" target="_blank">YouTube</a> demonstration!</p><p><br /></p><h3 style="text-align: left;">Daily Seasonal Adjustment</h3><p>Daily Seasonal Adjustment is a new form of seasonal adjustment added to the already extensive collection available in EViews. This feature allows adjustment of daily data using the algorithm of Ollech (2021). 
More details can be seen in our <a href="https://youtu.be/XLc7I-1LW6g" target="_blank">YouTube</a> demonstration.</p><p><br /></p><h3 style="text-align: left;">Data Connectivity </h3><p>EViews 13 introduces connectivity to multiple new online data sources: The World Health Organization, Trading Economics, Australian Bureau of Statistics, France's L’Institut national de la statistique et des études économiques (INSEE), and Germany's Bundesbank.</p>
<center><a href="https://www.eviews.com/EViews13/images/Trading.gif" target="_blank"><img alt="Trading" src="https://www.eviews.com/EViews13/images/Tradingsm.gif" width="469" /></a></center>
<center><a href="https://www.eviews.com/EViews13/images/WHO.gif" target="_blank"><img alt="WHO" src="https://www.eviews.com/EViews13/images/WHO.gif" width="469" /></a></center>
<br /><p><br /></p><p><br /></p><h3><a href="https://eviews.com/EViews13/ev13whatsnew.html" target="_blank">And many more!</a></h3><p><br /></p>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com3tag:blogger.com,1999:blog-6883247404678549489.post-16301401579753563562022-04-19T10:31:00.001-07:002022-04-23T06:22:15.245-07:00Simulation and Bootstrap Forecasting from Univariate GARCH Models<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 1px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.seccol {
}
.subseccol {
color: #fa5e5e
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: ['{\\left(}'],
rb: ['{\\right)}'],
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}'],
min: ['{\\operatorname\{min\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: "verdana" sans-serif">
<i>A guest post by Eren Ocakverdi</i><br /><br />
This blog piece introduces a new add-in (<a href='http://www.eviews.com/Addins/simulugarch.aipz'>SIMULUGARCH</a>) that extends EViews’ built-in features for forecasting from univariate GARCH models.
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction</a>
<li><a href="#sec2">Forecasting with Simulation or Bootstrap</a>
<li><a href="#sec3">Application to price of Bitcoin</a>
<li><a href="#sec4">Files</a>
<li><a href="#sec5">References</a>
</ol><br />
<h3 class="seccol", id="sec1">Introduction</h3>
Estimating conditional volatility is not an easy task: volatility is unobserved, so certain assumptions must be made for that purpose. Once the model parameters are identified, it is relatively straightforward to produce forecasts. However, unlike regular mean models (e.g. OLS, ARIMA, etc.), generating a confidence interval around the forecast of conditional volatility requires additional effort.<br/><br/><br/><br/>
<h3 class="seccol", id="sec2">Forecasting with Simulation or Bootstrap</h3>
Suppose that we prefer a GARCH(1,1) model to explain the volatility dynamics of the logarithmic return of a financial asset: <br /><br />
\begin{align*}
\Delta \log(P_t) &= r_t = \bar{r} + e_t\\
e_t &= \epsilon_t \sigma_t\\
\sigma_t^2 &= \omega + \alpha_1 e_{t - 1}^2 + \beta_1\sigma_{t - 1}^2
\end{align*}
where $ \epsilon_t \sim IID(0,1) $.
As shown by Enders (2014), h-step-ahead forecast of the conditional variance is as follows:<br /><br />
\begin{align*}
\sigma_{t + h}^2 &= \omega + \alpha_1 e_{t + h -1}^2 + \beta_1\sigma_{t + h - 1}^2\\
E(\sigma_{t+h}^2) &= \omega + \alpha_1 E(e_{t + h - 1}^2) + \beta_1 E(\sigma_{t + h - 1}^2)\\
E(e_{t+h}^2) &= E(\epsilon_{t + h}^2\sigma_{t + h}^2) = E(\sigma_{t + h}^2)\\
E(\sigma_{t + h}^2) &= \omega + (\alpha_1 + \beta_1)E(\sigma_{t + h - 1}^2)
\end{align*}
If $ (\alpha_1 + \beta_1) < 1 $, then it implies that forecasts of conditional variance will converge to a long-run value of $ E(\sigma_t^2) = \omega/(1 - \alpha_1 - \beta_1) $.<br /><br />
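This recursion and its convergence to the unconditional variance are easy to verify numerically. In the sketch below the GARCH(1,1) parameter values and the terminal conditions are made up for illustration, not taken from the estimates later in the post.

```python
import numpy as np

# Hypothetical GARCH(1,1) parameters (illustration only)
omega, alpha1, beta1 = 0.8, 0.10, 0.85      # alpha1 + beta1 = 0.95 < 1

def variance_forecast(sigma2_T, e2_T, horizons):
    """h-step-ahead conditional variance forecasts E(sigma^2_{T+h})."""
    f = np.empty(horizons)
    # h = 1 uses the observed residual and variance at T
    f[0] = omega + alpha1 * e2_T + beta1 * sigma2_T
    # for h > 1, E(e^2) = E(sigma^2), so the recursion collapses to:
    for h in range(1, horizons):
        f[h] = omega + (alpha1 + beta1) * f[h - 1]
    return f

long_run = omega / (1 - alpha1 - beta1)     # unconditional variance
fc = variance_forecast(sigma2_T=4.0, e2_T=2.0, horizons=500)
```

Since $ \alpha_1 + \beta_1 = 0.95 < 1 $, the forecasts approach the long-run value geometrically.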
The median of the conditional variance is a useful gauge of central tendency, since the variance is a squared quantity and therefore has a distribution skewed towards larger values. In order to compute the median value along with an associated confidence interval, we need different realizations of the forecasted conditional variance. One can either simulate or bootstrap the values of the innovations (i.e. $ \epsilon_t $) to do so. Simulation generates random samples of innovations from the theoretical distribution assumed in the estimation of the model. Bootstrap, on the other hand, resamples (with replacement) from the estimated innovations, and therefore mimics the sampling process well as long as the observed sample distribution resembles the population distribution.<br/><br/><br/><br/>
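A stylized sketch of both approaches (this is not the add-in's code; the GARCH(1,1) parameters are hypothetical and the standardized residuals below are synthetic stand-ins for those recovered from an estimated model):

```python
import numpy as np

rng = np.random.default_rng(1)
omega, alpha1, beta1 = 0.2, 0.10, 0.85    # hypothetical GARCH(1,1) parameters

def forecast_paths(eps_draw, sigma2_0, e_0, horizon, reps):
    """Monte Carlo paths of sigma^2; eps_draw(n) supplies n innovations."""
    sig2 = np.full(reps, omega + alpha1 * e_0**2 + beta1 * sigma2_0)
    out = np.empty((reps, horizon))
    for h in range(horizon):
        out[:, h] = sig2
        e = eps_draw(reps) * np.sqrt(sig2)            # e_t = eps_t * sigma_t
        sig2 = omega + alpha1 * e**2 + beta1 * sig2   # next-period variance
    return out

# Stand-in for estimated standardized residuals
resid = rng.standard_t(8, size=2000)
resid = (resid - resid.mean()) / resid.std()

# Simulation: innovations drawn from the assumed theoretical distribution
sim = forecast_paths(lambda n: rng.standard_normal(n), 1.0, 1.0, 22, 10_000)
# Bootstrap: innovations resampled (with replacement) from the residuals
boot = forecast_paths(lambda n: rng.choice(resid, size=n), 1.0, 1.0, 22, 10_000)

median_sim = np.median(sim, axis=0)
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)   # 95% band per horizon
```

Across the replications, the per-horizon median and percentiles give the central scenario and the confidence band around the conditional variance forecast.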
<h3 class="seccol", id="sec3">Application to price of Bitcoin</h3>
Bitcoin has emerged as the newest and best-known kid on the block (of investment products) and its value has been quite volatile so far (<b>XBTUSD.WF1</b>).<br /><br />
Simple visual inspections of price level and log returns show us the explosive dynamics and large fluctuations during the analysis period of 2011-2021 (<b>SIMULUGARCH_EXAMPLE.PRG</b>).<br /><br />
<!-- :::::::::: FIGURES 1a and 1b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 1a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/simulugarch/images/xbtusd.png"><img height="auto"
src="http://www.eviews.com/blog/simulugarch/images/xbtusd.png" title="XBTUSD"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 1b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/simulugarch/images/dlogxbtusd.png"><img height="auto"
src="http://www.eviews.com/blog/simulugarch/images/dlogxbtusd.png" title="Log Difference of XBTUSD"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1a: XBTUSD</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 1b: Log Difference of XBTUSD</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 1a and 1b :::::::::: -->
In order to estimate the conditional variance of returns, a simple GARCH(1,1) is fitted to log returns of Bitcoin.<br /><br />
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/simulugarch/images/est.png"><img height="auto"
src="http://www.eviews.com/blog/simulugarch/images/est.png" title="GARCH(1,1)"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: GARCH(1,1)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
The level series shows that there were severe price fluctuations during 2021, whereas the estimated conditional variance of the return series suggests that the highest spikes occurred during 2013.<br /><br />
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/simulugarch/images/condvar.png"><img height="auto"
src="http://www.eviews.com/blog/simulugarch/images/condvar.png" title="CONDVAR"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: CONDVAR</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
Before forecasting the price level, one needs to generate future values of the estimated conditional variance, either by simulation or bootstrap. This is where the add-in comes in handy:<br /><br />
<!-- :::::::::: FIGURE 4 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/simulugarch/images/dialog.png"><img height="auto"
src="http://www.eviews.com/blog/simulugarch/images/dialog.png" title="SIMULUGARCH Dialog"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4: SIMULUGARCH Dialog</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 4 :::::::::: -->
Details of the input parameters are explained in the help document that comes with the add-in package. Here, we change the default number of repetitions and the forecast horizon to 10K and 22 steps, respectively. Also, a fan chart is chosen to summarize the output.<br /><br />
The median scenario for volatility is a gradual increase over the coming month (i.e. 22 business days). This should be expected, as the long-run value (i.e. the unconditional variance) is calculated to be around 156. However, keep in mind that the median is always smaller than the mean in right-skewed distributions.<br /><br />
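As a quick numerical illustration of that last point, squared draws from a standard normal (a chi-square variable with one degree of freedom, which is right-skewed like a variance) have a median well below their mean:

```python
import numpy as np

rng = np.random.default_rng(42)
v = rng.standard_normal(100_000) ** 2   # chi-square(1): right-skewed

med, mean = np.median(v), v.mean()      # median is roughly 0.45, mean roughly 1
```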
<!-- :::::::::: FIGURES 5a and 5b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 5a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/simulugarch/images/forecast_dep.png"><img height="auto"
src="http://www.eviews.com/blog/simulugarch/images/forecast_dep.png" title="Forecast of Dependent Variable"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 5b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/simulugarch/images/forecast_condvar.png"><img height="auto"
src="http://www.eviews.com/blog/simulugarch/images/forecast_condvar.png" title="Forecast of Conditional Variance"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5a: Forecast of Dependent Variable</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 5b: Forecast of Conditional Variance</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 5a and 5b :::::::::: -->
The role of volatility in forecast uncertainty becomes visible as we simulate the future values of price level, which would have important financial implications (e.g. for computation of Value-at-Risk). Even by the end of next month, for instance, USD price of Bitcoin might climb to as high as 70K or drop to as low as 35K!<br /><br /><br /><br />
<hr />
<h3 class="seccol", id="sec4">Files</h3>
<ul>
<li><a href="http://www.eviews.com/blog/simulugarch/workfiles/xbtusd.wf1"><b class="wf">XBTUSD.WF1</b></a></li>
<li><a href="http://www.eviews.com/blog/simulugarch/workfiles/simulugarch_example.prg"><b class="wf">SIMULUGARCH_EXAMPLE.PRG</b></a></li>
</ul><br /><br />
<hr />
<h3 class="seccol", id="sec5">References</h3>
<ol class="bib2xhtml">
<li id="enders-2004">
Enders, W. (2014). <i>Applied Econometric Time Series, Fourth Edition</i>, John Wiley & Sons.
</li>
</ol>
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com3tag:blogger.com,1999:blog-6883247404678549489.post-40376420410486653872021-11-01T15:20:00.001-07:002021-11-01T15:23:17.427-07:00SpecEval Add-In - Part 2<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 1px solid black;
}
.classic_table {
border: 1px solid black;
border-collapse: collapse;
border-spacing: 0px;
}
.classic_table tr {
border-bottom: 1px solid black;
border-top: 1px solid black;
}
.classic_table tr:first-child {
border-top: none;
}
.classic_table tr:last-child {
border-bottom: none;
}
.classic_table td {
border-left: 1px solid black;
border-right: 1px solid black;
padding-right: 10px;
padding-left: 10px;
}
.classic_table td:first-child {
border-left: none;
}
.classic_table td:last-child {
border-right: none;
}
.break_row {
border-bottom: 3px solid #fa5e5e !important
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.seccol {
}
.subseccol {
color: #fa5e5e
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: ['{\\left(}'],
rb: ['{\\right)}'],
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}'],
min: ['{\\operatorname\{min\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: "verdana" sans-serif">
<i>Guest post by Kamil Kovar</i><br /><br />
This is the second in a series of blog posts (the first can be found <a href="http://www.eviews.com/blog/speceval/images/overview_est.png">here</a>) that present a new EViews add-in, SpecEval, aimed at facilitating the development of time series models used for forecasting. This blog post focuses on illustrating the basic outputs of the add-in by following a simple application, which will also illustrate the model development process that the add-in aims to facilitate. The next section provides a brief discussion of this process, while the following section discusses the data and models considered. The main content of this blog post is contained in the next two sections, which discuss basic execution before presenting the actual application.
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Model Development Process</a>
<li><a href="#sec2">Data and Models</a>
<li><a href="#sec3">Execution</a>
<li><a href="#sec4">Model Forecasting Performance</a>
<li><a href="#sec5">Model Sensitivity</a>
<li><a href="#sec6">Concluding Remarks</a>
<li><a href="#sec7">Footnotes</a>
</ol><br />
<h3 class="seccol" id="sec1">Model Development Process</h3>
The SpecEval add-in was created with a particular model development process in mind. Specifically, the add-in is based on the belief that the model development process should be both iterative and – more importantly – interactive. It should be iterative in that it proceeds in steps, each improving the earlier version of the model, be it in the form of inclusion of additional regressors or modification of already included regressors. It should be interactive in that the improvements should be based on information about the shortcomings of the earlier model. Importantly, this means that the development process should be carried out by a human developer, rather than rely on a computer algorithm, since it requires a modicum of imagination.<br /><br />
The workflow of the model development process is shown in the figure below. The process starts with an initial proposed model, which is then evaluated using the outputs of the add-in. These outputs contain multiple relevant pieces of information, from basic model properties contained in the estimation output, such as regression coefficients, to forecast performance and finally sensitivity properties. Each of these can be used to identify shortcomings of the current model and to propose modifications that will address those shortcomings, in an interactive model development process on the part of the model developer.<br /><br />
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval_part2/images/model_development_process.png"><img height="auto"
src="http://www.eviews.com/blog/speceval_part2/images/model_development_process.png" title="Model Development Process"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: Model Development Process</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
Since in most situations the information can be ordered in terms of importance – e.g. "correct" coefficient signs are necessary, while desired degree of sensitivity often is not - one can view the process as linear, proceeding from basic properties through forecasting performance to sensitivity. We will roughly follow this model development process in the remainder of this blog post.<br /><br /><br /><br />
<h3 class="seccol" id="sec2">Data and Models</h3>
The add-in will be illustrated by modelling a relatively simple time series – industrial production in Czechia.<sup><a href="#fn1" id="ref1">1</a></sup> The quarterly series is displayed in the figure below. It is clear that the series is trending, but that it does not follow a deterministic trend. Correspondingly, in what follows we will use the log-difference of industrial production as the dependent variable.<br /><br />
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval_part2/images/czechia_ip.png"><img height="auto"
src="http://www.eviews.com/blog/speceval_part2/images/czechia_ip.png" title="Czechia Industrial Production"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: Czechia Industrial Production</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
What model should we use for forecasting industrial production? The answer to this question depends on the environment in which one is forecasting the given series. The type of model can vary from simple univariate reduced-form ARIMA models, through their multivariate multi-equation cousins, VAR models, to structural single- or multiple-equation models. Here we will illustrate the SpecEval add-in on multivariate single-equation models, for which the add-in is most suitable. This choice corresponds to an environment where one has available forecasts for multiple potential right-hand-side variables, such as GDP, and wants to “expand” these forecasts to industrial production, i.e. produce forecasts of industrial production that are consistent with the forecasts for other macroeconomic variables. This is a fairly common task, especially in the context of macroeconomic stress testing.<br/><br/>
Within this class of models, our starting point is a simple regression linking the log-difference of industrial production to the log-difference of GDP:
$$
\text{dlog}(IP_{t}) = \beta_0 + \beta_1 \text{dlog}(GDP_t)
$$
This equation simply postulates that the current growth rate of industrial production can be well predicted by the current growth rate of GDP, a reasonable postulate given that both are measures of economic activity. Later we will enrich this model by including additional variables/regressors based on the analysis of this model. Before considering additional multivariate models, though, we will use a simple ARIMA(0,1,2) model as our benchmark. The first equation is called <b>EQ_GDP</b> while the second is called <b>EQ_ARIMA</b>.<br/><br/><br/><br/>
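For concreteness, the two specifications could be created with commands along the following lines (a sketch, not the post's own code: the series names IP and GDP are assumptions, and an ARIMA(0,1,2) for the log level is an MA(2) model for the log-difference):
<pre><code>
' baseline regression of IP growth on GDP growth
equation eq_gdp.ls dlog(ip) c dlog(gdp)
' benchmark ARIMA(0,1,2) for log(IP), i.e. MA(2) in log-differences
equation eq_arima.ls dlog(ip) c ma(1) ma(2)
</code></pre>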
<h3 class="seccol" id="sec3">Execution</h3>
SpecEval allows the modeler to produce a report either by executing it through the GUI or by issuing the relevant command from a given equation object, the approach we take here:
<pre><code>
eq_gdp.speceval(noprompt)
</code></pre>
This command produces and displays a spool with several output objects that can be used to evaluate the given equation (see left panel of the figure below). However, it is more interesting to consider the given equation in the context of the benchmark ARIMA equation and hence execute SpecEval for both equations, which can be done by simply adding another equation to the list of specifications:
<pre><code>
eq_gdp.speceval(spec_list=eq_arima)
</code></pre>
What we have done here is specify that the list of specifications for which the add-in is executed should also include the ‘eq_arima’ equation. As a result, the add-in will produce and display a spool that is organized by the type of output, so that the same outputs for different specifications are next to each other, facilitating quick comparison. See right panel of the figure below.<br/><br/>
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval_part2/images/output_spools.png"><img height="auto"
src="http://www.eviews.com/blog/speceval_part2/images/output_spools.png" title="Output Spools"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: Output Spools</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
<h3 class="seccol" id="sec4">Model Forecasting Performance</h3>
The starting point of the analysis of any forecasting model is of course its estimation output, and so SpecEval includes it among its outputs. Rather than using the standard estimation output reported by EViews, SpecEval reports estimation output that is enhanced in several ways, such as color coding and formatting of numbers, as well as information about the included variables:<br/><br/>
<!-- :::::::::: FIGURE 4 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_estimation.png"><img height="auto"
src="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_estimation.png" title="Czechia Industrial Production - Estimation"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4: Czechia Industrial Production - Estimation</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 4 :::::::::: -->
Estimation output provides some basic information about the model. However, it provides limited information about forecasting performance. True, statistics like the R-squared, the standard deviation of residuals, or the Durbin-Watson statistic can be re-interpreted as indicators of forecasting performance, but only as very limited ones. Addressing this shortcoming is one of the key motivations for SpecEval, and hence the report includes explicit information about forecasting performance.
First, there is a table with values of forecast precision metrics, such as the Root Mean Square Percentage Error (RMSPE), that are color-coded according to their rank. For our application this table shows that the proposed model is worse in terms of forecasting performance than the benchmark ARIMA model at longer forecasting horizons, a dispiriting conclusion given that our model includes additional information.<br/><br/>
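For reference, a common definition of the RMSPE at forecast horizon $h$, with $y_{i,h}$ the actual value and $\hat{y}_{i,h}$ the corresponding $h$-step-ahead backtest forecast over the $N_h$ backtest windows (the add-in's exact formula may differ in detail), is:
$$
\text{RMSPE}_h = 100 \times \sqrt{\frac{1}{N_h} \sum_{i=1}^{N_h} \left(\frac{\hat{y}_{i,h} - y_{i,h}}{y_{i,h}}\right)^2}
$$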
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_rmspe.png"><img height="auto"
src="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_rmspe.png" title="Czechia Industrial Production - RMSPE"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5: Czechia Industrial Production - RMSPE</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
Before despairing and concluding that GDP is not useful for forecasting industrial production, it is useful to look at forecasting performance in more detail than what is incorporated in the summary statistics. Specifically, we can leverage the second output focused on forecasting performance, the forecast summary graphs. The motivation for these is simple: precision metrics are summary statistics over the whole backtesting sample, and hence it is possible that they mask important heterogeneity across the sample, something that forecast summary graphs will immediately reveal. This is indeed the case in our application, since the bad forecasting performance of our model is concentrated in the early periods – after 2000 the forecasting performance looks much better than that of the benchmark model.<br/><br/>
<!-- :::::::::: FIGURE 6 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_forecast_summary.png"><img height="auto"
src="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_forecast_summary.png" title="Czechia Industrial Production - Forecast Summary"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 6: Czechia Industrial Production - Forecast Summary</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 6 :::::::::: -->
SpecEval provides flexibility to explore this issue in further detail. For example, the forecasting performance at the beginning of the sample is so bad that one would likely suspect issues with the estimated coefficients. To check this, we can include coefficient stability graphs among the outputs:
<pre><code>
eq_gdp.speceval(spec_list=eq_arima, exec_list="normal stability")
</code></pre>
Here we just specified that the execution list should also include the stability outputs, apart from the normal outputs. The resulting graph displayed below shows the full time series of recursive regression coefficients, together with their standard errors. Crucially, the graph indeed confirms our suspicions: the coefficient on GDP in the early parts of the sample is negative, which is at odds with our expectations and likely reflects the very small number of observations used for estimation at the beginning of the backtesting sample.<br/><br/>
<!-- :::::::::: FIGURE 7 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_coef_stability.png"><img height="auto"
src="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_coef_stability.png" title="Czechia Industrial Production - Coefficient Stability"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 7: Czechia Industrial Production - Coefficient Stability</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 7 :::::::::: -->
Another way to explore this issue is to switch from out-of-sample to in-sample forecasting. In other words, we can use the actual equation estimated on the full available sample to make the individual backtest forecasts.<sup><a href="#fn3" id="ref3">3</a></sup> Or alternatively, and more simply, we can stick with out-of-sample forecasts but limit the evaluation sample to start in 2000q1. The two execution commands corresponding to these options are the following:
<pre><code>
eq_gdp.speceval(spec_list=eq_arima, oos="f")
eq_gdp.speceval(spec_list=eq_arima, tfirst_test="2000q1")
</code></pre>
Either of these approaches shows that the initial superiority of the ARIMA model was a consequence of bad forecasts based on a short estimation sample, as evidenced by the tables below. Crucially, these early forecasts do not provide an approximation of what the forecast would have been at that point in time: any economist operating the model would likely discard forecasts from a model with a negative coefficient on GDP. However, without knowledge of this artifact of the results – such as when we rely on precision metrics alone, as is customary – we would potentially discard the model altogether. This shows both the value added by SpecEval and its flexibility, and the value of incorporating graphical information about forecasting performance. The document ‘SpecEval illustrated’ provides many additional examples of this flexibility and how it can be leveraged in developing forecasting models.<br/><br/>
<!-- :::::::::: FIGURE 8 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_is_rmspe.png"><img height="auto"
src="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_is_rmspe.png" title="Czechia Industrial Production - In-Sample RMSPE"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 8: Czechia Industrial Production - In-Sample RMSPE</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 8 :::::::::: -->
<br/><br/><br/><br/>
<h3 class="seccol" id="sec5">Model Sensitivity</h3>
The second main focus of SpecEval outputs – in addition to forecasting performance – is the evaluation of model sensitivity, that is, how the proposed model responds to outside shocks. There are three types of outputs that belong to this category. First, SpecEval allows the user to specify a set of historical sub-samples for which forecast performance can be analyzed separately, be it in terms of forecast precision metrics or in terms of forecast graphs, on which we will focus here. The above figures captured the forecasting performance over the whole sample, but sometimes performance during a particular historical period is of special interest given its unusual nature relative to the rest of the backtest sample. Examples from credit risk modelling are recessionary periods or periods of financial stress. To analyze such periods in the context of our example, we simply need to specify the sub-samples of interest:
<pre><code>
eq_gdp.speceval(subsamples="2008q3-2009q4, 2011q3-2013q2", oos="f")
</code></pre>
The top panels of figure below show the resulting graphs which capture the forecast from our model over the period of Great Recession and European Sovereign Debt Crisis. The conclusion is not very positive since the model fails to predict the magnitude of the decline in industrial production, especially during the Great Recession.<br/><br/>
<!-- :::::::::: FIGURE 9 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_subsample_forecast.png"><img height="auto"
src="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_subsample_forecast.png" title="Czechia Industrial Production - Subsample Forecast"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 9: Czechia Industrial Production - Subsample Forecast</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 9 :::::::::: -->
One potential solution is to allow the relationship between GDP and industrial production to differ between normal and recessionary periods by adding an interaction with a dummy variable indicating recessionary periods:
$$
\text{dlog}(IP_{t}) = \beta_0 + \beta_1 \text{dlog}(GDP_t) + \beta_2 \text{dlog}(GDP_t) D_t^{recession}
$$
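In EViews the interaction term can be entered directly in the equation specification; a sketch, assuming the recession dummy is stored in a series named D_RECESSION (series names IP, GDP and D_RECESSION are assumptions for the example):
<pre><code>
' interaction of GDP growth with the recession dummy
equation eq_gdp_dummy.ls dlog(ip) c dlog(gdp) dlog(gdp)*d_recession
</code></pre>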
<!-- :::::::::: FIGURE 10 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_estimation_recession.png"><img height="auto"
src="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_estimation_recession.png" title="Czechia Industrial Production - Estimation (Recession)"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 10: Czechia Industrial Production - Estimation (Recession)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 10 :::::::::: -->
The forecasts from the resulting model, captured in the bottom panels of the above figure, show a significant improvement over the original model in terms of forecasting during recessionary periods in the context of in-sample forecasting.<br/><br/>
The second category of outputs focused on model sensitivity displays conditional scenario forecasts made using a given model specification. This entails making forecasts for the dependent variable under alternative scenario paths for the independent variables. While this is especially useful in situations where such scenario forecasting is itself of interest, it is useful more generally in model development as a source of alternative information about the model and its behavior, something we illustrate here. To obtain conditional scenario forecasts using SpecEval we just need to specify a list of scenarios, as in the first argument of the following command:
<pre><code>
eq_gdp_dummy.speceval(scenarios="bl sd", exec_list="normal scenarios_individual", tfirst_sgraph="2006q1", graph_add_scenarios="gdp[r]", trans="deviation")
</code></pre>
Here, apart from the list of scenarios, we have specified several other options: we have indicated that we want individual scenario graphs as the output (rather than graphs showing all scenarios together); that we want the scenario graphs to start in 2006q1; that they should also include GDP (as opposed to only industrial production); and that the transformation charts should be in terms of deviations from baseline. The top panels of the figure below show the graph capturing the level of the forecast and the graph capturing the deviation from baseline, respectively. These leave us with mixed feelings about the model. On the positive side, the decline in industrial production seems appropriate given the decline in GDP – as was historically the case, industrial production falls significantly more than GDP, reflecting the fact that the combined coefficient is above 2. On the negative side, industrial production remains significantly below GDP even in the long run, which seems counterintuitive – one would expect both the drop and the rebound in industrial production to be larger, so that the permanent effect on industrial production is only slightly larger than for GDP.<br/><br/>
<!-- :::::::::: FIGURE 11 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_forecast_scenario.png"><img height="auto"
src="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_forecast_scenario.png" title="Czechia Industrial Production - Forecast Scenario"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 11: Czechia Industrial Production - Forecast Scenario</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 11 :::::::::: -->
The reason the model fails to make such a forecast is that it makes industrial production more sensitive to movements in GDP only during recessions, not during recoveries. One simple way to address this is to replace the dummy indicating recessions with a dummy that captures both recessions and recoveries. Here, we simply use a new dummy that is equal to 1 also for 4 quarters after the end of recessions:
$$
\text{dlog}(IP_{t}) = \beta_0 + \beta_1 \text{dlog}(GDP_t) + \beta_2 \text{dlog}(GDP_t) \left(\text{@movav}(D_t^{recession}, 4) > 0\right)
$$
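Again as a sketch (series names are assumptions, and @movav denotes EViews' backward moving average), this modified interaction can be estimated directly:
<pre><code>
' interaction with the recession-and-recovery dummy
equation eq_gdp_dummy2.ls dlog(ip) c dlog(gdp) dlog(gdp)*(@movav(d_recession,4)>0)
</code></pre>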
The resulting scenario forecasts are in bottom panels of figure above and show that the model modification addressed our initial concerns: industrial production still falls more than GDP, but then also rebounds more strongly so that in the long run the shortfall in industrial production is only slightly larger than that of GDP.<br/><br/>
The inclusion of the recession dummy was motivated by shortcomings of the model in terms of historical forecasts during recessionary periods, while its replacement by the recession-and-recovery dummy was motivated by shortcomings in terms of scenario forecasts. However, it turns out that both modifications also help a lot with overall forecasting performance, as evidenced in the table below. In this sense, analysis of model sensitivity, and especially of model behavior in conditional scenarios, is complementary to analysis of overall forecasting performance, and hence useful for model development purposes even if model sensitivity and scenario forecasting are not themselves of importance.<br/><br/>
<!-- :::::::::: FIGURE 12 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_scenario_rmspe.png"><img height="auto"
src="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_scenario_rmspe.png" title="Czechia Industrial Production - In-Sample RMSPE (Scenario)"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 12: Czechia Industrial Production - In-Sample RMSPE (Scenario)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 12 :::::::::: -->
The final category of model sensitivity outputs is composed of shock response graphs. The concept should be familiar from the VAR literature: one studies how the dependent variable responds to shocks to individual independent variables.<sup><a href="#fn2" id="ref2">2</a></sup> SpecEval implements this procedure for single equation multivariate time series models; one simply needs to include shocks in the execution list:
<pre><code>
eq_gdp_dummy2.speceval(exec_list="normal shocks", shock_type="transitory")
</code></pre>
As a result, the report will now include two types of figures corresponding to two types of shocks, depending on whether the underlying independent variable or the actual regressor is being shocked. In either case the corresponding figure shows 4 graphs: (1) a graph with two paths for the dependent variable, one without the shock and one with the shock; (2) a graph with the deviation/difference between the two paths; (3 & 4) analogous graphs for the shocked variable/regressor. Below is an example for a modified version of our model with the dummy variable, which now also includes a lagged dependent variable and a lag of the GDP regressor. This means that the model now belongs to the Autoregressive Distributed Lag (ARDL) family, making its shock responses dynamic and hence hard to gauge from the estimation output alone. For such models, visualizing the exact shock responses can be very valuable. For example, in the current context the transitory decrease in GDP (see bottom panels) leads to an initial drop in industrial production, which is then reversed so strongly that industrial production rises above the no-shock path for several quarters (see top panels).<br/><br/>
<!-- :::::::::: FIGURE 13 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_ir.png"><img height="auto"
src="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_ir.png" title="Czechia Industrial Production - Shock-Response"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 13: Czechia Industrial Production - Shock-Response</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 13 :::::::::: -->
This shock response might be unappealing from a scenario perspective because it can easily lead to downside scenarios – characterized by recession and recovery in GDP – featuring industrial production that temporarily rises above baseline. In this way, studying shock responses can be an important tool when models will be used in scenario forecasting. However, the value is not limited to this use case: the above shock response would probably alert the modeler that a different model structure – for example, replacing the lagged dependent variable with an autoregressive error – might be preferable from a forecasting perspective. Indeed, while the ARDL model has worse forecasting performance than the model without any lagged components, a model that includes only an AR(1) error – and hence does not feature the shock response reversals – has significantly better forecasting performance, as shown in the table below.<br/><br/>
<!-- :::::::::: FIGURE 14 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_compare_rmspe.png"><img height="auto"
src="http://www.eviews.com/blog/speceval_part2/images/czechia_ip_compare_rmspe.png" title="Czechia Industrial Production - RMSPE Comparison"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 14: Czechia Industrial Production - RMSPE Comparison</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 14 :::::::::: -->
<br/><br/><br/><br/>
<h3 class="seccol" id="sec6">Concluding Remarks</h3>
This part of the blog post series dedicated to SpecEval focused on showcasing how SpecEval can be operated, what its basic outputs are, and how they can be leveraged in the model development process. However, for the sake of brevity the possibilities highlighted here were far from exhaustive – the reader should consult the ‘SpecEval illustrated’ document for a more detailed discussion. That said, the next blog post in this series will focus on one particular functionality of SpecEval – the use and value of transformations in the model development process.<br/><br/><br/><br/>
<hr />
<h3 id="sec7">Footnotes</h3>
<sup id="fn1">1. The data, together with a program that will replicate the outputs reported here, can be found on my personal <a href="https://drive.google.com/open?id=1gNdUVrCOVY2xCfsO1nBhxonTxa0bQRyT">website</a>.<a href="#ref1" title="Jump back to footnote 1 in the text.">↩</a></sup><br/>
<sup id="fn2">2. This kind of analysis is readily available in EViews (and other statistical packages) for VAR models. However, this type of analysis is puzzlingly uncommon in the case of single equation multivariate time series models, and correspondingly is not supported by EViews or other statistical packages, a gap SpecEval tries to fill. Note that for univariate ARIMA models EViews – unlike most other statistical packages – does support this kind of analysis.<a href="#ref2" title="Jump back to footnote 2 in the text.">↩</a></sup><br/>
<sup id="fn3">3. Note that these two features – in-sample forecasting and inclusion of multiple equations in the forecasting model – are possible thanks to in-built EViews functionality and hard to replicate in other statistical programs. The former is thanks to the separation between estimation and forecasting samples, the latter thanks to flexible model objects.<a href="#ref3" title="Jump back to footnote 3 in the text.">↩</a></sup>
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com0tag:blogger.com,1999:blog-6883247404678549489.post-79586036652011853442021-05-13T08:29:00.000-07:002021-05-13T08:29:38.682-07:00Box-Cox Transformation and the Estimation of Lambda Parameter<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 0px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.seccol {
}
.subseccol {
color: #fa5e5e
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: ['{\\left(}'],
rb: ['{\\right)}'],
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}'],
min: ['{\\operatorname\{min\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: "verdana" sans-serif">
<i>Author and guest post by Eren Ocakverdi</i><br /><br />
This blog piece introduces a new add-in (<a href='http://www.eviews.com/Addins/boxcox.aipz'><b>BOXCOX</b></a>) that applies power transformations to a series of interest and provides alternative methods for estimating the optimal lambda parameter used in the transformation.
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction</a>
<li><a href="#sec2">Box-Cox family of transformations</a>
<li><a href="#sec3">Application to Turkey’s tourism data</a>
<li><a href="#sec4">Files</a>
<li><a href="#sec5">References</a>
</ol><br />
<h3 class="seccol", id="sec1">Introduction</h3>
A stationary time series has a stable mean and variance, and can then be modelled with ARMA-type models. A series that does not have a constant, finite variance violates this condition and will lead to ill-defined models. The common practice for dealing with time-varying volatility is to model the variance explicitly with GARCH-type models. However, when the variance of a series changes with its level, there is a practical alternative: transforming the original series so as to scale down (up) the large (small) values.<br/><br/><br/><br/>
<h3 class="seccol", id="sec2">Box-Cox family of transformations</h3>
Box and Cox (1964) proposed a family of power transformations, which later became a popular tool in time series analysis to deal with skewness in the data:<br /><br />
$$
\tilde{y}_t =
\begin{cases}
\frac{y_t^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0\\
\log(y_t) & \text{if } \lambda = 0
\end{cases}
$$
Transformation of a series is straightforward once the value of $\lambda$ is known. One way to determine the value of $\lambda$ is to maximize the (regular or profile) log likelihood of a linear regression model fitted to the data. For trending and/or seasonal data, appropriate dummy variables are added to the regression to capture such effects. Guerrero (1993) proposed a model-independent method to select $\lambda$ that minimizes the coefficient of variation for subsets of the series.<br /><br /><br /><br />
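To make the regression-based approach concrete, here is a hypothetical Python sketch (not the add-in's code, which is an EViews program) that grid-searches the profile log-likelihood in the simplest intercept-only case; with trending or seasonal data, the corresponding dummy regressors would be added to the fitted model:

```python
# Hypothetical sketch: estimate the Box-Cox lambda by grid-searching the
# profile log-likelihood of the transformed series (intercept-only model).
import math

def boxcox(y, lam):
    """Box-Cox transform of a positive series for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def profile_loglik(y, lam):
    """Profile log-likelihood when only the mean is fitted (simplest case)."""
    z = boxcox(y, lam)
    n = len(z)
    mean = sum(z) / n
    sigma2 = sum((v - mean) ** 2 for v in z) / n
    # The Jacobian term (lam - 1) * sum(log y) accounts for the change of variable.
    return -0.5 * n * math.log(sigma2) + (lam - 1.0) * sum(math.log(v) for v in y)

def estimate_lambda(y, lo=-2.0, hi=2.0, steps=401):
    """Return the grid value of lambda that maximizes the profile log-likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: profile_loglik(y, lam))
```

For right-skewed positive series the maximizing lambda typically falls between 0 and 1, with values near 0 indicating that a log transformation is appropriate.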
<h3 class="seccol", id="sec3">Application to Turkey’s Tourism Data</h3>
With its pervasive trend and seasonal components, monthly tourism statistics emerge as a natural candidate for implementation (<a href="http://www.eviews.com/blog/boxcox/workfiles/tourism.wf1"><b>TOURISM.WF1</b></a>). Suppose that we want to carry out a counterfactual analysis to estimate the potential loss of visitors to Turkey in 2020 due to the COVID-19 pandemic. First, set the training sample to cover the period until the end of 2019.<br /><br />
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/boxcox/images/visitors.png"><img height="auto"
src="http://www.eviews.com/blog/boxcox/images/visitors.png" title="Visitors"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: Visitors</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
Next, run the add-in. The following dialog pops up.<br /><br />
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/boxcox/images/boxcox.png"><img height="auto"
src="http://www.eviews.com/blog/boxcox/images/boxcox.png" title="Box-Cox Dialog"
width="180" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: Box-Cox Dialog</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
The add-in computes the optimal value of lambda as 0.106.<br /><br />
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/boxcox/images/visitors_transfrom.png"><img height="auto"
src="http://www.eviews.com/blog/boxcox/images/visitors_transfrom.png" title="Visitors (Box-Cox Transformation)"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: Visitors (Box-Cox Transformation)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
We can then apply the Auto ARIMA method to the original series, supplying the estimated lambda to the Box-Cox transformation as the power parameter. Forecasts produced by the Auto ARIMA method can also be combined via Bayesian Model Averaging.<br /><br />
<!-- :::::::::: FIGURE 4 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/boxcox/images/arima.png"><img height="auto"
src="http://www.eviews.com/blog/boxcox/images/arima.png" title="ARIMA Forecasting"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4: ARIMA Forecasting Dialog</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 4 :::::::::: -->
As an alternative approach, one can apply the ETS Exponential Smoothing method to the transformed series to select the best model and then back-transform the forecasted values.<br /><br />
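The back-transformation step can be sketched as follows (a hypothetical Python illustration, not the add-in's code). This is the naive inverse of the Box-Cox formula above; a bias correction is sometimes added when the target is the conditional mean:

```python
# Hypothetical sketch: map forecasts from the Box-Cox scale back to the
# original scale by inverting the transformation.
import math

def inv_boxcox(z, lam):
    """Inverse Box-Cox: y = (lam * z + 1)^(1/lam), or exp(z) when lam = 0."""
    if abs(lam) < 1e-12:
        return [math.exp(v) for v in z]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in z]
```

In the tourism application, the ETS forecasts produced on the transformed scale would be passed through this inverse with the estimated lambda (0.106) before being compared with the actual visitor counts.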
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/boxcox/images/visitors_loss.png"><img height="auto"
src="http://www.eviews.com/blog/boxcox/images/visitors_loss.png" title="Visitors Loss"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5: Visitors Loss</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
The ARIMA model results imply that the number of visitors to Turkey might have decreased by 29 million during 2020. The ETS model paints an even worse picture, estimating a potential loss of 42 million visitors!<br /><br /><br /><br />
<hr />
<h3 class="seccol", id="sec4">Files</h3>
<ul>
<li><a href="http://www.eviews.com/blog/boxcox/workfiles/tourism.wf1"><b class="wf">TOURISM.WF1</b></a></li>
<li><a href="http://www.eviews.com/blog/boxcox/workfiles/boxcox_example.prg"><b class="wf">BOXCOX_EXAMPLE.PRG</b></a></li>
</ul><br /><br />
<hr />
<h3 class="seccol", id="sec5">References</h3>
<ol class="bib2xhtml">
<li id="boxcox-1964">
Box, G.E.P., and Cox, D.R. (1964), "An analysis of transformations", <i>Journal of the Royal Statistical Society</i>, Series B, vol. 26, no. 2, pp. 211-246.
</li>
<li id="guerrero-1993">
Guerrero V.M. (1993), "Time-series analysis supported by power transformations", <i>Journal of Forecasting</i>, vol. 12, pp. 37-48.
</li>
</ol>
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com2tag:blogger.com,1999:blog-6883247404678549489.post-27693608770802674652021-05-04T14:13:00.003-07:002021-11-01T11:08:27.926-07:00SpecEval Add-In<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 1px solid black;
}
.classic_table {
border: 1px solid black;
border-collapse: collapse;
border-spacing: 0px;
}
.classic_table tr {
border-bottom: 1px solid black;
border-top: 1px solid black;
}
.classic_table tr:first-child {
border-top: none;
}
.classic_table tr:last-child {
border-bottom: none;
}
.classic_table td {
border-left: 1px solid black;
border-right: 1px solid black;
padding-right: 10px;
padding-left: 10px;
}
.classic_table td:first-child {
border-left: none;
}
.classic_table td:last-child {
border-right: none;
}
.break_row {
border-bottom: 3px solid #293d5c !important
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.seccol {
}
.subseccol {
color: #fa5e5e
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: ['{\\left(}'],
rb: ['{\\right)}'],
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}'],
min: ['{\\operatorname\{min\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: "verdana" sans-serif">
<i>Author and guest post by Kamil Kovar</i><br /><br />
This is the first in a series of blog posts presenting a new EViews add-in, <b>SpecEval</b>, aimed at facilitating time series model development. This post focuses on the motivation for the add-in and an overview of its functionality. The remaining posts in the series will illustrate its use.
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Basic Principles</a>
<li><a href="#sec2">Comprehensiveness: What Does SpecEval Do?</a>
<li><a href="#sec3">Flexibility in Practice</a>
<li><a href="#sec4">What’s Next?</a>
<li><a href="#sec5">Footnotes</a>
</ol><br />
<h3 class="seccol", id="sec1">Basic Principles</h3>
The idea behind SpecEval is simple: to do model development effectively – especially in a time-constrained environment – one should have a tool that can quickly produce and summarize information about a particular model. Such a tool should satisfy three key requirements:<br /><br />
<ol>
<li>It should be very <b>easy</b> to use, so that its use does not introduce additional costs into the model development process.</li>
<li>It should be <b>comprehensive</b> in the sense that it includes all relevant information one would like to have when evaluating a particular model.</li>
<li>It should be <b>flexible</b> so that the user can easily change what information is included in particular situations. Flexibility is a necessary counterpart of comprehensiveness so that one avoids congestion.</li>
</ol>
The first requirement is facilitated by EViews add-in functionality, which allows execution either through the GUI or a command, so that model evaluation can be performed repeatedly through one quick action. Apart from this, the add-in's functionality and options are designed in a way that allows the user to easily adjust the execution settings. For example, the add-in can be executed either for one model at a time or for multiple models at the same time. Furthermore, including multiple models is as simple as just listing them (wildcards are acceptable). Meanwhile, each output type can be specified as part of the execution list, making it easy to include additional outputs.<br /><br /><br /><br />
<h3 class="seccol", id="sec2">Comprehensiveness: What Does SpecEval Do?</h3>
So what does the SpecEval add-in do? In broad terms, it <b>produces tables and graphs that provide information about the model, and especially its behavior</b>. Discussing the full set of possible outputs (listed in the table below) is beyond the scope of this blog post, since most functionality will be illustrated in the posts to follow. Instead, the table should highlight that the add-in is indeed comprehensive from a model development perspective.<sup><a href="#fn1" id="ref1">1</a></sup><br /><br />
<center>
<table class='classic_table'>
<tr style="color: white; background-color: #293d5c">
<td><b>Object Name</b></td>
<td><b>Description</b></td>
</tr>
<tr>
<td>Estimation output table</td>
<td>Adjusted regression output table</td>
</tr>
<tr>
<td>Coefficient stability graph</td>
<td>Graph with recursive equation coefficients</td>
</tr>
<tr class='break_row'>
<td>Model stability graph</td>
<td>Graph with recursive lag orders</td>
</tr>
<tr>
<td>Performance metrics tables</td>
<td>Table with values of forecast performance metrics</td>
</tr>
<tr>
<td>Performance metrics tables (multiple specifications)</td>
<td>Table with values of forecast performance metrics for given metric for all specifications</td>
</tr>
<tr>
<td>Forecast summary graph</td>
<td>Graph with all recursive forecasts with given horizons</td>
</tr>
<tr>
<td>Sub-sample forecast graph</td>
<td>Graph with forecast for given sub-sample</td>
</tr>
<tr>
<td>Subsample forecast decomposition graph</td>
<td>Graph with decomposition of sub-sample forecast</td>
</tr>
<tr class='break_row'>
<td>Forecast bias graph</td>
<td>Scatter plot of forecast and actual values for given forecast horizon (Mincer-Zarnowitz plot)</td>
</tr>
<tr>
<td>Individual conditional scenario forecast graph (level)</td>
<td>Graph with forecast for single scenario and specification</td>
</tr>
<tr>
<td>Individual conditional scenario forecast graph (transformation)</td>
<td>Graph with transformation of forecast for single scenario and specification</td>
</tr>
<tr>
<td>All conditional scenario forecast graph</td>
<td>Graph with forecasts for all scenarios for single specification</td>
</tr>
<tr>
<td>Multiple specification conditional scenario forecast graph</td>
<td>Graph with forecasts for single scenario for multiple specifications</td>
</tr>
<tr>
<td>Shock response graphs</td>
<td>Graphs with response to shock to individual independent variable/regressor</td>
</tr>
</table>
<br />
</center>
The first category of outputs includes information about the model in the form of estimation output, with several enhancements that facilitate quick evaluation, such as suitable color-coding. Moreover, the information about the model is not limited to the final model estimates, but also includes information about recursive model estimates (e.g. recursive coefficients and/or lag orders). See the figures below for illustrations of both outputs.<br/><br/>
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval/images/overview_est.png"><img height="auto"
src="http://www.eviews.com/blog/speceval/images/overview_est.png" title="Estimation Example"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: Estimation Example</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval/images/overview_stability.png"><img height="auto"
src="http://www.eviews.com/blog/speceval/images/overview_stability.png" title="Coefficient Stability"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: Coefficient Stability</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
Nevertheless, far more stress is put on information about forecasting performance, which is the key focus of the add-in. Correspondingly, the add-in contains several outputs that either visualize historical (backtest) forecasts<sup><a href="#fn2" id="ref2">2</a></sup>, or that provide numerical information about the precision of these forecasts. The main graph – indeed in some sense the workhorse graph of the add-in – displays all available historical forecasts together with the actuals, see figure below. Apart from listing multiple horizons, the user can also include additional series in the graph or decide to use one of four alternative transformations.<br/><br/>
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval/images/overview_forecast.png"><img height="auto"
src="http://www.eviews.com/blog/speceval/images/overview_forecast.png" title="Conditional Forecasts"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: Conditional Forecasts</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
The next table summarizes measures of precision of the historical forecasts. The table displays the values of a particular precision metric (MAE, RMSE or bias) for alternative specifications and for multiple horizons. Crucially, this table is color-coded, facilitating quick comparison across specifications.<br/><br/>
<!-- :::::::::: FIGURE 4 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval/images/overview_precision.png"><img height="auto"
src="http://www.eviews.com/blog/speceval/images/overview_precision.png" title="Forecast Precision"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4: Forecast Precision</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 4 :::::::::: -->
Lastly, the add-in also provides detailed information about the behavior of the model under different conditions. This includes two types of exercises. The first exercise consists of creating and visualizing conditional scenario forecasts. This is useful both as a goal in itself, when scenario forecasting is an important use of the model, but more importantly also for instrumental reasons: thanks to their controlled-experiment nature, scenario forecasts can help identify problems with the model. The add-in produces several types of graphs visualizing scenario forecasts, see figure below for illustration.<br/><br/>
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval/images/overview_scenarios.png"><img height="auto"
src="http://www.eviews.com/blog/speceval/images/overview_scenarios.png" title="Model Scenarios"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5: Model Scenarios</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
The second exercise is creating and visualizing impulse shock responses, i.e. introducing shocks to a single independent variable or regressor and studying the response of the dependent variable. This allows the modeler to assess the influence a particular independent variable/regressor has on the dependent variable, as well as the dynamic profile of responses. See figure below for illustration.<br/><br/>
<!-- :::::::::: FIGURE 6 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/speceval/images/overview_ir.png"><img height="auto"
src="http://www.eviews.com/blog/speceval/images/overview_ir.png" title="Impulse Responses"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 6: Impulse Responses</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 6 :::::::::: -->
The above discussion makes it clear that the <b>focus here is on graphical information, rather than on numerical information</b> as is more customary in model development toolkits. This is motivated by two considerations. First, graphical information is significantly more suitable for the interactive model development process in which the modeler comes up with improvements to the current model based on information on its performance. Second, the human brain is able to process graphical information faster than numerical information; hence even when numerical information is presented, it is associated with graphical cues to increase the processing speed, such as color-coding of the estimation output.<br/><br/><br/><br/>
<h3 class="seccol", id="sec3">Flexibility in Practice</h3>
The third basic principle – flexibility – is in practice embodied in the ability of the user to adjust the processes or the outputs via add-in options. There are altogether almost 40 user settings – all listed and explained in the add-in documentation – which can be divided into several categories.<br/><br/>
First, general options focus on which of the in-built functionality is going to be performed and on which objects/specifications. Next, there is a group of options that allows customization of the outputs, such as specification of horizons for tables and/or graphs, transformations used in graphs, or additional series to be included in graphs. A third group of options allows for some basic customization of the forecasting processes. For example, one can choose between in-sample and out-of-sample forecasting, or one can specify additional equations/identities to be treated as part of the forecasting model.<sup><a href="#fn3" id="ref3">3</a></sup> These are just two examples of how the forecasting process can be customized.<br/><br/>
The final two groups focus on control of the samples used in the various procedures and on customization of storage settings. The former includes, for example, an option to manually specify sample boundaries for the backtesting procedures or for the conditional scenario forecasts. The latter allows the user to determine which objects will be kept in the workfile after execution, and under what names or aliases.<br/><br/><br/><br/>
<h3 class="seccol", id="sec4">What's Next?</h3>
Future blog posts in this series will focus on illustrating both the use of the add-in – highlighting its ease of use and flexibility – and its outputs. Each will follow a particular application, always focusing on particular features of the add-in. The first post in the series will provide an overview of the basics of using the add-in, highlighting the key outputs and the customization of the process and the outputs. The second will stress the ability – and power – of using transformations in model development. The third post will focus on creating unconditional forecasts, while the last post will conclude with a brief look at recursive model structures.<br/><br/><br/><br/>
<hr />
<h3 id="sec5">Footnotes</h3>
<sup id="fn1">1. Of course, comprehensiveness is more a goal than a state, in that there will always be additional functionalities that could/should be included. See the development list on the add-in's GitHub site for what additional functionality is on the roadmap, and feel free to make suggestions there.<br />
Also, the add-in is comprehensive in terms of its focus, which is forecasting behavior of a given model – as opposed to econometric characteristics of the model. This means that currently the add-in does not include any information in the form of outputs of econometric tests.<a href="#ref1" title="Jump back to footnote 1 in the text.">↩</a></sup><br/>
<sup id="fn2">2. By historical forecasts I mean conditional forecasts, which are potentially multistep and dynamic, and/or recursive.<a href="#ref2" title="Jump back to footnote 2 in the text.">↩</a></sup><br/>
<sup id="fn3">3. Note that these two features – in-sample forecasting and inclusion of multiple equations in the forecasting model – are possible thanks to in-built EViews functionality and hard to replicate in other statistical programs. The former is thanks to the separation between estimation and forecasting samples, the latter thanks to flexible model objects.<a href="#ref3" title="Jump back to footnote 3 in the text.">↩</a></sup>
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com0tag:blogger.com,1999:blog-6883247404678549489.post-12390203530726640622021-04-06T12:13:00.004-07:002021-04-12T16:28:06.674-07:00Time series cross-validation in ENET<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 1px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.seccol {
}
.subseccol {
color: #fa5e5e
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: ['{\\left(}'],
rb: ['{\\right)}'],
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}'],
min: ['{\\operatorname\{min\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: "verdana" sans-serif">
EViews 12 has added several new enhancements to <b>ENET</b> (elastic net) such as the ability to add observation and variable weights and additional cross-validation methods.<br /><br />
In this blog post we will show one of the new methods for time series cross-validation. The demonstration will compare the forecasting performance of rolling window cross-validation with models constructed from least squares as well as a simple split of our dataset into training and test sets.<br /><br />
We will evaluate the out-of-sample prediction abilities of this new technique on some important macroeconomic variables. The analysis shows the promising forecast performance obtained on this dataset by using a time-series-specific cross-validation method compared with simpler methods.
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Background</a>
<li><a href="#sec2">Dataset</a>
<li><a href="#sec3">Analysis</a>
<li><a href="#sec4">Files</a>
</ol><br />
<h3 class="seccol", id="sec1">Background</h3>
When performing model selection for a time series forecasting problem, it is important to be aware of the temporal properties of the data. The time series may be generated by an underlying process that changes over time, resulting in data that are not independent and identically distributed (i.i.d.). For example, time series data are frequently serially correlated, and the ordering of the observations is important.<br /><br />
Traditional time series econometrics solves this problem by splitting the data into training and test sets, with the test set coming from the end of the dataset. While this preserves the temporal aspects of the data, not all of the information in the dataset is used because the data in the test set are not used to train the model. Any characteristics unique to the training or test dataset may negatively affect the forecast performance of the model on new data.<br /><br />
Meanwhile, other model selection procedures such as cross-validation typically assume the data to be i.i.d., but have often been applied to time series data without regard to temporal structure. For example, the very popular k-fold cross-validation is done by splitting the data into k sets, treating k-1 of them collectively as the training set, and using the remaining set as the test set. While the data within each set retain their original ordering, the test set may occur before portions of the training data. So while cross-validation makes full use of the data, it partly ignores its time ordering.<br /><br />
The two time series cross-validation methods introduced in EViews 12 combine the benefits of temporal awareness of traditional time series econometrics with the use of the entire dataset from cross-validation. More details about these procedures can be found in the <a href='https://help.eviews.com/helpintro.html#page/content%2Fenet-Estimating_an_Elastic_Net_Regression_in_EViews.html%23ww273241'>EViews documentation</a>. We have chosen to demonstrate ENET with rolling time series cross-validation, which “rolls” a window of constant length forward through the dataset, keeping the test set after the training set.<br /><br />
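To make the mechanics concrete, here is a hypothetical Python sketch (not EViews' internal implementation) of how rolling-window splits can be generated: a fixed-length training window moves forward through the sample, and the test set always follows it in time:

```python
# Hypothetical sketch of rolling time-series cross-validation splits:
# the training window has constant length and the test set never
# precedes any of the training observations.
def rolling_splits(n, train_size, test_size, step=1):
    """Yield (train_indices, test_indices) pairs over a series of length n."""
    start = 0
    while start + train_size + test_size <= n:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += step
```

Averaging the test-set error over all such windows gives the cross-validation criterion minimized over the candidate penalty parameters.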
In order to illustrate another method in the family of elastic net shrinkage estimators we use ridge regression for this analysis. Ridge regression is another penalized estimator that is related to Lasso (more details are in <a href='http://blog.eviews.com/2021/02/lasso-variable-selection.html'>this blog post</a>). Instead of adding an L1 penalty term to the linear regression cost function as in Lasso, we add an L2 penalty term:
\begin{align*}
J = \frac{1}{2m}\xsum{i}{1}{m}{\rbrace{y_i - \beta_0 -\xsum{j}{1}{p}{x_{ij}\beta_j}}} {\color{red}{+\lambda\xsum{j}{1}{p}{\beta_j^2}}}
\end{align*}
where the regularization parameter $\lambda$ is chosen by cross-validation.<br /><br /><br /><br />
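As an illustration of this cost function (a pure-Python sketch under simplifying assumptions, not the EViews estimator), the ridge coefficients can be found by gradient descent on $J$, leaving the intercept $\beta_0$ unpenalized as in the formula:

```python
# Hypothetical sketch: minimize the ridge cost J by gradient descent.
# The intercept b0 is not penalized, matching the formula above.
def ridge_gd(X, y, lam, lr=0.1, iters=5000):
    """X: list of rows, y: targets, lam: L2 penalty. Returns (b0, beta)."""
    m, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        resid = [y[i] - b0 - sum(X[i][j] * beta[j] for j in range(p))
                 for i in range(m)]
        # Gradient of the squared-error term carries the 1/m factor from J.
        g0 = -sum(resid) / m
        grad = [-sum(resid[i] * X[i][j] for i in range(m)) / m
                + 2.0 * lam * beta[j] for j in range(p)]
        b0 -= lr * g0
        beta = [beta[j] - lr * grad[j] for j in range(p)]
    return b0, beta
```

With $\lambda = 0$ this reduces to least squares; increasing $\lambda$ shrinks the slope coefficients toward zero, which is the behavior cross-validation trades off against fit.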
<h3 class="seccol", id="sec2">Dataset</h3>
The data for this demonstration consist of 108 monthly US macroeconomic series from January 1959 to December 2007. This was part of the <a href='http://www.princeton.edu/~mwatson/ddisk/stock_watson_generalized_shrinkage_June_2012.zip'>dataset</a> used in <a href='http://www.princeton.edu/~mwatson/papers/Stock_Watson_JBES_2012.pdf'>Stock and Watson (2012)</a> (we only use the data on “Sheet1”). Each time series is stationary transformed according to the specification in the data heading. The stationarity transformation is important to ensure that the series are identically distributed and so that the simple split into training and test data in the first part of our analysis does not produce a test set that is significantly different from our training set.<br /><br />
In the table below we show part of the data used for this example.<br/><br/>
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/enet_tscv/images/data_overview.png"><img height="auto"
src="http://www.eviews.com/blog/enet_tscv/images/data_overview.png" title="Data Preview"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: Data Preview</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
Additional information about the data can be found <a href='http://www.princeton.edu/~mwatson/papers/stock_watson_generalized_shrinkage_supplement_June_2012.pdf'>here</a>.<br/><br/><br/><br/>
<h3 class="seccol", id="sec3">Analysis</h3>
We take each series in turn as the dependent variable and treat the other 107 variables as independent variables for estimation and forecasting. Each regression then has 108 variables, plus an intercept. The independent variables are lagged by one observation, which is one month. The first 80% of the dataset is used to estimate the model (the "estimation sample") and the last 20% is reserved for forecasting (the "forecasting sample").<br/><br/>
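The data preparation just described can be sketched as follows (a hypothetical Python illustration with a made-up helper name, not the program used for the blog's results): lag every independent variable by one observation and split the resulting sample 80/20:

```python
# Hypothetical sketch: build one-month-lagged regressors for a chosen
# dependent series and split the sample into estimation and forecast parts.
def lag_and_split(data, dep, frac=0.8):
    """data: dict of name -> list of T observations; dep: dependent series name."""
    T = len(data[dep])
    names = [k for k in data if k != dep]
    # Row at time t uses the other variables observed at t-1.
    X = [[data[k][t - 1] for k in names] for t in range(1, T)]
    y = data[dep][1:]
    cut = int(frac * len(y))
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])
```

Repeating this for each of the 108 series produces the per-series estimation and forecasting samples used by the three competing models.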
Because we want to compare each model type (least squares, simple split, and rolling) on an equal basis, we have chosen to take the coefficients estimated from each model and keep them fixed over the forecast period. In addition, while it might be more interesting to use pseudo out-of-sample forecasting over the forecast period rather than fixed coefficients, rolling cross-validation is time intensive and we preferred to keep the analysis tractable.<br/><br/>
The first model is a least squares regression on each series over the estimation sample as a baseline. With the coefficients estimated from OLS we forecast over the forecast sample.<br/><br/>
Next, we use ridge regression with a simple split on the estimation sample as a comparison. (Simple Split is a new addition to ENET cross-validation in EViews 12 that divides the data into an initial training set and subsequent test set.) We then split this first 80% of the dataset further into training and test sets using the default parameters. Cross-validation chooses a set of coefficients that minimize the mean squared error (MSE). Using these coefficients we again forecast over the remaining forecast sample.<br/><br/>
Finally, we apply rolling time-series cross-validation to the same split of the data for each series: the estimation sample serves as the training and test sets for rolling cross-validation, and the forecast sample is used for forecasting with the coefficients chosen for each series. We use the default parameters for rolling cross-validation and again minimize the MSE.<br/><br/>
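In rolling time-series cross-validation every test point is forecast from observations that come strictly earlier in time. As a minimal sketch, the fold logic might look like the following (the expanding window and one-step horizon here are illustrative choices, not necessarily EViews' defaults):

```python
def rolling_cv_folds(n_obs, initial_train, horizon=1, step=1):
    """Yield (train, test) index lists in which the training data
    always precede the test observations in time."""
    start = initial_train
    while start + horizon <= n_obs:
        yield list(range(start)), list(range(start, start + horizon))
        start += step

# With 10 observations and an initial window of 6 we get 4 folds,
# each forecasting one step ahead from all earlier data.
folds = list(rolling_cv_folds(10, 6))
```

A candidate value of the regularization parameter would then be scored by averaging the test-set MSE across all such folds.<br/><br/>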
After generating 324 forecasts with our 108 variables and three different models, we collected the root mean squared error (RMSE) of each forecast into a table. This table is shown below.<br/><br/>
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/enet_tscv/images/rmse.png"><img height="auto"
src="http://www.eviews.com/blog/enet_tscv/images/rmse.png" title="Root Mean Squared Error"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: Root Mean Squared Error</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
Each row of the table has, in order, the name of the dependent variable in the regression and the RMSE for the least squares, simple split, and time series CV models. The minimum value in each row is highlighted in yellow. If a row contains duplicate values, then none of the cells are highlighted because we only count instances when one model has a strictly lower error measure than the others. At the bottom of the table is a row with the total number of times each cross-validation method had the minimum value, summed across all series. For example, OLS had the minimum RMSE 21 times, or 25% of the total, while rolling cross-validation had the minimum RMSE 38 times, for 45% of the total. Simple split makes up the remaining 31% (the percentages do not add up to 100% because of rounding).<br/><br/>
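The counting rule just described (credit only strict row minima, skip ties) can be sketched as follows; the sample rows are made up for illustration:

```python
def count_unique_minima(rows):
    """For each column, count how often it holds a row's unique minimum.
    Rows whose minimum value is tied are skipped entirely."""
    wins = [0] * len(rows[0])
    for row in rows:
        m = min(row)
        if row.count(m) == 1:   # no cell is credited when values tie
            wins[row.index(m)] += 1
    return wins

# Hypothetical RMSE rows in (OLS, simple split, rolling) order:
rows = [(0.9, 1.0, 0.8), (0.7, 0.7, 0.9), (1.2, 1.1, 1.3)]
counts = count_unique_minima(rows)   # the tied second row is skipped
```

Applied to the full RMSE table, this tallying yields the totals reported in the bottom row.<br/><br/>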
Below we include the equivalent table for mean absolute error (MAE). Percentages for this error measure are 20% for OLS, 31% for simple split cross-validation, and 50% for rolling cross-validation.<br/><br/>
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/enet_tscv/images/mae.png"><img height="auto"
src="http://www.eviews.com/blog/enet_tscv/images/mae.png" title="Mean Absolute Error"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: Mean Absolute Error</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
In the two tables above we can see some interesting highlighted clusters of series that belong in the same categories as defined in the <a href='http://www.princeton.edu/~mwatson/papers/Stock_Watson_JBES_2012.pdf'>paper</a> and <a href='http://www.princeton.edu/~mwatson/papers/stock_watson_generalized_shrinkage_supplement_June_2012.pdf'>supplemental materials</a>. For example, looking only at the "Rolling" column, the five EXR* series in group 11 are the exchange rates of four currencies with the USD as well as the effective exchange rate of the dollar. Other groups with the lowest forecast errors after using rolling cross-validation include the three CES*R series, for hourly earnings, and the FS* series, representing various measures of the S&P 500.<br /><br />
We leave further investigation of these time series, and their estimation and forecasting properties with methods that are temporally aware, to the reader.<br /><br />
<hr />
<h3 id="sec4">Files</h3>
<ol>
<li><a href="http://www.eviews.com/blog/enet_tscv/stock_watson.wf1">stock_watson.WF1</a>
<li><a href="http://www.eviews.com/blog/enet_tscv/stock_watson.prg">stock_watson.PRG</a>
</ol><br />
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com0tag:blogger.com,1999:blog-6883247404678549489.post-36357222750144735832021-03-03T09:56:00.002-08:002021-05-25T07:49:57.615-07:00New Variable Selection Diagnostics and Data MembersThe 2021/03/03 update to EViews 12 has two new smaller Variable Selection features. These will help you extract information on the outcome of any selection method and obtain diagnostics on the selection process for a subset of methods. <span><a name='more'></a></span><div><br /></div><div>The first new feature is a way to extract lists of the search variables that have been kept or rejected by the selection procedure. Naturally, they are the data members <span style="font-family: courier;">@varselkept</span> and <span style="font-family: courier;">@varselrejected</span>. For any Equation object (say, “<span style="font-family: courier;">EQ</span>”) that has been estimated with any of the variable selection techniques, the calls </div><div><span style="font-family: courier;"> eq.@varselkept </span></div><div><span style="font-family: courier;"> eq.@varselrejected </span></div><div>will return space-delimited lists of the variables in EQ that were kept or rejected by variable selection, not including the always included regressors. </div><div><br /></div><div>The second new feature is additions to the views for Variable Selection. For the Uni-directional, Stepwise, and Swapwise methods, there is a new Selection Diagnostics menu. The former two have six items in this menu: R-squared, t-Stats, and Alpha-squared Graphs, and corresponding Tables. Swapwise has R-squared and Alpha-squared Graphs and Tables. Each graph and table show the chosen statistic at each step in the selection process. 
Choosing R-squared Graph for forward stepwise selection in an example dataset displays:</div><div class="separator" style="clear: both; text-align: center;"><br /></div><br /><div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-yEsc0Mp--ME/YD_Y_aEBaAI/AAAAAAAAA9k/vqACo8HOWvoj6Qh0jdqqB9C-CDryQl6xACNcBGAsYHQ/s458/varsel1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="423" data-original-width="458" height="370" src="https://1.bp.blogspot.com/-yEsc0Mp--ME/YD_Y_aEBaAI/AAAAAAAAA9k/vqACo8HOWvoj6Qh0jdqqB9C-CDryQl6xACNcBGAsYHQ/w400-h370/varsel1.png" width="400" /></a></div><br /></div><div>showing the increase to the R-squared statistic with each step in the selection. It is interesting to see the large contributions to R-squared in just the first few steps. </div><div><br /></div><div>R-squared Table shows the same information in table form:</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div><br /></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-FLK_mqnYbtA/YD_ZHiQ-uoI/AAAAAAAAA9o/Nz3AcwZGVPQRTBW-W13WnCRNBR6Mq5CowCNcBGAsYHQ/s291/varsel2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="265" data-original-width="291" height="364" src="https://1.bp.blogspot.com/-FLK_mqnYbtA/YD_ZHiQ-uoI/AAAAAAAAA9o/Nz3AcwZGVPQRTBW-W13WnCRNBR6Mq5CowCNcBGAsYHQ/w400-h364/varsel2.png" width="400" /></a></div><br />IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com2tag:blogger.com,1999:blog-6883247404678549489.post-25978748104437565982021-02-16T11:11:00.008-08:002021-04-12T16:28:23.393-07:00Lasso Variable Selection<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 0px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.seccol {
}
.subseccol {
color: #fa5e5e
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: ['{\\left(}'],
rb: ['{\\right)}'],
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}'],
min: ['{\\operatorname\{min\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: Verdana, sans-serif">
In this blog post we will show how Lasso variable selection works in EViews by comparing it with a baseline least squares regression. We will be evaluating the prediction and variable selection properties of this technique on the same <a href="https://web.stanford.edu/~hastie/StatLearnSparsity_files/DATA/diabetes.html">dataset</a> used in the well-known paper “Least Angle Regression” by Efron, Hastie, Johnstone, and Tibshirani. The analysis will show the generally superior in-sample fit and out-of-sample forecast performance of Lasso variable selection compared with a baseline least squares model.
<a name='more'></a><br /><br />
Lasso variable selection, <a href="http://eviews.com/EViews12/ev12ecest_n.html#varsel">new to EViews 12</a> and also known as the Lasso-OLS hybrid, post-Lasso OLS, the relaxed Lasso (under certain conditions), or post-estimation OLS, uses Lasso as a variable selection technique followed by ordinary least squares estimation on the selected variables.<br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Background</a>
<li><a href="#sec2">Dataset</a>
<li><a href="#sec3">Analysis</a>
</ol><br />
<br /><br />
<h3 class="seccol" id="sec1">Background</h3>
In today’s data-rich environment it is useful to have methods of extracting information from complex datasets with large numbers of variables. A popular way of doing this is with dimension reduction techniques such as principal components analysis or dynamic factor models. By reducing the number of variables in a model, we can reduce overfitting, make the model simpler and easier to interpret, and decrease computation time. However, dimension reduction methods risk losing useful information contained in the variables left out of the reduced set, and may have poorer predictive power.<br/><br/>
Lasso is useful because it is a shrinkage estimator: it shrinks the size of the coefficients of the independent variables depending on their predictive power. Some coefficients may shrink down to zero, allowing us to restrict the model to variables with nonzero coefficients.<br/><br/>
Lasso is just one method out of a family of penalized least squares estimators (other members include ridge regression and elastic net). Starting with the linear regression cost function:
\begin{align*}
J = \frac{1}{2m}\xsum{i}{1}{m}{\rbrace{y_i - \beta_0 -\xsum{j}{1}{p}{x_{ij}\beta_j}}}
\end{align*}
where $y_i$ is the dependent variable, $x_{ij}$ are the independent variables, $\beta_j$ are the coefficients, $m$ is the number of data points, and $p$ the number of independent variables, we obtain the coefficients $\beta_j$ by minimizing $J$. If the model based on linear regression is overfit and does not make good predictions on new data, then one solution is to construct a Lasso model by adding a penalty term:
\begin{align*}
J = \frac{1}{2m}\xsum{i}{1}{m}{\rbrace{y_i - \beta_0 -\xsum{j}{1}{p}{x_{ij}\beta_j}}} {\color{red}{+\lambda\xsum{j}{1}{p}{|\beta_j|}}}
\end{align*}
where the parameters are the same as before, with the addition of the regularization parameter $\lambda$. The penalty term raises the cost of nonzero $\beta_j$, so minimizing the cost function pushes the values of $\beta_j$ toward zero. Smaller values of $\beta_j$ "smooth out" the fitted function so it tracks the data less tightly, making it more likely to generalize well to new data. The regularization parameter $\lambda$ determines how heavily the $\beta_j$ are penalized. Lasso estimation in EViews can automatically select an appropriate value with cross-validation, a data-driven method of choosing $\lambda$ based on its predictive ability.<br/><br/>
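To make the mechanics concrete, here is a toy sketch of Lasso fitting by cyclic coordinate descent, in which the $L_1$ penalty enters through a soft-thresholding step. This is for illustration only and is not the algorithm EViews uses internally:

```python
import math

def soft_threshold(z, lam):
    """sign(z) * max(|z| - lam, 0): the shrinkage induced by the L1 penalty."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize (1/2m) sum_i (y_i - b0 - sum_j x_ij*b_j)^2 + lam * sum_j |b_j|."""
    m, p = len(X), len(X[0])
    beta = [0.0] * p
    b0 = 0.0
    for _ in range(n_sweeps):
        # exact update of the (unpenalized) intercept
        b0 = sum(y[i] - sum(X[i][j] * beta[j] for j in range(p))
                 for i in range(m)) / m
        for j in range(p):
            # correlation of x_j with the partial residual (x_j held out)
            rho = sum(X[i][j] * (y[i] - b0 - sum(X[i][k] * beta[k]
                      for k in range(p) if k != j)) for i in range(m)) / m
            norm = sum(X[i][j] ** 2 for i in range(m)) / m
            beta[j] = soft_threshold(rho, lam) / norm
    return b0, beta

# Toy data: y depends only on the first column; the second is noise.
X = [[1, 0.5], [2, -0.3], [3, 0.8], [4, -0.1], [5, 0.2]]
y = [2, 4, 6, 8, 10]
b0, beta = lasso_cd(X, y, lam=1.0)   # the noise coefficient is shrunk to zero
```

With lam equal to zero the solution coincides with OLS; as lam grows, coefficients shrink and some become exactly zero, which is the property variable selection exploits.<br/><br/>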
If we have a dataset with many independent variables, ordinary least squares models may produce estimates with large variances and therefore unstable forecasts. By applying Lasso regression to the data and removing variables that have been shrunk to zero, then applying OLS to the reduced number of variables, we may be able to improve forecasting performance. In this way we can perform dimension reduction on our data based on the predictive accuracy of our model. <br/><br/><br/><br/>
<h3 class="seccol" id="sec2">Dataset</h3>
In the table below we show part of the data used for this example.<br/><br/>
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/lassosel/images/spreadsheet.png"><img height="auto"
src="http://www.eviews.com/blog/lassosel/images/spreadsheet.png" title="Data Preview"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: Data Preview</small><br/>
<small>(Click to enlarge)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
The ten original variables are age, sex, body mass index (bmi), average blood pressure (bp), and six blood serum measurements for 442 patients. They have all been standardized as described in the paper. The dependent variable is a measure of disease progression one year after the other measurements were taken and has been scaled to have mean zero. We are interested in the accuracy of the fit and predictions from any model we develop of this data and in the relative importance of each regressor.<br/><br/><br/><br/>
<h3 class="seccol" id="sec3">Analysis</h3>
We first perform an OLS regression on the dataset to give us a baseline for comparison.<br/><br/>
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/lassosel/images/ls_all.png"><img height="auto"
src="http://www.eviews.com/blog/lassosel/images/ls_all.png" title="OLS Regression"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: OLS Regression</small><br/>
<small>(Click to enlarge)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
One thing to note in this estimation result is that the adjusted R-squared for this model is .5066, indicating that the model explains approximately 51% of the variation in the dependent variable. We also see that certain variables (BMI, BP, LTG, and SEX) both have the greatest impact on the progression of diabetes after one year and are the most statistically significant.<br/><br/>
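As a reminder of what this statistic measures, adjusted R-squared penalizes plain R-squared for the number of regressors, which makes it a fairer yardstick when we later change the size of the variable set. A quick sketch of the calculation (with made-up residuals, not those of the regression above):

```python
def adjusted_r2(y, resid, k):
    """Adjusted R-squared for a model with k regressors plus an intercept."""
    n = len(y)
    ybar = sum(y) / n
    ss_tot = sum((v - ybar) ** 2 for v in y)
    ss_res = sum(e * e for e in resid)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Example: four observations, one regressor, residuals chosen by hand.
adj = adjusted_r2([1, 2, 3, 4], [0.5, -0.5, 0.5, -0.5], k=1)
```
<br/><br/>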
Next, we run a Lasso regression over the same dataset and look at the plot of the coefficients against the L1 norm of the coefficients. This gives us a sense of how each coefficient contributes to the dependent variable. We can see that as the degree of regularization decreases (the L1 norm increases) more coefficients enter the model.<br/><br/>
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/lassosel/images/coef_evol.png"><img height="auto"
src="http://www.eviews.com/blog/lassosel/images/coef_evol.png" title="Coefficient Evolution"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: Coefficient Evolution</small><br/>
<small>(Click to enlarge)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
Let’s take a closer look at the coefficients.<br /><br />
<!-- :::::::::: FIGURE 4 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/lassosel/images/lasso.png"><img height="auto"
src="http://www.eviews.com/blog/lassosel/images/lasso.png" title="Lasso Regression"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4: Lasso Regression</small><br/>
<small>(Click to enlarge)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 4 :::::::::: -->
The coefficients at the minimum value of lambda (.004516) are all nonzero. However, when we move to the lambda value in the next column (6.401), which is the largest value of lambda whose cross-validation error is within one standard error of the minimum, we see that only four of the original ten regressors are nonzero. Compared with least squares, most of the coefficients in the first column have shrunk slightly toward zero, and more so in the next column with its larger regularization penalty (with the exception of an interesting sign change for HDL). Three of the variables retained (BMI, BP, and LTG) are the same variables identified by least squares as being both more influential on the outcome and statistically significant. But compared to least squares, this is a less complex model.<br /><br />
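This lambda choice is often called the one-standard-error rule: it trades a small amount of cross-validated fit for a sparser model. A sketch, with a made-up lambda grid and error values:

```python
def lambda_one_se(lambdas, cv_errors, cv_se):
    """Largest lambda whose CV error is within one standard error of
    the minimum CV error (a larger lambda gives a sparser model)."""
    i_min = min(range(len(lambdas)), key=lambda i: cv_errors[i])
    threshold = cv_errors[i_min] + cv_se[i_min]
    return max(lam for lam, err in zip(lambdas, cv_errors) if err <= threshold)

# Hypothetical cross-validation results over a small lambda grid:
lams = [0.01, 0.1, 1.0, 10.0]
errs = [0.50, 0.48, 0.52, 0.90]
ses  = [0.05, 0.05, 0.05, 0.05]
lam_1se = lambda_one_se(lams, errs, ses)   # prefers 1.0 over the minimizer 0.1
```
<br /><br />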
Does reducing the number of variables in this way lead to a better fitting model? We estimate a Lasso variable selection model with the same options to find out.<br /><br />
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/lassosel/images/lasso_vs.png"><img height="auto"
src="http://www.eviews.com/blog/lassosel/images/lasso_vs.png" title="Lasso Variable Selection"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5: Lasso Variable Selection</small><br/>
<small>(Click to enlarge)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
The unimpressive result of OLS applied to the variables selected from the Lasso fit is that adjusted R-squared has increased ever-so-slightly to .5068. Another thing to note is that while Lasso generally shrinks, or biases, the coefficients toward zero, OLS applied to Lasso expands, or debiases, them away from zero. This results in a decrease in the variance of the final model, as you can see by comparing the errors for the Lasso variable selection model with the first OLS model.<br /><br />
You may have noticed that the set of nonzero coefficients here is different from that of the earlier Lasso example. That’s because Lasso variable selection uses a different measure (AIC) than Lasso to select the preferred model; this is the same measure used for the other variable selection methods in EViews.<br /><br />
What about out-of-sample predictive power? We have randomly labeled each of the 442 observations as either training or test datapoints (the split is 70% training, 30% test). After doing least squares and Lasso variable selection on the training data, we use Series->View->Forecast Evaluation to compare the forecasts for least squares and Lasso variable selection over the test set:<br /><br />
<!-- :::::::::: FIGURE 6 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/lassosel/images/fcomp_orig.png"><img height="auto"
src="http://www.eviews.com/blog/lassosel/images/fcomp_orig.png" title="Lasso Predictive Evaluation"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 6: Lasso Predictive Evaluation</small><br/>
<small>(Click to enlarge)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 6 :::::::::: -->
We have achieved very slightly better predictive performance for some measures (MAE, MAPE) and very slightly worse for others (RMSE, SMAPE).<br /><br />
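For reference, the error measures in the evaluation above can be sketched as follows (this uses one common SMAPE convention; definitions of SMAPE vary):

```python
import math

def forecast_errors(actual, forecast):
    """Return (RMSE, MAE, MAPE, SMAPE); MAPE and SMAPE are in percent."""
    n = len(actual)
    diffs = [f - a for a, f in zip(actual, forecast)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    mape = 100.0 * sum(abs(d / a) for d, a in zip(diffs, actual)) / n
    smape = 100.0 * sum(2.0 * abs(d) / (abs(a) + abs(f))
                        for d, a, f in zip(diffs, actual, forecast)) / n
    return rmse, mae, mape, smape

rmse, mae, mape, smape = forecast_errors([100.0, 200.0], [110.0, 190.0])
```
<br /><br />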
This is all mildly interesting. But the real power of variable selection techniques comes when you have a larger dataset and want to reduce the set of variables under consideration to a more manageable set. To this end, we use the “extended” dataset provided by the authors that includes the ten original variables plus squares of nine variables and forty-five interaction terms, for a total of sixty-four variables.<br /><br />
First, we repeat the OLS regression from earlier with the new extended dataset:<br /><br />
<!-- :::::::::: FIGURE 7 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/lassosel/images/ls_extended.png"><img height="auto"
src="http://www.eviews.com/blog/lassosel/images/ls_extended.png" title="Extended OLS"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 7: Extended OLS</small><br/>
<small>(Click to enlarge)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 7 :::::::::: -->
Adjusted R-squared is actually higher than it was for the original ten variables, at .5233, so the additional variables have added some explanatory power to the model.<br /><br />
Next, let’s go straight to Lasso variable selection on the extended dataset.<br /><br />
<!-- :::::::::: FIGURE 8 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/lassosel/images/lassovs_all.png"><img height="auto"
src="http://www.eviews.com/blog/lassosel/images/lassovs_all.png" title="Extended Lasso Variable Selection"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 8: Extended Lasso Variable Selection</small><br/>
<small>(Click to enlarge)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 8 :::::::::: -->
Out of sixty-four original search variables, the selection procedure has kept fourteen. This is a significant reduction in complexity. The adjusted R-squared has increased from .5233 to .5308, and the standard error of the regression has decreased.<br /><br />
The in-sample R-squared and errors have moved in a modest but promising direction. What about out-of-sample prediction? We again compare the forecasts for least squares and Lasso variable selection over the test set:<br /><br />
<!-- :::::::::: FIGURE 9 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/lassosel/images/fcomp_ext.png"><img height="auto"
src="http://www.eviews.com/blog/lassosel/images/fcomp_ext.png" title="Extended Lasso Predictive Evaluation"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 9: Extended Lasso Predictive Evaluation</small><br/>
<small>(Click to enlarge)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 9 :::::::::: -->
Now we can see a meaningful improvement in forecasting performance. All of the error measures have improved, some significantly. Applying Lasso variable selection to this larger dataset has led to reduced model complexity, a slight improvement in the in-sample fit, and improved forecasting performance over least squares.<br /><br /><br /><br />
<h3>Request a Demonstration</h3>
If you would like to experience Lasso methods in EViews for yourself, you can request a demonstration copy <a href="http://www.eviews.com/demo">here</a>.
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com3tag:blogger.com,1999:blog-6883247404678549489.post-6688321458965513982021-02-02T10:51:00.001-08:002021-02-02T10:51:19.453-08:00Univariate GARCH Models with Skewed Student’s-t Errors<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 0px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.seccol {
}
.subseccol {
color: #fa5e5e
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: ['{\\left(}'],
rb: ['{\\right)}'],
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}'],
min: ['{\\operatorname\{min\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: Verdana, sans-serif">
<i>A guest post by Eren Ocakverdi</i><br /><br />
This blog piece introduces a new add-in (<b>SKEWEDUGARCH</b>) that extends EViews’ built-in features for the estimation of univariate GARCH models.
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction</a>
<li><a href="#sec2">Skewed Student’s-t Distribution </a>
<li><a href="#sec3">Application to USDTRY currency </a>
<li><a href="#sec4">Files</a>
<li><a href="#sec5">References</a>
</ol><br />
<h3 class="seccol" id="sec1">Introduction</h3>
Volatility is an important concept in itself, but it has a special place in finance because it is usually associated with risk. Although investors believe in higher risk, higher reward, it is not an easy task to exploit this trade-off. The price of an asset can change dramatically over a short period of time and in either direction, which makes it exceedingly difficult to predict. Volatility is responsible for such sharp movements, so it is important to develop a gauge that measures it and identifies its dynamics.<br/><br/>
One of the critical observations regarding the returns of financial assets is that their volatility is not fixed over time and tends to cluster around large changes. GARCH models are specifically designed to capture this behavior and describe the movement of volatility more accurately. Details of GARCH estimation in EViews can be found <a href='http://www.eviews.com/help/helpintro.html#page/content%2Farch-ARCH_and_GARCH_Estimation.html%23'>here</a>.<br/><br/>
The conditional distribution of the error terms of the returns (i.e., of the mean equation) plays an important role in the estimation of GARCH-type models. Currently, EViews offers <a href='http://www.eviews.com/help/helpintro.html#page/content%2Farch-Basic_ARCH_Specifications.html%23ww165096'>three different assumptions</a> regarding the specification of this distribution.<br/><br/><br/><br/>
<h3 class="seccol" id="sec2">Skewed Student’s-t Distribution</h3>
Consistent with the stylized facts of financial markets, the distribution of returns has fat tails (i.e. high kurtosis) and is not symmetrical (i.e. positively skewed). Although the Student’s-t and GED specifications can account for the excess kurtosis, they are symmetrical densities by design. Lambert and Laurent (2001) therefore suggest the use of a skewed Student’s-t density within the GARCH framework. The log likelihood contributions of a standardized skewed Student’s-t are as follows:<br /><br />
\begin{align*}
l_t &= -\frac{1}{2} \log \rbrace{ \frac{\pi(\nu - 2) \Gamma \rbrace{\frac{\nu}{2}}^2}{\Gamma \rbrace{\frac{\nu + 1}{2}}^2 } } + \log \rbrace{\frac{2}{\xi + \frac{1}{\xi}}} + \log(s)\\
&-\frac{1}{2}\log(\sigma^2_t) - \frac{\nu + 1}{2} \log \rbrace{1 + \frac{\rbrace{s\rbrace{y_t - X_t^\top \theta} + m\sigma_t}^2}{\sigma_t^2\rbrace{\nu - 2}}\xi^{-2I_t}}
\end{align*}
Here, $\xi$ is the asymmetry parameter and $\nu$ is the degrees-of-freedom of the distribution. Other parameters, $m,s$ and $I_t$ are given by:
\begin{align*}
m &= \frac{\Gamma \rbrace{\frac{\nu - 1}{2}} \sqrt{\nu - 2}}{\sqrt{\pi}\Gamma\rbrace{\frac{\nu}{2}}}\rbrace{\xi - \frac{1}{\xi}}\\
s &= \sqrt{\rbrace{\xi^2 + \frac{1}{\xi^2} - 1} - m^2}\\
I_t &= \begin{cases}
\phantom{-}1 \quad \text{if} \quad \rbrace{\frac{y_t - X_t^\top \theta}{\sigma_t}} \geq - \frac{m}{s}\\
-1 \quad \phantom{\text{if}}\text{otherwise}
\end{cases}
\end{align*}
For a symmetrical distribution $\xi=1$; however, since the add-in estimates the logarithmic transformation of the parameter, you should test $\log(\xi)=0$ for the null hypothesis of symmetry.<br /><br />
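A sketch of this log-likelihood contribution in code, with the moments $m$ and $s$ computed from $\nu$ and $\xi$ as defined above (for illustration only; the add-in performs the actual estimation):

```python
import math

def skewed_t_loglik(y, mu, sigma2, nu, xi):
    """Log-likelihood contribution of one observation under the
    standardized skewed Student's-t of Lambert and Laurent (2001).
    mu is the fitted mean X'theta; sigma2 is the conditional variance."""
    m = (math.gamma((nu - 1) / 2) * math.sqrt(nu - 2)
         / (math.sqrt(math.pi) * math.gamma(nu / 2))) * (xi - 1 / xi)
    s = math.sqrt(xi ** 2 + 1 / xi ** 2 - 1 - m ** 2)
    z = (y - mu) / math.sqrt(sigma2)          # standardized residual
    I = 1 if z >= -m / s else -1
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(math.pi * (nu - 2))
            + math.log(2 / (xi + 1 / xi)) + math.log(s)
            - 0.5 * math.log(sigma2)
            - (nu + 1) / 2 * math.log(1 + (s * z + m) ** 2 / (nu - 2)
                                      * xi ** (-2 * I)))
```

With $\xi = 1$ the density is symmetric in the standardized residual; with $\xi > 1$ the right tail carries more mass.<br /><br />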
Below is a comparison of the theoretical Student’s-t distribution and its (positively) skewed version. Skewness increases the chance of observing extreme values in one direction, which has important implications in finance.<br /><br />
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/skewedugarch/images/skewedtdist.png"><img height="auto"
src="http://www.eviews.com/blog/skewedugarch/images/skewedtdist.png" title="Skewed t-Distribution"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: Skewed t-Distribution</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
<h3 class="seccol" id="sec3">Application to USDTRY currency</h3>
FX markets are convenient places for studying the dynamics of volatility, and the Turkish lira has recently come to the fore among emerging markets due to sudden capital outflows as well as currency shocks (<b>USDTRY.WF1</b>).<br /><br />
A simple visual inspection of squared returns shows us the magnitude of the shock that hit the markets on August 10th, 2018 (<b>SKEWEDUGARCH_EXAMPLE.PRG</b>). The impact was so severe that it dwarfed all other volatilities experienced during the analysis period of 2005-2020.<br /><br />
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/skewedugarch/images/returnssq.png"><img height="auto"
src="http://www.eviews.com/blog/skewedugarch/images/returnssq.png" title="Squared Returns"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: Squared Returns</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
In order to estimate the conditional variance of returns, we start by fitting two alternative models (GARCH(1,1) and TGARCH(1,1)) under two different distributional assumptions (normal and Student’s-t), giving four models in total. The mean equation is the same for all of them:
\begin{align*}
r_t &= \bar{r} + e_t\\
e_t &= \epsilon_t \sigma_t
\end{align*}
\begin{align*}
\textbf{Model 1}: \quad \sigma_t^2 &= \omega + \alpha_1 e_{t-1}^2 + \beta_1\sigma_{t-1}^2, \quad \text{where} \quad \epsilon_t \sim N(0,1)\\
\textbf{Model 2}: \quad \sigma_t^2 &= \omega + \alpha_1 e_{t-1}^2 + \beta_1\sigma_{t-1}^2 + \gamma_1 e_{t-1}^2(e_{t-1} < 0), \quad \text{where} \quad \epsilon_t \sim N(0,1)\\
\textbf{Model 3}: \quad \sigma_t^2 &= \omega + \alpha_1 e_{t-1}^2 + \beta_1\sigma_{t-1}^2, \quad \text{where} \quad \epsilon_t \sim \text{Student}(0,1,\nu)\\
\textbf{Model 4}: \quad \sigma_t^2 &= \omega + \alpha_1 e_{t-1}^2 + \beta_1\sigma_{t-1}^2 + \gamma_1 e_{t-1}^2(e_{t-1} < 0), \quad \text{where} \quad \epsilon_t \sim \text{Student}(0,1,\nu)
\end{align*}
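Before turning to the estimates, the variance recursions above can be sketched directly; setting $\gamma_1 = 0$ reduces the TGARCH recursion of Models 2 and 4 to the plain GARCH(1,1) of Models 1 and 3 (the sample values and the simple backcast below are illustrative):

```python
def tgarch_variances(returns, omega, alpha, beta, gamma=0.0):
    """Conditional variance path for sigma_t^2 = omega + alpha*e_{t-1}^2
    + beta*sigma_{t-1}^2 + gamma*e_{t-1}^2*(e_{t-1} < 0), using the
    sample variance of the residuals as a simple backcast."""
    rbar = sum(returns) / len(returns)        # mean equation: r_t = rbar + e_t
    e = [r - rbar for r in returns]
    sig2 = [sum(x * x for x in e) / len(e)]   # backcast for sigma_1^2
    for t in range(1, len(e)):
        v = omega + alpha * e[t - 1] ** 2 + beta * sig2[-1]
        if e[t - 1] < 0:                      # leverage (asymmetry) term
            v += gamma * e[t - 1] ** 2
        sig2.append(v)
    return sig2

path = tgarch_variances([0.0, 1.0, -1.0, 2.0], 0.1, 0.1, 0.8, gamma=0.1)
```
<br /><br />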
<!-- :::::::::: FIGURES 3a and 3b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 3a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/skewedugarch/images/model1.png"><img height="auto"
src="http://www.eviews.com/blog/skewedugarch/images/model1.png" title="Model 1 Results"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 3b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/skewedugarch/images/model2.png"><img height="auto"
src="http://www.eviews.com/blog/skewedugarch/images/model2.png" title="Model 2 Results"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3a: Model 1 Results</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 3b: Model 2 Results</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 3a and 3b :::::::::: -->
<!-- :::::::::: FIGURES 4a and 4b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 4a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/skewedugarch/images/model3.png"><img height="auto"
src="http://www.eviews.com/blog/skewedugarch/images/model3.png" title="Model 3 Results"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 4b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/skewedugarch/images/model4.png"><img height="auto"
src="http://www.eviews.com/blog/skewedugarch/images/model4.png" title="Model 4 Results"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4a: Model 3 Results</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 4b: Model 4 Results</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 4a and 4b :::::::::: -->
From a purely statistical point of view ($p$-values and information criteria, that is), fat tails and/or leverage effects better represent the Turkish Lira’s volatility dynamics. The distribution fit to standardized residuals and the analysis of news impact provide supporting evidence in that respect.<br /><br />
<!-- :::::::::: FIGURES 5a and 5b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 5a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/skewedugarch/images/leverage.png"><img height="auto"
src="http://www.eviews.com/blog/skewedugarch/images/leverage.png" title="Leverage"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 5b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/skewedugarch/images/nic.png"><img height="auto"
src="http://www.eviews.com/blog/skewedugarch/images/nic.png" title="News Impact Curve"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5a: Leverage</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 5b: News Impact Curve</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 5a and 5b :::::::::: -->
Extreme events seem to occur more often than the normal distribution suggests, and the volatility response to these shocks is more severe in the case of depreciation than in that of appreciation.<br /><br />
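The news impact curve in Figure 5b is simply next-period variance plotted as a function of the current shock, with lagged variance held fixed at its unconditional level. A quick sketch with hypothetical TGARCH(1,1) parameter values:

```python
import numpy as np

def news_impact(e, omega, alpha, beta, gamma, sigma2_bar):
    """TGARCH(1,1) news impact: next-period variance as a function of today's
    shock e, with lagged variance fixed at the unconditional level sigma2_bar."""
    e = np.asarray(e, dtype=float)
    return omega + alpha * e**2 + gamma * e**2 * (e < 0) + beta * sigma2_bar

shocks = np.linspace(-5, 5, 11)
nic = news_impact(shocks, omega=0.05, alpha=0.05, beta=0.90, gamma=0.04,
                  sigma2_bar=1.0)
# with gamma > 0 the curve is steeper on the negative side of the origin
```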
At this point, one may also wonder whether there is any long memory effect in the volatility of returns. To investigate, we first estimate an ARFIMA model for the squared return series and a simple FIGARCH model for the variance of the regular return series:
\begin{align*}
&\textbf{Fractional Mean Model}: \quad \rbrace{1 - L}^d(r_t^2 - \mu) = e_t, \quad \text{where} \quad e_t \sim N(0,\bar{\sigma})\\
&\textbf{Fractional Variance Model}: \quad \sigma_t^2 = \omega + \rbrace{1 - \beta_1 - \rbrace{1 - \alpha_1}\rbrace{1 - L}^d}e_{t-1}^2 + \beta_1\sigma_{t-1}^2, \quad \text{where} \quad \epsilon_t \sim \text{Student}(0,1,\nu)
\end{align*}
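For intuition, the fractional difference operator $\rbrace{1 - L}^d$ expands into a filter with hyperbolically decaying weights, which must be truncated in practice. A rough sketch of the expansion (illustrative only, not EViews' implementation):

```python
import numpy as np

def frac_diff_weights(d, n):
    """First n weights of the binomial expansion of (1 - L)^d:
    w_0 = 1, w_k = w_{k-1} * (k - 1 - d) / k."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w

def frac_diff(x, d, trunc=100):
    """Apply the truncated fractional difference filter (1 - L)^d to x."""
    x = np.asarray(x, dtype=float)
    w = frac_diff_weights(d, trunc)
    y = np.empty(len(x))
    for t in range(len(x)):
        m = min(t + 1, trunc)
        y[t] = w[:m] @ x[t::-1][:m]  # sum over k of w_k * x_{t-k}
    return y
```

Setting $d=1$ recovers the ordinary first difference; for $0 < d < 0.5$ the weights decay slowly, which is what produces long memory.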
<!-- :::::::::: FIGURES 6a and 6b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 6a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/skewedugarch/images/model5.png"><img height="auto"
src="http://www.eviews.com/blog/skewedugarch/images/model5.png" title="Fractional Mean Model"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 6b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/skewedugarch/images/model6.png"><img height="auto"
src="http://www.eviews.com/blog/skewedugarch/images/model6.png" title="Fractional Variance Model"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 6a: Fractional Mean Model</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 6b: Fractional Variance Model</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 6a and 6b :::::::::: -->
The fractional difference parameter is significantly different from 0 and 1 in both models, and it is also significantly smaller than 0.5 in the ARFIMA model, suggesting that the squared return series has long memory properties. However, by modelling the variance of the return series explicitly, we have successfully explained the behaviour of volatility and mitigated the impact of (and need for) long memory.<br /><br />
Since the estimation of the fractional difference parameter can be sensitive to the choice of truncation limits, it may not be worth the effort unless the statistical properties of results from FIGARCH models are significantly better than those of rival GARCH models. Here, our previous TGARCH(1,1) model with Student’s-t errors is still the frontrunner in that respect.<br /><br />
What if positive shocks (i.e. depreciation) happen less frequently but are more severe than the negative shocks (i.e. appreciation) implied by a symmetric distribution? To test this hypothesis, one needs to look for asymmetry towards larger positive extreme values. We can estimate our final model via the add-in assuming a skewed Student’s-t distribution and see if we can further improve the fit.<br /><br />
<!-- :::::::::: FIGURE 7 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/skewedugarch/images/skewedgarch.png"><img height="auto"
src="http://www.eviews.com/blog/skewedugarch/images/skewedgarch.png" title="Skewed GARCH Estimates"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 7: Skewed GARCH Estimates</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 7 :::::::::: -->
Estimated parameter values change only slightly vis-à-vis our original TGARCH model, but the asymmetry parameter is positive and significant, supporting the evidence of skewness. Information criteria favor this version of the model over all other specifications above.<br /><br />
One of the main uses of GARCH models in financial institutions is the estimation of Value-at-Risk (VaR), the potential loss on a trading position over a given horizon at a given confidence level. The symmetric error distributions commonly used for this purpose can lead to underestimation of right tail risk (i.e. in short trading positions). The chart below compares the daily VaR estimates from commonly used distributions and depicts the effects of fat tails and skewness for a long position in TL (or a short position in USDTRY).<br /><br />
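Mechanically, a one-step-ahead VaR threshold is just a tail quantile of the assumed standardized error distribution scaled by the conditional volatility forecast. A minimal sketch (illustrative numbers only; the standardized Student's-t quantile is hard-coded as an assumption rather than computed):

```python
from statistics import NormalDist

def var_threshold(sigma, mu=0.0, alpha=0.01, quantile=None):
    """One-step-ahead VaR for a long position: the alpha-quantile of the
    conditional return distribution, i.e. mu + sigma * q(alpha). Pass
    `quantile` to substitute a standardized fat-tailed or skewed quantile
    for the normal one."""
    q = NormalDist().inv_cdf(alpha) if quantile is None else quantile
    return mu + sigma * q

v_norm = var_threshold(0.02)                 # normal 1% quantile, about -2.326
v_fat = var_threshold(0.02, quantile=-2.61)  # assumed standardized t(5) 1% quantile
```

Because the fat-tailed quantile is further out in the tail, the same volatility forecast implies a larger potential loss, which is the gap visible between the VaR lines in Figure 8.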
<!-- :::::::::: FIGURE 8 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/skewedugarch/images/valueatrisk.png"><img height="auto"
src="http://www.eviews.com/blog/skewedugarch/images/valueatrisk.png" title="Value-at-Risk"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 8: Value-at-Risk</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 8 :::::::::: -->
At its peak around the summer of 2018, the currency shock pushed the 99% VaR threshold of a TL-denominated asset or portfolio to a daily loss of 14.5%. A year earlier this would have been considered an astronomical event, since the threshold was only around 1% back then. Increasing the likelihood of extreme events and incorporating the asymmetric tail behaviour of the shocks would add a further 5.1 and 3.5 percentage points, respectively, carrying this limit to 23.1%!<br /><br /><br /><br />
<hr />
<h3 class="seccol" id="sec4">Files</h3>
<ul>
<li><a href="http://www.eviews.com/blog/skewedugarch/workfiles/usdtry.wf1"><b class="wf">USDTRY.WF1</b></a></li>
<li><a href="http://www.eviews.com/blog/skewedugarch/workfiles/skewedugarch_example.prg"><b class="wf">SKEWEDUGARCH_EXAMPLE.PRG</b></a></li>
</ul><br /><br />
<hr />
<h3 class="seccol" id="sec5">References</h3>
<ol class="bib2xhtml">
<li id="lambert-laurent-2001">
Lambert, P. and Laurent, S. (2001), <i>"Modelling Financial Time Series Using GARCH-Type Models and a Skewed Student Density"</i>, Mimeo, Université de Liège.
</li>
</ol>
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com1tag:blogger.com,1999:blog-6883247404678549489.post-85782422361418923402021-01-20T09:28:00.003-08:002021-01-20T09:35:09.379-08:00Automatic Factor Selection: Working with FRED-MD Data<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 0px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.seccol {
}
.subseccol {
color: #fa5e5e
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: ['{\\left(}'],
rb: ['{\\right)}'],
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}'],
min: ['{\\operatorname\{min\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: verdana, sans-serif;">
This is the first of two posts devoted to automatic factor selection and panel unit root tests with cross-sectional dependence. Both features were recently released with EViews 12. Here, we summarize and work with two seminal contributions to automatic factor selection by Bai and Ng (2002) and Ahn and Horenstein (2013).
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction</a>
<li><a href="#sec2">Overview of Automatic Factor Selection</a>
<ul>
<li><a href="#sec2.1">Bai and Ng (2002)</a>
<li><a href="#sec2.2">Ahn and Horenstein (2013)</a>
</ul>
<li><a href="#sec3">Working with FRED-MD</a>
<ul>
<li><a href="#sec3.1">Factor Selection using Bai and Ng (2002)</a>
<li><a href="#sec3.2">Factor Selection using Ahn and Horenstein (2013)</a>
<li><a href="#sec3.3">Factor Model Estimation</a>
<li><a href="#sec3.4">Forecasting Industrial Production</a>
</ul>
<li><a href="#sec4">Files</a>
<li><a href="#sec5">References</a>
</ol><br />
<h3 class="seccol" id="sec1">Introduction</h3>
Recent trends in empirical economics (particularly in macroeconomics) indicate increased use of and demand for large dimensional datasets. Since the temporal dimension ($T$) is typically thought to be large anyway, the term <b>large dimensional</b> here refers to the number of variables ($N$), otherwise referred to as <b>factors</b> or <b>cross-sectional</b> units. This is in contrast with traditional paradigms where the number of variables is small but the temporal dimension is long. This paradigm shift is largely the result of theoretical advancements in <b>dimension-aware</b> techniques such as factor-augmented and panel models.<br /><br />
At the heart of all dimension-aware methods is <b>factor selection</b>, or the correct specification (estimation) of the number of factors. Traditionally, this parameter was often assumed. Recently, however, several contributions have offered data driven (semi-)autonomous factor selection methods, most notably those of Bai and Ng (2002) and Ahn and Horenstein (2013).<br /><br />
These automatic factor selection techniques have come to play important roles in factor augmented (vector auto)regressions, panel unit root tests with cross sectional dependence, and data manipulation. A particularly important example of the latter is <a href='https://research.stlouisfed.org/econ/mccracken/fred-databases/'><b>FRED-MD</b></a> - a regularly updated and freely distributed macroeconomic database designed for the empirical analysis of <i>big data</i>. What is notable here is that the dataset collects a vast number of important macroeconomic variables (factors), which are then optimally reduced in dimensionality using the Bai and Ng (2002) factor selection procedure.<br /><br />
In this post, we will demonstrate how to perform this dimensionality reduction using EViews' native Bai and Ng (2002) and Ahn and Horenstein (2013) factor selection procedures. The latter were introduced with the release of EViews 12. In particular, we will download the raw FRED-MD data, transform each series according to the FRED-MD instructions, and then proceed to perform dimensionality reduction. We will next estimate a traditional factor model with the optimally selected factors, and then proceed to forecast industrial production.<br /><br />
We pause briefly in the next section to provide a quick overview of the aforementioned factor selection procedures.
<br /><br /><br /><br />
<h3 class="seccol" id="sec2">Overview of Automatic Factor Selection</h3>
Recall that the maximum number of factors cannot exceed the number of observable variables. Factor selection is often used as a <b>dimension reduction</b> technique. In other words, the goal is always to optimally select the smallest number of the most representative or <b>principal</b> variables in a set. Since dimensional principality (or importance) is typically quantified in terms of <b>eigenvalues</b>, virtually all dimension reduction techniques in this literature go through <b>principal component analysis</b> (PCA). For detailed theoretical and empirical discussions of PCA, please refer to our blog entries: <a href='http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html'>Principal Component Analysis: Part I (Theory)</a> and <a href='http://blog.eviews.com/2018/11/principal-component-analysis-part-ii.html'>Principal Component Analysis: Part II (Practice)</a>.<br /><br />
Although PCA can identify which dimensions are most principal in a set, it is not designed to offer guidance on how many dimensions to retain. As a result, traditionally, this parameter was often assumed rather than driven by the data. To address this inadequacy, Bai and Ng (2002) proposed to cast the problem of factor selection as a model selection problem whereas Ahn and Horenstein (2013) achieve automatic factor selection by maximizing over ratios of two adjacent eigenvalues. In either case, optimal factor selection is data driven.<br /><br />
<h4 class="subseccol" id="sec2.1">Bai and Ng (2002)</h4>
Bai and Ng (2002) handle the problem of optimal factor selection as the more familiar model selection problem. In particular, criteria are judged as a tradeoff between goodness of fit and parsimony. To formalize matters, consider the traditional factor augmented model:
$$ Y_{i,t} = \mathbf{\lambda}_{i}^{\top} \mathbf{F}_{t} + e_{i,t} $$
where $ \mathbf{F}_{t} $ is a vector of $ r $ <b>common factors</b>, $ \mathbf{\lambda}_{i} $ denotes a vector of <b>factor loadings</b>, and $ e_{i,t} $ is the <b>idiosyncratic component</b>, which is cross-sectionally independent provided $ \mathbf{F}_{t} $ accounts for all inter-cross-sectional correlations. When the $ e_{i,t} $ are not cross-sectionally independent, the factor model is said to be <i>approximate</i>.<br /><br />
The objective here is to identify the optimal number of factors. In particular, $ \mathbf{\lambda}_{i}$ and $ \mathbf{F}_{t} $ are estimated through the optimization problem:
\begin{align}
\min_{\mathbf{\Lambda}, \mathbf{F}}\frac{1}{NT} \xsum{i}{1}{N}{\xsum{t}{1}{T}{\rbrace{ Y_{i,t} - \mathbf{\lambda}_{i}^{\top}\mathbf{F}_{t} }^{2}}} \label{eq1}
\end{align}
subject to the normalization $ \frac{1}{T}\mathbf{F}^{\top}\mathbf{F} = \mathbf{I} $ where $ \mathbf{I} $ is the identity matrix.<br /><br />
Traditionally, the estimated factors $\widehat{\mathbf{F}}_{t}$ are proportional to the $T \times \min(N,T)$ matrix of eigenvectors associated with all eigenvalues of the $T\times T$ matrix $\mathbf{Y}\mathbf{Y}^{\top}$. This generates the full set of $ \min(N,T) $ factors. The objective then is to choose $ r < \min(N,T) $ factors that best capture the variation in $ \mathbf{Y} $.<br /><br />
Since the minimization problem in \eqref{eq1} is linear, once the factor matrix is estimated (observed), estimation of the factor loadings reduces to an ordinary least squares problem for a given set of regressors (factors). In particular, let $ \mathbf{F}^{r} $ denote the factors associated with the $ r $ largest eigenvalues of $ \mathbf{Y}\mathbf{Y}^{\top} $, and let $ \mathbf{\lambda}_{i}^{r} $ denote the associated factor loadings. Then, the problem of estimating $ \mathbf{\lambda}_{i}^{r} $ is cast as:
$$ V \rbrace{ r, \widehat{\mathbf{F}}^{r} } = \min_{\mathbf{\Lambda}}\frac{1}{NT} \xsum{i}{1}{N}{\xsum{t}{1}{T}{\rbrace{ Y_{i,t} - \mathbf{\lambda}_{i}^{r^{\top}}\widehat{\mathbf{F}}_{t}^{r} }^{2}}} $$
Since a model with $ r+1 $ factors can fit no worse than a model with $ r $ factors, although efficiency is a decreasing function of the number of regressors, the problem of optimally selecting $ r $ becomes a classical problem of model selection. Furthermore, observe that $ V \rbrace{ r, \mathbf{F}^{r} } $ is the sum of squared residuals from a regression of $ \mathbf{Y_{i}} $ on the $ r $ factors, for all $ i $. Thus, to determine $ r $ optimally, one can use a loss function $ L_{r} $ of the form
$$ V \rbrace{ r, \widehat{\mathbf{F}}^{r} } + rg(N,T) $$
where $ g(N,T) $ is a penalty for overfitting. Bai and Ng (2002) propose 6 such loss functions that yield consistent estimates, labeled PCP1 through PCP3 and ICP1 through ICP3.
The optimal number of factors is then the value of $ r \leq r_{\text{max}} < \min(N,T) $ that minimizes the loss function, where $r_{\text{max}}$ is some known maximum number of factors under consideration. In other words:
$$ r^{\star} \equiv \operatorname*{argmin}_{1 \leq r \leq r_{\text{max}}} \sbrace{ V \rbrace{ r, \widehat{\mathbf{F}}^{r} } + r g(N,T) } $$
Note that since $r_{\text{max}}$ must be specified <i>a priori</i>, its choice will play a role in optimization.
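To make the procedure concrete, the following sketch applies an ICP1-style criterion with penalty $g(N,T) = \frac{N+T}{NT}\log\rbrace{\frac{NT}{N+T}}$ to a simulated panel with 3 true factors. This is an illustration of the selection logic, not EViews' implementation.

```python
import numpy as np

def bai_ng_ic1(Y, r_max):
    """Pick the number of factors by minimizing log V(r) + r * g(N, T),
    where V(r) is the mean squared residual after projecting Y on its
    first r principal components (computed via SVD)."""
    T, N = Y.shape
    s = np.linalg.svd(Y, compute_uv=False)
    total = (s ** 2).sum()
    explained = np.cumsum(s ** 2)          # sum of squares captured by first r PCs
    g = (N + T) / (N * T) * np.log(N * T / (N + T))
    ic = [np.log((total - explained[r - 1]) / (N * T)) + r * g
          for r in range(1, r_max + 1)]
    return int(np.argmin(ic)) + 1

# simulated T=200, N=50 panel with 3 true factors and small idiosyncratic noise
rng = np.random.default_rng(0)
Y = (rng.standard_normal((200, 3)) @ rng.standard_normal((3, 50))
     + 0.1 * rng.standard_normal((200, 50)))
```

On this clean simulated panel the criterion recovers the true factor count, since $V(r)$ drops sharply up to the third factor and only marginally thereafter.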
<br /><br />
<h4 class="subseccol" id="sec2.2">Ahn and Horenstein (2013)</h4>
In contrast to Bai and Ng (2002), Ahn and Horenstein (2013) exploit the fact that the $ r $ largest eigenvalues of the relevant covariance matrix grow unboundedly as the sample size increases, whereas the remaining eigenvalues stay bounded. The optimization strategy is then simply to maximize the ratio of two adjacent eigenvalues. One of the advantages of this contribution is that it is far less sensitive to the choice of $ r_{\text{max}} $ than Bai and Ng (2002). Furthermore, the procedure is significantly easier to compute, requiring only eigenvalues.<br /><br />
To further the discussion, let $ \psi_{r} $ denote the $ r^{\text{th}} $ largest eigenvalue of some positive semi-definite matrix $ \mathbf{Q} \equiv \mathbf{Y}\mathbf{Y}^{\top} $ or $ \mathbf{Q} \equiv \mathbf{Y}^{\top}\mathbf{Y} $. Furthermore, define:
$$ \tilde{\mu}_{NT,\, r} \equiv \frac{1}{NT}\psi_{r} $$
Ahn and Horenstein (2013) propose the following two estimators. For some $ 1 \leq r_{max} < \min(N,T) $, the optimal number of factors, $ r^{\star} $, is derived as:
<ul>
<li><b>Eigenvalue Ratio</b> (ER)
$$ r^{\star} \equiv \displaystyle \operatorname*{argmax}_{r \leq r_{max}} ER(r), \quad ER(r) \equiv \frac{\tilde{\mu}_{NT,\, r}}{\tilde{\mu}_{NT,\, r + 1}} $$
</li>
<li><b>Growth Ratio</b> (GR)
$$ r^{\star} \equiv \displaystyle \operatorname*{argmax}_{r \leq r_{max}} GR(r), \quad GR(r) \equiv \frac{\log \rbrace{ 1 + \widehat{\mu}_{NT,\, r} }}{\log \rbrace{ 1 + \widehat{\mu}_{NT,\, r + 1} }} $$
where
$$ \widehat{\mu}_{NT,\, r} \equiv \frac{\tilde{\mu}_{NT,\, r}}{\displaystyle \xsum{k}{r+1}{\min(N,T)}{\tilde{\mu}_{NT,\, k}}} $$
</li>
</ul>
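A rough sketch of both estimators on a simulated panel with 2 true factors (an illustration, not EViews' implementation):

```python
import numpy as np

def ahn_horenstein(Y, r_max):
    """Eigenvalue Ratio (ER) and Growth Ratio (GR) estimators of the number
    of factors, using the eigenvalues of Y'Y / (N*T) in descending order."""
    T, N = Y.shape
    eig = np.linalg.eigvalsh(Y.T @ Y / (N * T))[::-1]   # mu_1 >= mu_2 >= ...
    er = eig[:r_max] / eig[1:r_max + 1]                 # ER(r), r = 1..r_max
    # mu-hat_r: eigenvalue r divided by the sum of all eigenvalues beyond r
    mu_hat = np.array([eig[r - 1] / eig[r:].sum() for r in range(1, r_max + 2)])
    gr = np.log1p(mu_hat[:-1]) / np.log1p(mu_hat[1:])   # GR(r), r = 1..r_max
    return int(np.argmax(er)) + 1, int(np.argmax(gr)) + 1

# simulated T=200, N=50 panel with 2 true factors and small idiosyncratic noise
rng = np.random.default_rng(1)
Y = (rng.standard_normal((200, 2)) @ rng.standard_normal((2, 50))
     + 0.1 * rng.standard_normal((200, 50)))
r_er, r_gr = ahn_horenstein(Y, r_max=8)
```

On clean simulated data both ratios peak sharply at the true factor count; on real data the two estimators (and Bai and Ng) need not agree.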
Finally, we note that Ahn and Horenstein (2013) suggest demeaning the data in both the time dimension and the cross-section dimension. While not strictly necessary for consistency, this step is extremely useful in small samples.<br /><br /><br /><br />
<h3 class="seccol" id="sec3">Working with FRED-MD Data</h3>
The FRED-MD data is a large dimensional dataset updated in real time and publicly distributed by the Federal Reserve Bank of St. Louis. In its raw form, it consists of 128 time series in either quarterly or monthly frequency. Here, we will work with the monthly frequency, which can be downloaded in its raw form from <a href='https://s3.amazonaws.com/files.fred.stlouisfed.org/fred-md/monthly/current.csv'><b>current.csv</b></a>. Furthermore, associated with the raw dataset is a set of instructions on how to process each variable for empirical work. These can be obtained from <a href='https://s3.amazonaws.com/files.fred.stlouisfed.org/fred-md/Appendix_Tables_Update.pdf'><b>Appendix_Tables_Update.pdf</b></a>.<br /><br />
As a first step, we will write a brief EViews program to download the raw dataset and process each variable according to the aforementioned instructions. The program is listed below:
<pre><code>
<span style="color: green;">'documentation on the data:</span>
<span style="color: green;">'https://s3.amazonaws.com/files.fred.stlouisfed.org/fred-md/Appendix_Tables_Update.pdf</span>
close @wf
<span style="color: green;">'get the latest data (monthly only):</span>
wfopen https://s3.amazonaws.com/files.fred.stlouisfed.org/fred-md/monthly/current.csv colhead=2 namepos=firstatt
pagecontract if sasdate<>na
pagestruct @date(sasdate)
<span style="color: green;">'perform transformations</span>
%serlist = @wlookup("*", "series")
<span style="color: blue;">for</span> %j {%serlist}
%tform = {%j}.@attr("Transform:")
<span style="color: blue;">if</span> @len(%tform) <span style="color: blue;">then</span>
<span style="color: blue;">if</span> %tform="1" <span style="color: blue;">then</span>
series temp = {%j} 'no transform
<span style="color: blue;">endif</span>
<span style="color: blue;">if</span> %tform="2" <span style="color: blue;">then</span>
series temp = d({%j}) 'first difference
<span style="color: blue;">endif</span>
<span style="color: blue;">if</span> %tform="3" <span style="color: blue;">then</span>
series temp = d({%j},2) 'second difference
<span style="color: blue;">endif</span>
<span style="color: blue;">if</span> %tform="4" <span style="color: blue;">then</span>
series temp = log({%j}) 'log
<span style="color: blue;">endif</span>
<span style="color: blue;">if</span> %tform="5" <span style="color: blue;">then</span>
series temp = dlog({%j}) 'log difference
<span style="color: blue;">endif</span>
<span style="color: blue;">if</span> %tform="6" <span style="color: blue;">then</span>
series temp = dlog({%j},2) 'log second difference
<span style="color: blue;">endif</span>
<span style="color: blue;">if</span> %tform="7" <span style="color: blue;">then</span>
series temp = d({%j}/{%j}(-1) - 1) 'first difference of percentage change
<span style="color: blue;">endif</span>
{%j} = temp
{%j}.clearhistory
d temp
<span style="color: blue;">endif</span>
<span style="color: blue;">next</span>
<span style="color: green;">'collect the series into a group and drop non-data series</span>
group grp *
grp.drop resid
grp.drop sasdate
smpl 1960:03 @last
</code></pre>
This program processes and collects the variables in a group which we've labeled here <b class="wfobj">GRP</b>. Additionally, we've dropped the variable <b class="wfobj">SASDATE</b> from this group since it is a date variable. In other words, <b class="wfobj">GRP</b> is a collection of 127 variables. Furthermore, as suggested by the FRED-MD paper, the sample under consideration should start from March 1960, and so the final line of the code above sets that sample.<br /><br />
A brief glance at the variables indicates that certain variables have missing values. Unfortunately, neither the Bai and Ng (2002) nor the Ahn and Horenstein (2013) procedure handles missing values particularly well. Accordingly, as suggested in the original FRED-MD paper, missing values are initially set to the mean of the non-missing observations for any given series. This is easily achieved with a quick program as follows:
<pre><code>
<span style="color: green;">'impute missing values with mean of non-missing observations</span>
<span style="color: blue;">for</span> !k=1 to grp.count
<span style="color: green;">'compute mean of non-missing observations</span>
series tmp = grp(!k)
!mu = @mean(tmp)
<span style="color: green;">'set missing observations to mean</span>
grp(!k) = @nan(grp(!k), !mu)
<span style="color: green;">'clean up before next series</span>
smpl 1960:03 @last
d tmp
<span style="color: blue;">next</span>
</code></pre>
The original FRED-MD paper next suggests a second stage of updating the missing observations. Nevertheless, for the sake of simplicity, we will skip this step and proceed to estimating the optimal number of factors.<br /><br />
Although we will later estimate a factor model which will handle factor selection within its scope, here we demonstrate automatic factor selection as a standalone exercise. To do so, we will proceed through the principal component dialog. In particular, we open the group <b class="wfobj">GRP</b>, and then proceed to click on <b>View/Principal Components...</b>.<br /><br />
Notice that the principal components dialog here has changed from previous versions. This is to allow for the additional selection procedures introduced in EViews 12. Because of these changes, we briefly pause to explain the options available to users. In particular, the method dropdown offers several factor selection procedures. The first two, <b>Bai and Ng</b> and <b>Ahn and Horenstein</b>, are automatic selection procedures. The remaining two, <b>Simple</b> and <b>User</b>, are legacy principal component methods that were available in EViews versions prior to 12.<br /><br />
Next, associated with each method is a criterion to use in selection. In the case of Bai and Ng, this offers seven possibilities: one for each of the 6 criteria, plus the default <b>Average of criteria</b>, which provides a summary of each of the 6 criteria as well as their average.<br /><br />
Also, associated with each method is a dropdown which determines how the maximum number of factors is determined. Here EViews offers 5 possibilities, the specifics of which can be obtained from the <a href='http://www.eviews.com/help/helpintro.html#page/content/groups-Principal_Components.html'><b>EViews manual</b></a>. Recall that both the Bai and Ng (2002) and the Ahn and Horenstein (2013) methods require the specification of this parameter. Although EViews offers several automatic selection mechanisms, in keeping with the suggestions in the FRED-MD paper, the exercises below use a user-defined value of 8.<br /><br />
Finally, EViews offers the option of demeaning and standardizing the dataset across both time and factor dimension. In fact, since the FRED-MD paper suggests that data should be demeaned and standardized, exercises below will proceed by demeaning and standardizing each of the variables. We next demonstrate how to obtain the Bai and Ng (2002) estimate of the optimal number of factors.<br /><br />
<h4 class="subseccol" id="sec3.1">Factor Selection using Bai and Ng (2002)</h4>
From the open principal component dialog, we proceed as follows:<br /><br />
<ol>
<li>Change the <b>Method</b> dropdown to <b>Bai and Ng</b>.</li>
<li>Set the <b>User maximum factors</b> to <b>8</b>.</li>
<li>Check the <b>Time-demean</b> box.</li>
<li>Check the <b>Time-standardize</b> box.</li>
<li>Click on <b>OK</b>.</li>
</ol><br />
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/pca_dialog.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/pca_dialog.png" title="Principal Components Dialog"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: Principal Components Dialog</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
Hitting <b>OK</b>, EViews produces a spool output. The first part of this output is a summary of the principal component analysis.
<!-- :::::::::: FIGURES 2a and 2b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 2a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/pca_bn1.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/pca_bn1.png" title="Bai and Ng Summary: PCA Results"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 2b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/pca_bn2.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/pca_bn2.png" title="Bai and Ng Summary: Factor Selection Results"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2a: Bai and Ng Summary: PCA Results</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 2b: Bai and Ng Summary: Factor Selection Results</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 2a and 2b :::::::::: -->
The second part of the output, <b>Component Selection Results</b>, displays the summary of the Bai and Ng factor selection procedure. In particular, we see that each of the 6 selection criteria selected 8 factors. Naturally, the average number of selected factors is also 8. This result corresponds to the findings in the original FRED-MD paper, although the latter insists on using the PCP2 criterion. Accordingly, we can repeat the exercise above and show the specifics of the PCP2 selection. To do so, from the open group window, we again click on <b>View/Principal Components...</b>, and proceed as follows:
<ol>
<li>Change the <b>Method</b> dropdown to <b>Bai and Ng</b>.</li>
<li>Change the <b>Criterion</b> dropdown to <b>PCP2</b>.</li>
<li>Set the <b>User maximum factors</b> to <b>8</b>.</li>
<li>Check the <b>Time-demean</b> box.</li>
<li>Check the <b>Time-standardize</b> box.</li>
<li>Click on <b>OK</b>.</li>
</ol><br />
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/pca_bn3.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/pca_bn3.png" title="Bai and Ng PCP2: Factor Selection Results"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: Bai and Ng PCP2: Factor Selection Results</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
The output above is a detailed look at the selection procedure. In particular, for each number of factors from 1 to 8, EViews displays the PCP2 statistic. Clearly, the minimum is achieved with 8 factors where the statistic equals 0.904325. Again, the number of factors selected matches that obtained in the FRED-MD paper.<br /><br />
<h4 class="subseccol" id="sec3.2">Factor Selection using Ahn and Horenstein (2013)</h4>
Similar steps can be undertaken to obtain the Ahn and Horenstein (2013) factor selection results. From the open principal component dialog, we proceed as follows:<br /><br />
<ol>
<li>Change the <b>Method</b> dropdown to <b>Ahn and Horenstein</b>.</li>
<li>Set the <b>User maximum factors</b> to <b>8</b>.</li>
<li>Check the <b>Time-demean</b> box.</li>
<li>Check the <b>Time-standardize</b> box.</li>
<li>Check the <b>Cross-demean</b> box.</li>
<li>Check the <b>Cross-standardize</b> box.</li>
<li>Click on <b>OK</b>.</li>
</ol><br />
<!-- :::::::::: FIGURES 4a and 4b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 4a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/pca_ah1.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/pca_ah1.png" title="Ahn and Horenstein Summary: PCA Results"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 4b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/pca_ah2.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/pca_ah2.png" title="Ahn and Horenstein: Factor Selection Results"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4a: Ahn and Horenstein: PCA Results</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 4b: Ahn and Horenstein: Factor Selection Results</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 4a and 4b :::::::::: -->
The results of the Ahn and Horenstein (2013) procedure are markedly different. Unlike the preceding Bai and Ng exercises, here we have chosen to demean and standardize the cross-sectional (factor) dimension in addition to the time dimension. This follows Ahn and Horenstein (2013), who suggest that demeaning the cross-sectional dimension achieves superior results. In particular, the optimal number of factors selected is 1 using both the Eigenvalue Ratio and the Growth Ratio statistics. Clearly, this is very different from the 8 factors selected in the previous exercises.<br /><br />
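The two Ahn-Horenstein statistics are simple functions of the ordered eigenvalues. As a rough Python/NumPy sketch (again, not EViews code; the function name and simulation are ours):

```python
import numpy as np

def ahn_horenstein(X, kmax):
    """Ahn-Horenstein (2013) Eigenvalue Ratio (ER) and Growth Ratio (GR) selectors.

    X is a T x N data matrix, assumed already demeaned/standardized.
    Returns (k_er, k_gr), the number of factors maximizing each statistic.
    """
    T, N = X.shape
    mu = np.linalg.svd(X, compute_uv=False) ** 2 / (N * T)  # eigenvalues, descending
    V = lambda k: np.sum(mu[k:])                            # eigenvalue mass beyond k
    er = [mu[k - 1] / mu[k] for k in range(1, kmax + 1)]
    gr = [np.log(V(k - 1) / V(k)) / np.log(V(k) / V(k + 1)) for k in range(1, kmax + 1)]
    return int(np.argmax(er)) + 1, int(np.argmax(gr)) + 1
```

Both statistics look for the largest "gap" in the eigenvalue sequence, which is why they can disagree sharply with the Bai-Ng criteria on real data such as FRED-MD.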
<h4 class="subseccol" id="sec3.2">Factor Model Estimation</h4>
Typically, the objective of a factor selection procedure is not to find the number of factors in isolation. Rather, it is a precursor to some form of estimation, such as a factor model or second-generation panel unit root tests. Here, we estimate a factor model using the full FRED-MD dataset and specify that the number of factors should be selected with the Bai and Ng (2002) procedure.<br /><br />
We start by creating a factor object. This is easily done by issuing the following command:
<pre><code>
factor fact
</code></pre>
This will create a factor object in the workfile called <b class="wfobj">FACT</b>. We double click it to open it and then proceed to click on the <b>Estimate</b> button to bring up the estimation dialog.
<!-- :::::::::: FIGURES 5a and 5b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 5a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/fact_dialog1.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/fact_dialog1.png" title="Factor Dialog: Data Tab"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 5b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/fact_dialog2.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/fact_dialog2.png" title="Factor Dialog: Estimation Tab"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5a: Factor Dialog: Data Tab</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 5b: Factor Dialog: Estimation Tab</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 5a and 5b :::::::::: -->
The rest of the steps proceed as follows:
<ol>
<li>Under the <b>Data</b> tab, enter <b class="wfobj">GRP</b>.</li>
<li>Click on the <b>Estimation</b> tab.</li>
<li>From the <b>Number of factors</b> group, set the <b>Method</b> dropdown to <b>Bai and Ng</b>.</li>
<li>From the <b>Max. Factors</b> dropdown select <b>User</b>.</li>
<li>In the <b>User maximum factors</b> textbox write <b>8</b>.</li>
<li>Check the <b>Time-demean</b> box.</li>
<li>Check the <b>Time-standardize</b> box.</li>
<li>Click on <b>OK</b>.</li>
</ol><br />
This tells EViews to estimate a factor model of at most 8 factors, with the number of factors chosen from the full FRED-MD set of variables using the Bai and Ng (2002) procedure. The output is reproduced below:<br /><br />
<!-- :::::::::: FIGURES 6a and 6b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 6a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/fact_est1.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/fact_est1.png" title="Factor Estimation: Part 1"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 6b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/fact_est2.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/fact_est2.png" title="Factor Estimation: Part 2"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 6a: Factor Estimation: Part 1</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 6b: Factor Estimation: Part 2</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 6a and 6b :::::::::: -->
<h4 class="subseccol" id="sec3.3">Forecasting Industrial Production</h4>
Having estimated a factor model, we now repeat the exercise of forecasting industrial production. The exercise is considered in the original FRED-MD paper where the forecast dynamics are summarized as follows:
$$ y_{t+h} = \alpha_h + \beta_h(L)\hat{f}_t + \gamma_h(L)y_t $$
In other words, this is an $h$-step-ahead AR forecast with a constant and estimated factors as exogenous variables. In particular, to maintain comparability with the original exercise, we consider an 11-month-ahead forecast where $\hat{f}_t$ is obtained from the previously estimated factor model. That is, we'll forecast over the period of available data in 2020. This exercise is repeated using the first estimated factor, the sum of the first two estimated factors, and no estimated factors, respectively.<br /><br />
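To fix ideas, here is a stripped-down Python/NumPy sketch of a direct $h$-step forecast with a single lag of the factor and of $y_t$. The actual exercise below uses EViews' automatic ARIMA engine, which also selects the lag polynomials; the function here is only illustrative.

```python
import numpy as np

def direct_forecast(y, f, h):
    """Direct h-step-ahead forecast: regress y(t+h) on a constant, f(t) and y(t)
    over the estimation sample, then apply the fitted coefficients at the sample end."""
    X = np.column_stack([np.ones(len(y) - h), f[:-h], y[:-h]])
    beta, *_ = np.linalg.lstsq(X, y[h:], rcond=None)
    return np.array([1.0, f[-1], y[-1]]) @ beta
```

The "direct" part is that a separate regression is run for each horizon $h$, rather than iterating a one-step model forward.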
As a first step in this exercise, we must extract the estimated factors. Although the factors are unobserved, they may be computed from the estimated factor model as scores. In particular, proceed as follows:
<ol>
<li>From the open factor model, click on <b>Proc</b> and then <b>Make Scores...</b>.</li>
<li>Under the <b>Output specification</b> enter <b>1 2</b>.</li>
<li>Click on <b>OK</b>.</li>
</ol><br />
This will produce two series in the workfile: <b class="wfobj">F1</b> and <b class="wfobj">F2</b>.<br /><br />
Next, let's forecast industrial production by leveraging the EViews native autoregressive forecast engine. To do so, double click on the series <b class="wfobj">INDPRO</b> to open it. Next, click on <b>Proc/Automatic ARIMA Forecasting...</b> to open the dialog. We now proceed with the following steps:
<ol>
<li>In the <b>Estimation sample</b> textbox, enter <b>1960M03 2019M12</b>.</li>
<li>Under <b>Forecast length</b> enter <b>11</b>.</li>
<li>Under the <b>Regressors</b> textbox, enter <b>C F1</b>.</li>
<li>Click on the <b>Options</b> tab.</li>
<li>Under the <b>Output forecast name</b>, enter <b>INDPRO_F1</b>.</li>
<li>Ensure the <b>Forecast comparison graph</b> is checked.</li>
<li>Click on <b>OK</b>.</li>
</ol><br />
<!-- :::::::::: FIGURES 8a and 8b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 8a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/forecast_dialog1.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/forecast_dialog1.png" title="Forecast Dialog: Specification"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 8b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/forecast_dialog2.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/forecast_dialog2.png" title="Forecast Dialog: Options"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 8a: Forecast Dialog: Specification</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 8b: Forecast Dialog: Options</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 8a and 8b :::::::::: -->
The options above specify that we wish to forecast the last 11 months of available data. Since our available sample runs from March 1960 to November 2020, we estimate over March 1960 through December 2019 and forecast out to November 2020.<br /><br />
<!-- :::::::::: FIGURES 9a and 9b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 9a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/forecast_11m1.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/forecast_11m1.png" title="Forecast: Actuals vs Forecast"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 9b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/forecast_11m2.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/forecast_11m2.png" title="Forecast: Forecast Comparison Graph"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 9a: Forecast: Actuals vs Forecast</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 9b: Forecast: Forecast Comparison Graph</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 9a and 9b :::::::::: -->
For comparison, the same type of forecast is produced using <b>C (F1 + F2)</b> as exogenous variables, and <b>C</b> as the only exogenous variable. All three forecasts are superimposed on top of the original curve for comparison. This is reproduced below.
<!-- :::::::::: FIGURE 10 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/autofactsel/images/forecast_11m3.png"><img height="auto"
src="http://www.eviews.com/blog/autofactsel/images/forecast_11m3.png" title="Forecast Comparison"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 10: Forecast Comparison</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 10 :::::::::: -->
<hr />
<h3 class="seccol" id="sec4">Files</h3>
<ul>
<li><a href="http://www.eviews.com/blog/autofactsel/workfiles/fred-md.wf1"><b class="wf">FRED-MD.WF1</b></a></li>
<li><a href="http://www.eviews.com/blog/autofactsel/workfiles/fred-md.prg"><b class="wf">FRED-MD.PRG</b></a></li>
</ul><br /><br />
<hr />
<h3 class="seccol" id="sec5">References</h3>
<ol class="bib2xhtml">
<li id="bai-ng-2002">
Bai J and Ng S (2002), <i>"Determining the Number of Factors in Approximate Factor Models"</i>, Econometrica, Vol. 70, pp. 191-221. Wiley Online Library.
</li>
<li id="ahn-horenstein-2013">
Ahn SC and Horenstein AR (2013), <i>"Eigenvalue Ratio Test for the Number of Factors"</i>, Econometrica, Vol. 81, pp. 1203-1227. Wiley Online Library.
</li>
<li id="mcracken-ng-2013">
McCracken MW and Ng S (2016), <i>"FRED-MD: A Monthly Database for Macroeconomic Research"</i>, Journal of Business & Economic Statistics, Vol. 34, pp. 574-589. Taylor & Francis.
</li>
</ol>
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com2tag:blogger.com,1999:blog-6883247404678549489.post-47321381959468296912020-12-21T09:09:00.000-08:002020-12-21T09:09:28.715-08:00Using Indicator Saturation to Detect Outliers and Structural Shifts<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
/* border: 1px solid black; */
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.seccol {
}
.subseccol {
color: #fa5e5e
}
.wf {
}
.wfobj {
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: "verdana" sans-serif">
One of the potential pitfalls when working with time series datasets is that the data may have temporary or permanent changes to its levels. These changes could be single time-period outliers, or a fundamental structural shift.<br /><br />
EViews 12 introduces a new technique to detect and model these outliers and structural changes through <a href='http://eviews.com/help/helpintro.html#page/content%2FRegress2-Indicator_Saturation.html%23'>indicator saturation</a>, and in this post we give a demonstration of the feature.
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Indicator Saturation</a>
<li><a href="#sec2">AutoSearch/GETS</a>
<li><a href="#sec3">An Application with Consumption and Income</a>
</ol><br />
<h3 class="seccol" id="sec1">Indicator Saturation</h3>
Identifying changes in data is essential if we are to properly estimate models based upon these data. One way to detect changes is to include in your regression a dummy or indicator variable for each observation where a change might occur, and then decide whether that indicator is a valid regressor. Such variables could include:
<ul>
<li><b>Impulse Indicators</b> (IIS): a dummy variable equal to zero everywhere other than a single value of one at period $ t $. This indicator can be used to model single observation outliers, and is equivalent to the <b>@isperiod</b> EViews function used at the date corresponding to $ t $.</li>
<li><b>Step Indicators</b> (SIS): a step function variable equal to zero until $ t $ and one thereafter. This indicator can be used to model a shift in the intercept of an equation, and is equivalent to the <b>@after</b> EViews function used at the date corresponding to $ t $.</li>
<li><b>Trend Indicators</b> (TIS): a trend-break variable that is equal to zero until period $ t $ and then follows a trend afterward. This indicator can be used to model a change in the trend of an equation (or the introduction of a trend term if one didn’t previously exist), and is equivalent to the <b>@trendbr</b> function used at the date corresponding to $ t $.</li>
</ul><br />
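The three indicator types are simple to construct. A quick NumPy sketch (mirroring <b>@isperiod</b>, <b>@after</b> and <b>@trendbr</b>; note the base value we use for the trend break is our choice and may differ from EViews' exact convention):

```python
import numpy as np

def indicators(T, t):
    """Build the three saturation indicator types for a break at observation t
    (0-based). The trend break's starting value is our convention."""
    idx = np.arange(T)
    iis = (idx == t).astype(float)                   # impulse: 1 only at t
    sis = (idx >= t).astype(float)                   # step: 1 from t onwards
    tis = np.maximum(idx - t + 1, 0).astype(float)   # trend: 1, 2, 3, ... from t
    return iis, sis, tis
```

Saturation means generating one such variable for every candidate break date and letting variable selection keep only the valid ones.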
The problem with the approach of including these variables in a traditional regression setting is that unless you know the specific dates where changes occur, you can quickly run into a situation where you have more variables than observations (since you’ll be adding at least one indicator variable for each observation in your estimation sample!).<br /><br />
Fortunately, recent advancements in variable selection techniques have meant that we can now perform variable selection on models with many more variables than observations, and so can saturate our regression with complex combinations of indicator variables and let the variable selection technique choose which are the most appropriate indicators to use.<br /><br /><br />
<h3 class="seccol" id="sec2">AutoSearch/GETS</h3>
One of the new technologies introduced in EViews 12 is the <a href='http://eviews.com/help/helpintro.html#page/content%2FVarsel-Background.html%23ww277256'><b>AutoSearch/GETS</b></a> algorithm for variable selection.<br /><br />
AutoSearch/GETS is a method of variable selection that follows the steps suggested by AutoSEARCH algorithm of <a href='http://www.sucarrat.net/research/autofim.pdf'>Escribano and Sucarrat (2011)</a>, which in turn builds upon the work in <a href='http://www.sucarrat.net/research/autofim.pdf'>Hoover and Perez (1999)</a>, and is similar to the technology behind the <b>Autometrics™</b> module in <a href='https://www.doornik.com/products.html#PcGive'><b>PcGive™</b></a>.<br /><br />
Mechanically the algorithm is similar to a <a href='http://eviews.com/help/helpintro.html#page/content%2FVarsel-Background.html%23ww277180'>backwards uni-directional stepwise</a> method:
<ol>
<li>The model with all search variables (termed the general unrestricted model, GUM) is estimated, and checked with a set of diagnostic tests.</li>
<li>A number of search paths are defined, one for each insignificant search variable in the GUM.</li>
<li>For each path, the insignificant variable defined in 2) is removed and then a series of further variable removal steps is taken, each time removing the most insignificant variable, and each time checking whether the current model passes the set of diagnostic tests. If the diagnostic tests fail after the removal of a variable, that variable is placed back into the model and prevented from being removed again along this path. Variable removal finishes once there are no more insignificant variables, or it is impossible to remove a variable without failing the diagnostic tests.</li>
<li>Once all paths have been calculated the final models produced by the paths are compared using an information criteria selection. The best model is then selected.</li>
</ol><br />
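A heavily simplified, single-path sketch of the removal loop might look as follows in Python (the real algorithm explores many paths, applies diagnostic tests at each removal, and compares terminal models by information criterion; the function name and threshold are ours):

```python
import numpy as np

def backward_gets(y, X, names, t_crit=1.96):
    """Single-path backward elimination: repeatedly drop the least-significant
    regressor until all remaining |t|-statistics exceed t_crit."""
    keep = list(range(X.shape[1]))
    while keep:
        Z = X[:, keep]
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        sigma2 = resid @ resid / (len(y) - len(keep))          # residual variance
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z))) # OLS standard errors
        tstats = np.abs(beta / se)
        worst = int(np.argmin(tstats))
        if tstats[worst] >= t_crit:
            break                                              # all significant: stop
        keep.pop(worst)                                        # drop the weakest regressor
    return [names[i] for i in keep]
```

The diagnostic-test safeguard in step 3 above is what distinguishes GETS from plain backward stepwise: a removal that makes the residuals fail the tests is undone.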
One of the advantages of AutoSearch/GETS is that the set of candidate variables can be split into subsets, with the search performed on each subset one at a time; the selected variables from each subset are then combined into a final set to be searched. This allows you to test more candidate variables than you have observations without creating singularities (as long as enough candidate variables are rejected), which makes it a perfect algorithm for indicator saturation studies.<br /><br /><br />
<h3 class="seccol" id="sec3">An Application with Consumption and Income</h3>
To demonstrate this feature, we will estimate a simple personal consumption equation, using log-difference of personal consumption as the dependent variable against a constant and log-differenced disposable income. This estimation is purely for demonstration of the saturation features in EViews 12, and should not be taken as worthy macroeconomic research!<br /><br />
Both data series were downloaded directly from the Federal Reserve Bank of St. Louis database, FRED, and contain monthly observations between 2002 and April 2020:<br /><br />
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/indicators/images/FRED.gif"><img height="auto"
src="http://www.eviews.com/blog/indicators/images/FRED.gif" title="FRED"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: FRED</small>
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
We begin by estimating a simple equation without any indicators included, using the following steps:
<ol>
<li>Click <b>Quick/Estimate Equation</b> to bring up the equation estimation dialog.</li>
<li>Enter our dependent variable <b>DLOG(CONS)</b> followed by a constant and our regressor <b>DLOG(INCOME)</b>.</li>
<li>Click <b>OK</b>.</li>
</ol><br />
<!-- :::::::::: FIGURES 2a and 2b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 2a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/indicators/images/SimpleEqDiag.png"><img height="auto"
src="http://www.eviews.com/blog/indicators/images/SimpleEqDiag.png" title="Simple Estimation Dialog"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 2b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/indicators/images/SimpleEqRes.png"><img height="auto"
src="http://www.eviews.com/blog/indicators/images/SimpleEqRes.png" title="Simple Estimation Output"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2a: Simple Estimation Dialog</small>
<small>(Click to expand)</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 2b: Simple Estimation Output</small>
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 2a and 2b :::::::::: -->
Note that the coefficient on log-differenced income is negative and statistically significant. Also note the R-squared of 35%.<br /><br />
If we click on the <b>Resids</b> button we can view a graph of the equation residuals.<br /><br />
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/indicators/images/SimpleEqResid.png"><img height="auto"
src="http://www.eviews.com/blog/indicators/images/SimpleEqResid.png" title="Estimation Residuals"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: Estimation Residuals</small>
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
A quick eyeball test suggests that something happened towards the end of 2004, again in the middle of 2008, and then in 2013. And obviously there was a huge shift at the start of the Covid-19 crisis in March/April 2020.<br /><br />
Now we’ll estimate a new equation where we instruct EViews to detect both impulse (outlier) and step-shift (change-in-intercept) indicators, with the following steps:
<ol>
<li>Click <b>Quick/Estimate Equation</b> to bring up the equation estimation dialog.</li>
<li>Enter our dependent variable <b>DLOG(CONS)</b> followed by a constant and our regressor <b>DLOG(INCOME)</b>.</li>
<li>Switch to the <b>Options Tab</b> and select <b>Auto-detection</b> under <b>Outliers/indicator saturation</b>.</li>
<li>Press the <b>Options</b> button and select both <b>Impulse</b> and <b>Step-shift</b> indicators.</li>
<li>Change the <b>Terminal condition p-value</b> to <b>0.01</b> (which will allow for more indicators entering the equation).</li>
<li>Click <b>OK</b> twice.</li>
</ol><br />
<!-- :::::::::: FIGURES 4a and 4b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 4a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/indicators/images/ImpulseEst.gif"><img height="auto"
src="http://www.eviews.com/blog/indicators/images/ImpulseEst.gif" title="Impulse Estimation"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 4b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/indicators/images/ImpulseRes.png"><img height="auto"
src="http://www.eviews.com/blog/indicators/images/ImpulseRes.png" title="Impulse Estimation Output"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4a: Impulse Estimation</small>
<small>(Click to expand)</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 4b: Impulse Estimation Output</small>
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 4a and 4b :::::::::: -->
You can see that five indicators have been added to the equation: three single-observation impulse indicators (2018M12, 2020M03, 2020M04) and two level-shift indicators (2008M5, 2013M1).<br /><br />
The impact of these variables on the log-differenced income coefficient is dramatic, as is the resulting R-squared.<br /><br />
Viewing the residual graph shows that the large outliers have been removed, and the location of detected indicators, as shown by the vertical lines, corresponds to the outliers we eyeballed in the original equation.<br /><br />
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/indicators/images/ImpulseResid.png"><img height="auto"
src="http://www.eviews.com/blog/indicators/images/ImpulseResid.png" title="Impulse Residuals"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5: Impulse Residuals</small>
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com1tag:blogger.com,1999:blog-6883247404678549489.post-58039045047729172952020-12-08T07:42:00.007-08:002020-12-08T08:04:04.203-08:00Nowcasting GDP with PMI using MIDAS-GETS<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 1px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.seccol {
}
.subseccol {
color: #fa5e5e
}
.wf {
}
.wfobj {
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: "verdana" sans-serif">
<b>Nowcasting</b>, <a href='https://en.wikipedia.org/wiki/Nowcasting_(economics)'>the act of predicting the current or near-future state of a macro-economic variable</a>, has become one of the more popular research topics performed in EViews over the past decade.<br /><br />
Perhaps the most important technique in nowcasting is mixed data sampling, or MIDAS. We have discussed <a href='https://en.wikipedia.org/wiki/Mixed-data_sampling'>MIDAS</a> estimation in EViews in a couple of prior guest <a href='http://blog.eviews.com/2018/12/nowcasting-gdp-on-daily-basis.html'>blog posts</a>, but with the introduction of a <a href='http://eviews.com/EViews12/ev12ecest_n.html#midas'>new MIDAS technique</a> in the recently released EViews 12, we thought we'd give another demonstration.
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">MIDAS – A Brief Background</a>
<ul>
<li><a href="#sec1.1">MIDAS-GETS</a>
</ul>
<li><a href="#sec2">MIDAS as a Nowcasting Tool</a>
<ul>
<li><a href="#sec2.1">PMI as a Nowcasting Instrument</a>
</ul>
<li><a href="#sec3">Nowcasting Exercises</a>
<ul>
<li><a href="#sec3.1">MIDAS-PDL</a>
<li><a href="#sec3.2">MIDAS-GETS</a>
<li><a href="#sec3.3">MIDAS-GETS with Indicator Saturation</a>
<li><a href="#sec3.4">Evaluating Nowcasting Models</a>
</ul>
</ol><br />
<h3 class="seccol" id="sec1">MIDAS – A Brief Background</h3>
<b>MIxed DAta Sampling</b> (MIDAS) is a regression technique that handles the case where the dependent variable is sampled or reported at a lower frequency than that of one, or more, of the independent regressors. This is common in macroeconomics where a number of important indicators, such as GDP, are usually reported on a quarterly basis, and other indicators, such as unemployment or stock prices, are reported on a monthly or even weekly basis.<br /><br />
The traditional approach to dealing with this mixed-frequency problem is to aggregate the higher-frequency variable into the same frequency as the lower. For example, when dealing with quarterly GDP and monthly unemployment, it's common practice to use the average monthly unemployment rate over the three months in a quarter as a single quarterly observation.
Whilst simple to implement, this approach loses fidelity in the higher-frequency variables. Any within-quarter movements in unemployment are lost, and the dataset is reduced by 2/3 (converting three observations into one).<br /><br />
MIDAS alleviates this issue by adding the individual components of the higher-frequency variable as independent regressors, allowing a separate coefficient for each component. For example, unemployment could have three separate regressors, one for the first month of the quarter, one for the second, and one for the third. This simple approach is called <b>U-MIDAS</b>.<br /><br />
A drawback of creating a regressor for each high-frequency component is that, in certain cases, one quickly saturates the equation with many regressors (the curse of dimensionality). For instance, whereas monthly unemployment and quarterly GDP generate 3 regressors for the one underlying variable, annual data would generate 12 regressors. If we regressed quarterly data on daily interest rates, we would have over 90 regressors for the one underlying variable.<br /><br />
To mitigate this expansion of regressors, traditional MIDAS utilizes a selection of weighting schemes that parameterize the higher frequency variables into a smaller number of coefficients. The most common of these weighting schemes is <b>Almon/PDL</b> weighting.<br /><br />
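As a sketch of how a polynomial weighting scheme collapses many lag coefficients into a handful of parameters, here is a minimal Almon/PDL weight function in Python (the function name and example values are ours, not an EViews API):

```python
import numpy as np

def almon_weights(theta, K):
    """Map a small polynomial parameter vector theta (length p+1) into K lag
    weights: w_j = theta_0 + theta_1*j + ... + theta_p*j**p, for j = 0..K-1."""
    j = np.arange(K)
    return sum(t * j ** p for p, t in enumerate(theta))
```

With <code>theta = [1.0, -0.1]</code> and <code>K = 6</code>, the six lag coefficients decline linearly from 1.0 to 0.5: two parameters now govern six lags, and the same two would govern ninety daily lags.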
A last note on MIDAS – although it is natural to want to include a number of high-frequency variables equal to the number of high-frequency periods per low frequency period (i.e. include three monthly variables since there are three months in a quarter), there is nothing that mathematically imposes this restriction in the MIDAS framework, and it is quite common to use many more variables than the natural number. <br /><br />
Going back to our unemployment/GDP example, you may want to utilize 9 months of unemployment data to explain GDP, and thus create 9 variables. In other words, you may determine that Q1 GDP is determined by unemployment in March, February, January (the three natural months), as well as 6 months previous (December, November, October, September, August, July). <br /><br />
Of course, you can also impose a lag structure to postulate that Q1 GDP is determined by February, January, …, June (a one-month lag), or is determined by December, November, …, April (a three-month lag). These 9 variables may then be reduced to a smaller number of coefficients using MIDAS weighting schemes, or, if the sample size permits, kept at 9 separate regressors.<br /><br />
<h4 class="subseccol" id="sec1.1">MIDAS-GETS</h4>
EViews 12 introduces a new MIDAS estimation method, <a href='http://eviews.com/help/helpintro.html#page/content%2Fmidas-Background.html%23ww331980'><b>MIDAS-GETS</b></a>. Rather than using a weighting scheme to reduce the number of variables, MIDAS-GETS controls the curse of dimensionality with the <a href='http://eviews.com/help/helpintro.html#page/content%2FVarsel-Background.html%23ww277256'><b>Auto-Search/GETS</b></a> variable selection algorithm to select which of the high frequency variables to include in the regression.<br /><br />
Since the Auto-Search/GETS algorithm is also used in EViews' indicator saturation detection routines, <a href='http://eviews.com/help/helpintro.html#page/content%2FRegress2-Indicator_Saturation.html%23'>indicator saturation</a> is available to MIDAS-GETS too. This means that the estimation can automatically include indicator variables that allow for outliers and structural changes in the model, which can dramatically enhance the forecasting performance of a model.<br /><br /><br />
<h3 class="seccol" id="sec2">MIDAS as a Nowcasting Tool</h3>
Although MIDAS was not necessarily introduced as a tool for nowcasting, its applicability to nowcasting is obvious: whilst traditional macroeconomic variables are typically sampled at low frequencies and with a reporting delay, high-frequency data are available in a timely fashion and can often be used to estimate the current state of a low-frequency variable.<br /><br />
More concretely, take Eurozone GDP. This important macro variable is released by <a href='https://ec.europa.eu/eurostat/news/release-calendar'>Eurostat</a> on a quarterly basis, usually 3 months after the quarter has ended. Thus, if you are at the end of July and want to know what the current GDP is, you must wait until December to receive the official statistics.<br /><br />
However, there may be monthly, or even daily, variables available without a delay. Unlike their delayed low frequency counterparts, these can be used to estimate the current value of GDP immediately.<br /><br />
<h4 class="subseccol" id="sec2.1">PMI as a Nowcasting Instrument</h4>
Among the more popular variables used in nowcasting exercises are economic surveys. Surveys can be released at a high frequency with little delay and are often highly correlated with more traditional macroeconomic variables. Here at EViews we're fans of the <a href='https://www.markiteconomics.com/'><b>Purchasing Managers' Index</b></a> (PMI). The latter is derived from surveys of senior executives at private sector companies, is released monthly, and reflects the current state of the economy (i.e., there is little delay between the survey and the release). In particular, we like the Eurozone composite measure, which consistently shows a high correlation with growth in Eurozone GDP:<br /><br />
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/midas_gets/images/correlation.png"><img height="auto"
src="http://www.eviews.com/blog/midas_gets/images/correlation.png" title="Eurozone PMI"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: Eurozone PMI</small><br />
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
<h3 class="seccol" id="sec3">Nowcasting Exercises</h3>
As a simple demonstration of nowcasting with various MIDAS approaches, we're going to run a little exercise that uses monthly Eurozone composite PMI to nowcast quarterly Eurozone GDP growth.<br /><br />
Specifically, we have an EViews workfile with two pages: the first contains quarterly data from 1998q3 to 2020q3 with Eurozone GDP Growth (<b class="wfobj">GDP_GR</b>), whereas the second contains monthly data over the same period with Eurozone Composite PMI (<b class="wfobj">PMICMPEMU</b>).<br /><br />
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/midas_gets/images/workfile.gif"><img height="auto"
src="http://www.eviews.com/blog/midas_gets/images/workfile.gif" title="Workfile"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: Workfile</small><br />
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
<h4 class="subseccol" id="sec3.1">MIDAS-PDL</h4>
To begin, we'll pretend we are currently at the start of March 2019 and wish to nowcast the current (2019Q1) value of Eurozone GDP growth. We have our February PMI data handy (and all previous months). We'll estimate a standard MIDAS equation in EViews, using data until Q4 2018 to estimate our model, then use the February PMI with that equation to nowcast Q1 2019. We'll assume that GDP growth is explained by 12 months of PMI data and by the previous quarterly value of GDP growth. The steps we perform are:
<ol>
<li>Ensure we have the Quarterly page selected.</li>
<li>Quick->Estimate Equation</li>
<li>Select <b>MIDAS</b> as the <b>Method</b>.</li>
<li>Enter <b>GDP_GR C GDP_GR(-1)</b> as the dependent variable and quarterly regressors (a constant and the lagged value of GDP growth).</li>
<li>Enter <b>Monthly\PMICMPEMU(-1)</b> as the high frequency regressor. The (-1) here indicates that we wish to use data up until the second month of the quarter (the default is the third/last month of the quarter, so by lagging it one month, we use data until the second month).</li>
<li>Set the <b>Sample</b> to end in 2018q4.</li>
</ol><br />
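For intuition on what the <b>Monthly\PMICMPEMU(-1)</b> specification does behind the scenes, the following Python sketch (our own illustration, not EViews' internal code) builds the frequency-aligned regressor rows: for each quarter it collects the most recent monthly values, counting back from the quarter's second month (the one-month lag):

```python
def align_monthly_to_quarterly(monthly, n_lags=12, month_lag=1):
    """Frequency alignment sketch: for each quarter, gather the n_lags
    most recent monthly values, counting back from the quarter's last
    month minus month_lag.  Index 0 of `monthly` is the first month of
    the sample's first quarter.  Quarters without a full history get None.
    """
    rows = []
    for q in range(len(monthly) // 3):
        end = 3 * q + 2 - month_lag          # index of the most recent month used
        if end - n_lags + 1 < 0:
            rows.append(None)                # not enough monthly history yet
            continue
        rows.append(monthly[end - n_lags + 1 : end + 1][::-1])  # most recent first
    return rows

months = list(range(24))  # 0 = January of year 1, ..., 23 = December of year 2
rows = align_monthly_to_quarterly(months, n_lags=12, month_lag=1)
# Q1 of year 2 (quarter index 4) starts from month index 13, i.e. February
print(rows[4][0])  # 13
```

Each non-None row then enters the quarterly regression, either weighted (PDL) or as separate regressors (U-MIDAS).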
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/midas_gets/images/MIDASPDL.gif"><img height="auto"
src="http://www.eviews.com/blog/midas_gets/images/MIDASPDL.gif" title="MIDAS PDL Estimation Dialog"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: MIDAS PDL Estimation Dialog</small><br />
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
The default MIDAS weighting method in EViews is PDL/Almon weighting with a polynomial degree of 3, which is what we'll use if we just click <b>OK</b>:<br /><br />
<!-- :::::::::: FIGURE 4 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/midas_gets/images/MIDASPDL.png"><img height="auto"
src="http://www.eviews.com/blog/midas_gets/images/MIDASPDL.png" title="MIDAS PDL Estimation Output"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4: MIDAS PDL Estimation Output</small><br />
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 4 :::::::::: -->
Since this is a forecasting/nowcasting exercise, we won't delve into interpretation of these results, other than to note that all three MIDAS PDL terms are statistically significant.<br /><br />
Now, to perform the nowcast, we can simply use EViews' built-in forecast engine and forecast for the “current” quarter (2019Q1). This is done with the following steps:
<ol>
<li>Click the <b>Forecast</b> button to bring up the forecast dialog.</li>
<li>Change the <b>Forecast sample</b> to <b>2019Q1 2019Q1</b> (just a single period).</li>
<li>Click <b>OK</b>.</li>
</ol><br />
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/midas_gets/images/forecastdlg.png"><img height="auto"
src="http://www.eviews.com/blog/midas_gets/images/forecastdlg.png" title="Forecast Dialog"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5: Forecast Dialog</small><br />
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
The forecast will produce a new series in the workfile, <b class="wfobj">GDP_GRF</b>, containing actual values for all observations other than 2019Q1, where it will contain the forecasted value. We can open this series together with the actual series in a group, and then graph it to see how close the single forecasted value is to the historical actual:<br /><br />
<!-- :::::::::: FIGURE 6 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/midas_gets/images/forecast.png"><img height="auto"
src="http://www.eviews.com/blog/midas_gets/images/forecast.png" title="MIDAS Forecast"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 6: MIDAS Forecast</small><br />
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 6 :::::::::: -->
The results seem a little underwhelming, albeit based on just a single observation. Let's see if we can improve this forecast with the new MIDAS-GETS weighting method.<br /><br />
<h4 class="subseccol" id="sec3.2">MIDAS-GETS</h4>
To perform the new estimation, we undertake the same steps as before, but additionally change the weighting method:
<ol>
<li>Quick->Estimate Equation</li>
<li>Select <b>MIDAS</b> as the <b>Method</b>.</li>
<li>Enter <b>GDP_GR C GDP_GR(-1)</b> as the dependent variable and quarterly regressors.</li>
<li>Enter <b>Monthly\PMICMPEMU(-1)</b> as the high frequency regressor.</li>
<li>Enter <b>12</b> as the <b>Fixed Lags</b> parameter to indicate each quarter is explained by 12 months of data.</li>
<li>Set the <b>Sample</b> to end in 2018q4.</li>
<li>Switch to the <b>Options</b> tab.</li>
<li>Change <b>MIDAS weights</b> to <b>Auto/GETS</b>.</li>
</ol><br />
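The Auto-Search/GETS algorithm is considerably more sophisticated than this, but the core general-to-specific idea can be sketched in a few lines of Python (a toy single-path backward elimination; the real routine searches multiple reduction paths and applies diagnostic tests along the way):

```python
import numpy as np

def gets_select(y, X, names, t_crit=1.96):
    """Toy general-to-specific selection: start from the full model and
    repeatedly drop the regressor with the smallest |t|-statistic until
    every surviving regressor is significant.
    """
    keep = list(range(X.shape[1]))
    while keep:
        Xk = X[:, keep]
        beta = np.linalg.lstsq(Xk, y, rcond=None)[0]
        resid = y - Xk @ beta
        s2 = resid @ resid / (len(y) - len(keep))          # residual variance
        se = np.sqrt(s2 * np.diag(np.linalg.inv(Xk.T @ Xk)))
        t = np.abs(beta / se)
        if t.min() >= t_crit:                              # all survivors significant
            break
        keep.pop(int(t.argmin()))                          # drop the weakest regressor
    return [names[i] for i in keep]

# simulated data: only the first and third "lags" actually matter
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.standard_normal(200)
print(gets_select(y, X, ["lag0", "lag1", "lag2", "lag3"]))
```

With strongly relevant regressors, the relevant lags survive the reduction while (with high probability) the noise lags are pruned away.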
<!-- :::::::::: FIGURES 7a and 7b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 7a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/midas_gets/images/MIDASGETS.gif"><img height="auto"
src="http://www.eviews.com/blog/midas_gets/images/MIDASGETS.gif" title="MIDAS-GETS Estimation Dialog"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 7b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/midas_gets/images/MIDASGETS.png"><img height="auto"
src="http://www.eviews.com/blog/midas_gets/images/MIDASGETS.png" title="MIDAS-GETS Estimation Output"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 7a: MIDAS-GETS Estimation Dialog</small><br />
<small>(Click to expand)</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 7b: MIDAS-GETS Estimation Output</small><br />
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 7a and 7b :::::::::: -->
Again, we won't delve into interpretation of these results, other than to mention that out of the 12 months of possible PMI data that could be used to explain each quarter, the equation chose to use only the two most recent months (the two lowest lags). We'll follow the exact same steps as previously to produce a forecast from this equation:<br /><br />
<!-- :::::::::: FIGURE 8 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/midas_gets/images/forecast2.png"><img height="auto"
src="http://www.eviews.com/blog/midas_gets/images/forecast2.png" title="MIDAS-GETS Forecast"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 8: MIDAS-GETS Forecast</small><br />
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 8 :::::::::: -->
The nowcast looks better than the previous model's, although again it is only a single data point.<br /><br />
<h4 class="subseccol" id="sec3.3">MIDAS-GETS with Indicator Saturation</h4>
Finally, we'll estimate a MIDAS-GETS model that includes indicator saturation. This will automatically model outliers and structural changes in our equation. We follow the same steps as before, but use the Auto/GETS options button to include a search for indicator variables. In this case we'll search only for outliers, by selecting just impulse indicators.<br /><br />
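The idea behind impulse-indicator saturation can be illustrated with a stripped-down, mean-only example in Python (our own toy sketch: fit on one half of the sample, flag outliers in the other half, swap halves, and take the union; the EViews routine applies this logic inside the full MIDAS regression):

```python
import statistics

def impulse_saturation(y, t_crit=2.0):
    """Split-half impulse-indicator saturation for a mean-only model:
    estimate the mean and spread on one half of the sample, flag
    observations in the other half whose standardized residual exceeds
    t_crit, then swap halves and take the union of flagged points.
    """
    n = len(y)
    halves = [list(range(n // 2)), list(range(n // 2, n))]
    flagged = set()
    for fit, test in (halves, halves[::-1]):
        mu = statistics.fmean([y[i] for i in fit])
        sd = statistics.stdev([y[i] for i in fit])
        flagged |= {i for i in test if abs(y[i] - mu) > t_crit * sd}
    return sorted(flagged)

y = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 8.0, -0.3, 0.1, 0.0]  # one clear outlier
print(impulse_saturation(y))  # [6]
```

Each flagged observation corresponds to an impulse dummy retained in the final model.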
<!-- :::::::::: FIGURES 9a and 9b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 9a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/midas_gets/images/MIDASGETSIS.gif"><img height="auto"
src="http://www.eviews.com/blog/midas_gets/images/MIDASGETSIS.gif" title="MIDAS-GETS (Indicator Saturation) Estimation Dialog"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 9b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/midas_gets/images/MIDASGETSIS.png"><img height="auto"
src="http://www.eviews.com/blog/midas_gets/images/MIDASGETSIS.png" title="MIDAS-GETS (Indicator Saturation) Estimation Output"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 9a: MIDAS-GETS (Indicator Saturation) Estimation Dialog</small><br />
<small>(Click to expand)</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 9b: MIDAS-GETS (Indicator Saturation) Estimation Output</small><br />
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 9a and 9b :::::::::: -->
The results are worth a quick mention. The GETS routine selected eight quarters with outliers, including impulse dummy variables for 2001Q1, 2005Q3, 2008Q2, 2008Q3, 2009Q1, 2010Q2, 2011Q2 and 2013Q2, <b>and</b> chose to include more months of PMI data: namely, the first and second months of the current quarter, as well as the months six, nine and twelve months before the quarter's final month. In concrete terms, this means that for 2018Q1 the equation chose to use February 2018, January 2018, September 2017, June 2017 and March 2017 as regressors.<br /><br />
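The lag-to-calendar arithmetic is easy to get wrong, so here is a small Python helper (hypothetical names, our own illustration) that translates selected high-frequency lags into calendar months under the one-month-lag convention used above:

```python
def lag_to_month(quarter_end_month, month_lag, hf_lag):
    """Translate a selected high-frequency lag into a calendar month.

    With PMICMPEMU(-1), lag 0 is the quarter's second month and lag k
    is k months before that.  Returns (year_offset, month) with months
    numbered 1..12; a negative year_offset means the prior year.
    """
    m = quarter_end_month - month_lag - hf_lag   # may be <= 0, i.e. prior year
    year_offset, month = divmod(m - 1, 12)
    return year_offset, month + 1

# 2018Q1 (quarter ends in month 3), one-month lag, selected lags 0, 1, 5, 8, 11
for lag in (0, 1, 5, 8, 11):
    print(lag_to_month(3, 1, lag))
# -> Feb 2018, Jan 2018, Sep 2017, Jun 2017, Mar 2017
```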
Forecasting is performed in the same way, and produces a similar looking forecast to the previous MIDAS-GETS model:<br /><br />
<!-- :::::::::: FIGURE 10 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/midas_gets/images/forecast3.png"><img height="auto"
src="http://www.eviews.com/blog/midas_gets/images/forecast3.png" title="MIDAS-GETS (Indicator Saturation) Forecast"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 10: MIDAS-GETS (Indicator Saturation) Forecast</small><br />
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 10 :::::::::: -->
<h4 class="subseccol" id="sec3.4">Evaluating Nowcasting Models</h4>
The previous examples all performed a single point nowcast of GDP growth, and a quick eyeball test suggested that MIDAS-GETS performed well. Here we'll demonstrate a formal nowcast evaluation exercise. In particular, we'll estimate a handful of different models on a rolling basis. The first estimation will assume we are in February 2018, estimating on data from 1999Q3 through 2017Q4, and will then nowcast 2018Q1. We'll then move forward a quarter and assume we're in May 2018, estimate through 2018Q1 and nowcast 2018Q2. Next, we'll move another quarter, and so on until 2019Q4, giving us eight rolling nowcasts.<br /><br />
We'll estimate and nowcast from six different equation specifications:
<ol>
<li>A simple AR(1) model with no PMI (GDP growth regressed against a lag and a constant).</li>
<li>Simple AR(1) model with aggregated PMI (average of the available monthly PMI data).</li>
<li>PDL/Almon MIDAS with 12 monthly lags of PMI and lagged GDP growth.</li>
<li>U-MIDAS with 12 monthly lags of PMI and lagged GDP growth.</li>
<li>MIDAS-GETS with 12 monthly lags of PMI and lagged GDP growth and no indicators.</li>
<li>MIDAS-GETS with 12 monthly lags of PMI and lagged GDP growth with impulse indicators.</li>
</ol><br />
Models 3, 5 and 6 are identical to those we estimated in the earlier examples. We've written a quick EViews program that will perform these nowcasts:
<pre><code>
<span style="color: green;">'create gdp growth series</span>
series gdp_gr = @pca(eur_gdp)

<span style="color: green;">'keep a list of equation names for easier referencing later</span>
%eqlist = "eq_umid eq_agg eq_pdl eq_simple eq_getsis eq_gets"

<span style="color: green;">'create empty forecast series for each equation</span>
group forcs gdp_gr
<span style="color: blue;">for</span> %j {%eqlist}
  series gdp_{%j}
  forcs.add gdp_{%j}
<span style="color: blue;">next</span>

<span style="color: green;">'estimate/nowcast loop</span>
<span style="color: blue;">for</span> !i=0 <span style="color: blue;">to</span> 7
  <span style="color: green;">'estimate</span>
  smpl @first 2017q4+!i
  equation eq_simple.ls gdp_gr c gdp_gr(-1)
  equation eq_agg.ls gdp_gr c gdp_gr(-1) agg_pmi
  equation eq_pdl.midas(fixedlag=12) gdp_gr c gdp_gr(-1) @ monthly\pmicmpemu(-1)
  equation eq_umid.midas(midwgt=umidas, fixedlag=12) gdp_gr c gdp_gr(-1) @ monthly\pmicmpemu(-1)
  equation eq_gets.midas(fixedlag=12, midwgt=autogets) gdp_gr c gdp_gr(-1) @ monthly\pmicmpemu(-1)
  equation eq_getsis.midas(fixedlag=12, midwgt=autogets, iis) gdp_gr c gdp_gr(-1) @ monthly\pmicmpemu(-1)

  <span style="color: green;">'nowcast the current quarter with each equation</span>
  smpl 2018q1+!i 2018q1+!i
  <span style="color: blue;">for</span> %j {%eqlist}
    {%j}.forecast temp
    gdp_{%j} = temp
    d temp
  <span style="color: blue;">next</span>
<span style="color: blue;">next</span>
</code></pre>
Once we have the six nowcast series of eight periods each, we can use EViews' built-in forecast evaluation engine to compare the nowcasts: open the series containing the true values (GDP_GR), click on View->Forecast Evaluation, and then provide the names of the nowcast series. The results of this evaluation are:<br /><br />
<!-- :::::::::: FIGURE 11 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/midas_gets/images/evaluation.png"><img height="auto"
src="http://www.eviews.com/blog/midas_gets/images/evaluation.png" title="MIDAS Evaluation"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 11: MIDAS Evaluation</small><br />
<small>(Click to expand)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 11 :::::::::: -->
From the evaluation statistics, we see that the MIDAS-GETS nowcasts perform very well, with the indicator saturation version, <b class="wfobj">GDP_EQ_GETSIS</b>, giving the lowest RMSE, MAE and SMAPE. The non-indicator version, <b class="wfobj">GDP_EQ_GETS</b>, also performs better than the traditional MIDAS methods.<br /><br /><br />
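For reference, the three headline statistics in the evaluation table are computed along these lines (standard textbook definitions sketched in Python; EViews' exact SMAPE scaling may differ slightly):

```python
import math

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    """Symmetric mean absolute percentage error (common definition)."""
    return 100 / len(actual) * sum(
        abs(f - a) / ((abs(a) + abs(f)) / 2) for a, f in zip(actual, forecast)
    )

actual   = [0.4, 0.3, 0.2, 0.1]   # made-up growth rates for illustration
forecast = [0.5, 0.2, 0.2, 0.3]
print(round(mae(actual, forecast), 3))  # 0.1
```

Lower values of all three statistics indicate a more accurate set of nowcasts.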
</span>
<br /><br />
Wavelet Analysis: Part II (Applications in EViews) (IHS EViews, December 2, 2020)<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 1px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.seccol {
}
.subseccol {
color: #fa5e5e
}
.wf {
}
.wfobj {
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: 'Verdana', sans-serif">
This is the second of two entries devoted to wavelets. <a href='http://blog.eviews.com/2020/11/wavelet-analysis-part-i-theoretical.html'>Part I</a> was devoted to theoretical underpinnings. Here, we demonstrate the use and application of these principles to empirical exercises using the wavelet engine released with EViews 12.
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction</a>
<li><a href="#sec2">Wavelet Transforms</a>
<ul>
<li><a href="#sec2.1">Example 1: Wavelet Transforms as Informal Tests for (Non-)Stationarity</a>
<li><a href="#sec2.2">Example 2: MRA as Seasonal Adjustment</a>
<li><a href="#sec2.3">Example 3: DWT vs. MODWT</a>
</ul>
<li><a href="#sec3">Variance Decomposition</a>
<ul>
<li><a href="#sec3.1">Example: MODWT Unbiased Variance Decomposition</a>
</ul>
<li><a href="#sec4">Wavelet Thresholding</a>
<ul>
<li><a href="#sec4.1">Example: Thresholding as Signal Extraction</a>
</ul>
<li><a href="#sec5">Outlier Detection</a>
<ul>
<li><a href="#sec5.1">Example: Bilen and Huzurbazar (2002) Outlier Detection</a>
</ul>
<li><a href="#sec6">Conclusion</a>
<li><a href="#sec7">Files</a>
<li><a href="#sec8">References</a>
</ol><br />
<h3 class="seccol" id="sec1">Introduction to Wavelets</h3>
The new EViews 12 release has introduced several new statistical and econometric procedures. Among them is an engine for wavelet analysis. This complements the existing battery of techniques in EViews used to analyze and isolate features which characterize a time series. While there are undoubtedly numerous applications of wavelets, such as regression, unit root testing, fractional integration order estimation, and bootstrapping (wavestrapping), here we highlight the new EViews wavelet engine. In particular, we focus on four of the most popular areas of wavelet analysis:
<ul>
<li>Transforms</li>
<li>Variance decomposition</li>
<li>Thresholding</li>
<li>Outlier detection</li>
</ul><br /><br />
<h3 class="seccol" id="sec2">Wavelet Transforms</h3>
The first step in wavelet analysis is usually a wavelet transform of a time series of interest. This is similar in spirit to a Fourier transform. The time series is decomposed into its constituent spectral (frequency) features on a scale-by-scale basis. Recall that the idea of scale in wavelet analysis is akin to frequency in Fourier analysis. The transform is nothing more than a re-expression of a series' behaviour in the time domain in terms of its behaviour in the frequency domain. This allows us to see which scales (frequencies) dominate in terms of activity.<br /><br />
<h4 class="subseccol" id="sec2.1">Example 1: Wavelet Transforms as Informal Tests for (Non-)Stationarity</h4>
Many important and routine tasks in time series analysis require classifying data as stationary or non-stationary. Any of the unit root tests available in EViews are designed to formally address such classifications. Nevertheless, wavelet transforms such as the discrete wavelet transform (DWT) or the maximum overlap discrete wavelet transform (MODWT) can also be used for a similar purpose. While formal wavelet-based unit root tests are available in the literature, here we focus on demonstrating how wavelets can be used as an exploratory tool for stationarity determination <i>in lieu</i> of a formal test.<br /><br />
Recall from the theoretical discussion of Mallat's algorithm in <a href='http://blog.eviews.com/2020/11/wavelet-analysis-part-i-theoretical.html'>Part I</a> that discrete wavelet transforms partition the frequency range into finer and finer blocks. For instance, at the first scale, the frequency range is split into two equal parts. The first, lower frequency part, is captured by the scaling coefficients and corresponds to the traditional (Fourier) frequency range $ \sbrace{0,\, \pi} $. The second, higher frequency part, is captured by the wavelet coefficients and corresponds to the traditional frequency range $ \sbrace{\pi,\, 2\pi} $. At the second stage, the lower frequency from the previous scale, namely the frequency region roughly corresponding to $ \sbrace{0,\, \pi} $ in the traditional Fourier context, is again split into two equal portions. Accordingly, the wavelet coefficients at scale 2 would roughly correspond to the traditional frequency region $ \sbrace{\frac{\pi}{2},\, \pi} $, whereas the scaling coefficients would roughly correspond to the traditional frequency region $ \sbrace{0,\, \frac{\pi}{2}} $, and so on.<br /><br />
This decomposition affords the ability to identify which features of the original time series data are dominant at which scale. In particular, if the spectra (read wavelet/scaling coefficient magnitudes) at a given scale are high, this would indicate that those coefficients are registering behaviours in the underlying data which dominate at said scale and frequency region. For instance, in the traditional Fourier context, if a series has very pronounced spectra near the frequency zero, this indicates that observations of that time series are very persistent (die off slowly). Naturally, one would classify such a series as non-stationary, possibly exhibiting a unit root. Alternatively, if a series has very pronounced spectra at higher frequencies, this indicates that the time series is driven by dynamics that frequently appear and disappear. In other words, the time series is driven by transient features and one would classify the time series as stationary. The analogue of this analysis in the context of wavelet analysis would proceed as follows.<br /><br />
At the first scale, if wavelet spectra dominate scaling spectra, the underlying series is dominated by higher frequency (transitory) forces and the series is most likely stationary. At scale two, if the scaling spectra dominate the wavelet spectra from the first and second scales, this indicates that lower frequency forces dominate higher frequency dynamics, providing evidence of non-stationarity. Naturally, this scale-based analysis carries on until the final decomposition scale.<br /><br />
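A tiny Python sketch of the first-scale split makes this concrete. With the Haar filter, scaling coefficients are (rescaled) local averages and wavelet coefficients are (rescaled) local differences; for a persistent, random-walk-like series the scaling (low frequency) energy dominates:

```python
import math

def haar_dwt_scale1(x):
    """One scale of the Haar DWT: scaling coefficients capture local
    averages (low frequencies), wavelet coefficients local differences
    (high frequencies).  Assumes len(x) is even.
    """
    v = [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]  # scaling
    w = [(x[i + 1] - x[i]) / math.sqrt(2) for i in range(0, len(x), 2)]  # wavelet
    return w, v

# a persistent (trending) toy series: scaling energy dominates wavelet energy
x = [1.0, 1.1, 1.3, 1.2, 1.5, 1.6, 1.8, 1.9]
w, v = haar_dwt_scale1(x)
print(sum(c * c for c in w) < sum(c * c for c in v))  # True
```

Because the DWT is orthonormal, the energy of the coefficients exactly matches the energy of the original series, which is what makes the scale-by-scale comparison meaningful.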
To demonstrate the dynamics outlined above, we'll consider Canadian real exchange rate data extracted from the dataset in Pesaran (2007). This is a quarterly time series running from 1973Q1 to 1998Q4. The data can be found in <a href="http://www.eviews.com/blog/wavelets/workfiles/wavelets.wf1"><b class="wf">WAVELETS.WF1</b></a>. The series we're interested in is <b class="wfobj">CANADA_RER</b>. We'll demonstrate with a discrete wavelet transform (DWT) and the Haar wavelet filter. To facilitate the discussion to follow, we will consider the transformation only up to the first scale.<br /><br />
To perform the transform, proceed in the following steps:
<ol>
<li>Double click on <b class="wfobj">CANADA_RER</b> to open the series window.</li>
<li>Click on <b>View/Wavelet Analysis/Transforms...</b></li>
<li>From the <b>Max scale</b> dropdown, select <b>1</b>.</li>
<li>Click on <b>OK</b>.</li>
</ol><br />
<!-- :::::::::: FIGURES 2a and 2b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 2a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/transform_ex1_1.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/transform_ex1_1.png" title="Canadian RER: Discrete Wavelet Transform Part 1"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 2b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/transform_ex1_2.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/transform_ex1_2.png" title="Canadian RER: Discrete Wavelet Transform Part 2"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2a: Canadian RER: Discrete Wavelet Transform Part 1</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 2b: Canadian RER: Discrete Wavelet Transform Part 2</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 2a and 2b :::::::::: -->
The output is a spool object with the spool tree listing the summary, original series, as well as wavelet and scaling coefficients for each scale (in this case just 1). The first of these is a summary of the wavelet transformation performed. Note here that since the number of available observations is 104, a dyadic adjustment was applied: the series is padded with its mean to reach dyadic length.<br /><br />
The first plot in the output is a plot of the original series, in addition to the padded values in case a dyadic adjustment was applied. The last two plots are respectively the wavelet and scaling coefficients. Recall that at the first scale, the wavelet decomposition effectively splits the frequency spectrum into two equal portions: the low and high frequency portions, respectively. Recall further that the low frequency portion is associated with the scaling coefficients $ \mathbf{V} $ whereas the high frequency portion is associated with the wavelet coefficients $ \mathbf{W} $.<br /><br />
Evidently, the spectra characterizing the wavelet coefficients are significantly less pronounced than those characterizing the scaling coefficients. This is an indication that the Canadian real exchange series is possibly non-stationary. Furthermore, observe that the wavelet plot has two dashed red lines. These represent the $ \pm 1 $ standard deviation of the coefficients at that scale. This is particularly useful in visualizing which wavelet coefficients should be shrunk to zero (are insignificant) in wavelet shrinkage applications. (We will return to this later when we discuss wavelet thresholding outright.) Recall that coefficients exceeding some threshold bound (in this case the standard deviation) ought to be retained, while the remaining coefficients are shrunk to zero. From this we see that the majority of wavelet coefficients at scale 1 can be discarded. This is further evidence that high frequency forces in the <b class="wfobj">CANADA_RER</b> series are not very pronounced.<br /><br />
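A minimal Python sketch of the hard-thresholding rule implied by those bounds (our illustration: every coefficient inside the ±1 standard deviation band is shrunk to zero; formal shrinkage rules such as the universal or SURE thresholds choose the bound differently):

```python
import statistics

def hard_threshold(w):
    """Hard-threshold wavelet coefficients at +/- one standard deviation,
    mirroring the dashed bounds in the EViews plot: coefficients inside
    the band are shrunk to zero, the rest are kept unchanged.
    """
    sd = statistics.pstdev(w)
    return [c if abs(c) > sd else 0.0 for c in w]

w = [0.02, -0.01, 0.5, 0.03, -0.6, 0.01]   # toy scale-1 wavelet coefficients
print(hard_threshold(w))  # [0.0, 0.0, 0.5, 0.0, -0.6, 0.0]
```

Reconstructing the series from the thresholded coefficients then yields a denoised version of the original signal.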
To justify the intuition, we can perform a quick ADF unit root test on <b class="wfobj">CANADA_RER</b>. To do so, from the open <b class="wfobj">CANADA_RER</b> series window, proceed as follows:
<ol>
<li>Click on <b>View/Unit Root Tests/Standard Unit Root Test...</b></li>
<li>Click on <b>OK</b>.</li>
</ol><br />
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/wavelets/images/canada_rer_ur.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/canada_rer_ur.png" title="Canadian RER: Unit Root Test"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: Canadian RER Unit Root Test</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
Our intuition is indeed correct. From the unit root test output it is clear that the p-value associated with the ADF unit root test is 0.7643 -- too high to reject the null hypothesis of a unit root at any meaningful significance level.<br /><br />
While the wavelet decomposition is not a formal test, it is certainly a great way of identifying which scales (read frequencies) dominate the underlying series behaviour. Naturally, this analysis is not limited to the first scale. To see this, we will repeat the exercise above using the maximum overlap discrete wavelet transform (MODWT) with the Daubechies (daublet) filter of length 6. We will also perform the transform up to the maximum possible scale, and indicate which and how many wavelet coefficients are affected by the boundary. (See <a href='http://blog.eviews.com/2020/11/wavelet-analysis-part-i-theoretical.html'>Part I</a> for a discussion of boundary conditions.)<br /><br />
From the open <b class="wfobj">CANADA_RER</b> series window, we proceed in the following steps:
<ol>
<li>Click on <b>View/Wavelet Analysis/Transforms...</b></li>
<li>Change the <b>Decomposition</b> dropdown to <b>Overlap transform - MODWT</b>.</li>
<li>Change the <b>Class</b> dropdown to <b>Daubechies</b>.</li>
<li>From the <b>Length</b> dropdown select <b>6</b>.</li>
<li>Click on <b>OK</b>.</li>
</ol><br />
<!-- :::::::::: FIGURES 4a, 4b, and 4c :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 4a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/transform_ex2_1.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/transform_ex2_1.png" title="Canadian RER: MODWT Part 1"
width="240" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 4b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/transform_ex2_2.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/transform_ex2_2.png" title="Canadian RER: MODWT Part 2"
width="240" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 4c :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/transform_ex2_3.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/transform_ex2_3.png" title="Canadian RER: MODWT Part 3"
width="240" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4a: Canadian RER: MODWT Part 1</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 4b: Canadian RER: MODWT Part 2</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 4c: Canadian RER: MODWT Part 3</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 4a, 4b, and 4c :::::::::: -->
As before, the output is a spool object with wavelet and scaling coefficients as individual spool elements. Since the MODWT is not an orthonormal transform and uses all of the available observations, the wavelet and scaling coefficients are of the same length as the input series and do not require length adjustments. Notice the significantly more pronounced ''wave'' behaviour across wavelet coefficients and scales. This is a consequence of the fact that the MODWT is significantly more redundant than its DWT counterpart. In other words, patterns retain their momentum as they evolve.<br /><br />
Analogous to the DWT, the MODWT partitions the frequency range into finer and finer blocks. At the first scale, only a few wavelet coefficients exhibit significant spikes (i.e. exceed the threshold bounds). At scales two and three, transient features evidently persist, but beyond that they do not seem to contribute much. Meanwhile, the scaling coefficients at the final scale (scale 6) are roughly twice as large (0.20) as the largest wavelet spectrum (0.10), which manifests at scales 1 and 2. These are all indications that lower frequency forces dominate those at higher frequencies and that the underlying series is most likely non-stationary.<br /><br />
Finally, notice that for each scale, the coefficients affected by the boundary are displayed in red, and their count is reported in the legends. A vertical dashed black line marks the region up to which the boundary conditions persist. Boundary coefficients are an important consequence of longer filters and higher scales. Evidently, as the scale increases, boundary coefficients consume the entire set of coefficients. Moreover, since the MODWT is a redundant transform, the number of boundary coefficients will always be greater than in the orthonormal DWT. As before, the $ \pm 1 $ standard deviation bounds are available for reference.<br /><br />
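The boundary-coefficient counts reported in the legends can be reproduced with simple bookkeeping. The sketch below uses the standard width of the level-$ j $ equivalent MODWT filter, $ L_{j} = (2^{j} - 1)(L - 1) + 1 $, with the length-6 filter from the example above; the sample size $ T = 256 $ is an illustrative placeholder, not taken from the data:

```python
# Boundary-coefficient bookkeeping for the MODWT. L_j is the width of
# the level-j equivalent filter; the first min(L_j - 1, T) coefficients
# at level j are affected by the circular boundary. T = 256 is an
# illustrative placeholder sample size.
def equiv_filter_length(L, j):
    """Width of the level-j equivalent MODWT filter."""
    return (2**j - 1) * (L - 1) + 1

def n_boundary_modwt(L, j, T):
    """Number of level-j MODWT coefficients affected by circularity."""
    return min(equiv_filter_length(L, j) - 1, T)

for j in range(1, 7):                 # length-6 filter, scales 1 through 6
    print(j, n_boundary_modwt(6, j, 256))
```

At the highest scale the count saturates at $ T $, which is exactly the sense in which boundary coefficients "consume the entire set" of coefficients.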
<h4 class="subseccol" id="sec2.2">Example 2: MRA as Seasonal Adjustment</h4>
It's worth noting that multiresolution analysis (MRA) is often used as an intermediate step toward some final inferential procedure. For instance, if the objective is to run a unit root test on some series, we may wish to do so on the true signal, having discarded the noise, in order to obtain a more reliable test. Similarly, we may wish to run regressions on series which have been <i>smoothed</i>. Discarding noise from regressors may prevent clouding of inferential conclusions. This is the idea behind most existing smoothing techniques in the literature.<br /><br />
In fact, wavelets are very well adapted to isolating many different kinds of trends and patterns, whether seasonal, non-stationary, non-linear, etc. Here we demonstrate their potential using an artificial dataset with a quarterly seasonality. In particular, we generate 128 random normal variates and excite every first quarter with a shock. These modified normal variates are then fed as innovations into a stationary autoregressive (AR) process. This is achieved with a few commands in the command window or an EViews program as follows:
<pre><code>
rndseed 128 <span style="color: green;">'set the random seed</span>
wfcreate q 1989 2020 <span style="color: green;">'make quarterly workfile with 128 quarters</span>
series eps = 8*(@quarter=1) + @rnorm <span style="color: green;">'create random normal innovations with each first quarter having mean 8</span>
series x <span style="color: green;">'create a series x</span>
x(1) = @rnorm <span style="color: green;">'set the first observation to a random normal value</span>
smpl 1989q2 @last <span style="color: green;">'start the sample at the 2nd quarter</span>
x = 0.75*x(-1) + eps <span style="color: green;">'generate an AR process using eps as innovations</span>
smpl @all <span style="color: green;">'reset the sample to the full workfile range</span>
</code></pre>
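For readers working outside EViews, a rough Python analogue of the program above is sketched below. The RNG and seed semantics differ from EViews' <code>rndseed</code>, so the draws will differ from those in the figures, but the structure of the data-generating process is the same:

```python
import numpy as np

# Replicate the structure of the EViews DGP: quarterly innovations with
# a mean-8 shock in every first quarter, fed into a stationary AR(1).
rng = np.random.default_rng(128)
T = 128                                     # 32 years of quarterly data
quarter = np.tile([1, 2, 3, 4], T // 4)     # quarter index of each obs
eps = 8.0 * (quarter == 1) + rng.standard_normal(T)

x = np.empty(T)
x[0] = rng.standard_normal()                # initial condition
for t in range(1, T):                       # AR(1) with eps as innovations
    x[t] = 0.75 * x[t - 1] + eps[t]
```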
To truly appreciate the idea behind MRA, one ought to set the maximum decomposition level to a lower value. This is because the smooth series extracts the ''signal'' from the original series for all scales beyond the maximum decomposition level, whereas the ''noise'' portion of the original series is decomposed on a scale-by-scale basis for all scales up to the maximum decomposition level. We now perform a MODWT MRA on the <b class="wfobj">X</b> series using a Daubechies filter of length 4 and maximum decomposition level 2, as follows:
<ol>
<li>Double click on <b class="wfobj">X</b> to open the series.</li>
<li>Click on <b>View/Wavelet Analysis/Transforms...</b></li>
<li>Change the <b>Decomposition</b> dropdown to <b>Overlap multires. - MODWT MRA</b>.</li>
<li>Set the <b>Max scale</b> textbox to <b>2</b>.</li>
<li>Change the <b>Class</b> dropdown to <b>Daubechies</b>.</li>
<li>Click on <b>OK</b>.</li>
</ol><br />
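Under the hood, the first-scale step of a MODWT MRA is simple enough to sketch by hand. The snippet below is a minimal level-1 illustration using the Haar filter rather than the Daubechies length-4 filter selected above, so it is a simplification of, not a substitute for, the EViews computation. Its key property is the additive MRA identity: the detail and smooth series sum back to the original data.

```python
import numpy as np

def haar_modwt_mra_level1(x):
    """Level-1 Haar MODWT MRA: returns (detail, smooth) such that
    x = detail + smooth (the additive MRA identity)."""
    x = np.asarray(x, dtype=float)
    xm1 = np.roll(x, 1)                 # x_{t-1} with circular boundary
    w = (x - xm1) / 2.0                 # MODWT wavelet coefficients
    v = (x + xm1) / 2.0                 # MODWT scaling coefficients
    d1 = (w - np.roll(w, -1)) / 2.0     # detail: synthesis runs forward
    s1 = (v + np.roll(v, -1)) / 2.0     # smooth
    return d1, s1

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
d1, s1 = haar_modwt_mra_level1(x)
assert np.allclose(d1 + s1, x)          # the MRA reconstructs the series
```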
<!-- :::::::::: FIGURES 6a and 6b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 6a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/transform_ex4_1.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/transform_ex4_1.png" title="Quarterly Seasonality: MODWT MRA Part 1"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 6b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/transform_ex4_2.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/transform_ex4_2.png" title="Quarterly Seasonality: MODWT MRA Part 2"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 6a: Quarterly Seasonality: MODWT MRA Part 1</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 6b: Quarterly Seasonality: MODWT MRA Part 2</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 6a and 6b :::::::::: -->
The output is again a spool object with smooth and detail series as individual spool elements. The first plot is that of the smooth series at the maximum decomposition level, overlaying the original series for context. Any observations affected by boundary coefficients are displayed in red, with their number reported in the legend. Furthermore, since observations affected by the boundary are split between the beginning and end of the original series, two dashed vertical lines are provided at each decomposition scale. These isolate the areas which partition the total set of observations into those affected by the boundary and those which are not.<br /><br />
It is clear from the smooth series that seasonal patterns have been dropped from the underlying trend approximation of the original data. This is precisely what we want, and it is the idea behind other well known seasonal adjustment techniques such as TRAMO/SEATS, X-12, X-13, STL decompositions, etc., all of which can also be performed in EViews for comparison. In fact, the figure below plots our MRA smooth series against the STL decomposition trend series performed on the same data.<br /><br />
<!-- :::::::::: FIGURE 7 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/wavelets/images/transform_ex4_3.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/transform_ex4_3.png" title="MODWT MRA Smooth vs STL Trend"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 7: MODWT MRA Smooth vs. STL Trend</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 7 :::::::::: -->
The two series are undoubtedly very similar, as they should be!<br /><br />
The figure above also suggests that the STL seasonal series should be very similar to the details from our MODWT MRA decomposition. Before demonstrating this, we remind readers that whereas the STL decomposition produces a single series estimate of the seasonal pattern, wavelet MRA procedures decompose noise (in this case seasonal patterns) on a scale-by-scale basis. Accordingly, at scale 1, the MRA detail series captures all movements on a scale of 0 to 2 quarters. At scale 2, the MRA detail series captures movements on a scale of 2 to 4 quarters, and so on. In general, for each scale $ j $, the detail series captures patterns on a scale of $ 2^{j-1} $ to $ 2^{j} $ units, whereas the smooth series captures patterns on scales of $ 2^{j} $ units and longer.<br /><br />
Finally, turning to the comparison of seasonal variation estimates between the MRA and STL, we need to sum all detail series to compound their effect and produce a single series estimate of the noise. We can then compare this with the single series estimate of seasonality from the STL decomposition.<br /><br />
<!-- :::::::::: FIGURE 8 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/wavelets/images/transform_ex4_4.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/transform_ex4_4.png" title="MODWT MRA Details vs STL Seasonality"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 8: MODWT MRA Details vs. STL Seasonality</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 8 :::::::::: -->
As expected, the series are nearly identical.<br /><br />
To demonstrate this in the context of non-artificial data, we'll run a MODWT MRA on the Canadian real exchange rate data using a Least Asymmetric filter of length 12 and a maximum decomposition scale of 3.<br /><br />
<!-- :::::::::: FIGURES 5a and 5b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 5a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/transform_ex3_1.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/transform_ex3_1.png" title="Canadian RER: MODWT Multiresolution Analysis Part 1"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 5b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/transform_ex3_2.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/transform_ex3_2.png" title="Canadian RER: MODWT Multiresolution Analysis Part 2"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5a: Canadian RER: MODWT Multiresolution Analysis Part 1</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 5b: Canadian RER: MODWT Multiresolution Analysis Part 2</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 5a and 5b :::::::::: -->
Recall that the main use for MRA is the separation of the true ''signal'' of the underlying series from its noise, at a given decomposition level. Here, the ''Smooths 3'' series is the signal approximation and, from the plot, seems to follow the contours of the original data. The remaining three series - ''Details 3'', ''Details 2'', and ''Details 1'' - approximate the noise at their respective scales. Clearly, at the first scale, noise is rather negligible. This is an indication that the majority of the signal is in the lower frequency range. As we move to the second scale, the noise becomes more prominent, but is still relatively negligible. Again, this confirms that the true signal is in a frequency range lower still, and so on. More importantly, this indicates that the dynamics driving the noise are not particularly transitory. Accordingly, this would rule out traditional seasonality as a force driving the noise, but would not necessarily preclude the existence of non-stationary seasonality such as seasonal unit roots.<br /><br />
<h4 class="subseccol" id="sec2.3">Example 3: DWT vs. MODWT</h4>
We have already mentioned that the primary difference between the DWT and MODWT is redundancy. The DWT is an orthonormal decomposition whereas the MODWT is not. This is certainly an advantage of the DWT over its MODWT counterpart since it guarantees that at each scale, the decomposition captures only those features which characterize that scale, and that scale alone. On the other hand, the DWT requires input series to be of dyadic length, whereas the MODWT does not. This is an advantage of the MODWT since information is never dropped or added to derive the transform. The MODWT has an additional advantage over the DWT, and it has to do with spectral-time alignment: any pronounced observations in the time domain register as spikes in the wavelet domain at the same time spot. This is unlike the DWT, where this alignment fails to hold. Formally, it is said that the MODWT is associated with a <b>zero-phase</b> filter, whereas the DWT is not. In practice, this means that outlying characteristics (spikes) in the DWT MRA will not align with outlying features of the original time series, whereas they will in the case of the MODWT MRA.<br /><br />
To demonstrate this difference, we will generate a time series of length 128 filled with random normal observations and introduce a large outlying value at observation 64. We will then perform a DWT MRA and a MODWT MRA decomposition of the same data using a Daubechies filter of length 4 and study the differences. We will also only consider the first scale since the remaining scales do little to further the intuition.<br /><br />
We can begin by creating our artificial data by typing in the following set of commands in the command window:
<pre><code>
wfcreate u 128
series x = @rnorm
x(64) = 40
</code></pre>
These commands create a workfile of length 128, and a series <b class="wfobj">X</b> filled with random normal variates. The 64th observation is then set to 40 - more than 10 times as large as observations in the top 1% of the standard normal distribution.<br /><br /> We then generate a DWT MRA and a MODWT MRA transform of the same series. The output is summarized in the plots below.<br /><br />
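The zero-phase alignment property is easy to sketch numerically. The snippet below builds the same kind of spiked series and checks that the level-1 MODWT MRA detail peaks exactly at the outlier; Haar is used instead of the Daubechies length-4 filter purely to keep the arithmetic transparent, and the alignment property carries over:

```python
import numpy as np

# With a level-1 Haar MODWT MRA, the largest detail coefficient sits
# exactly on the outlying observation (zero-phase alignment).
rng = np.random.default_rng(1)
x = rng.standard_normal(128)
x[63] = 40.0                            # the 64th observation (0-based 63)

w = (x - np.roll(x, 1)) / 2.0           # MODWT wavelet coefficients
d1 = (w - np.roll(w, -1)) / 2.0         # level-1 MRA detail series

assert int(np.argmax(np.abs(d1))) == 63  # peak aligns with the outlier
```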
<!-- :::::::::: FIGURES 9a and 9b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 9a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/transform_ex5_1.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/transform_ex5_1.png" title="Outlying Observation: DWT MRA"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 9b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/transform_ex5_2.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/transform_ex5_2.png" title="Outlying Observation: MODWT MRA"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 9a: Outlying Observation: DWT MRA</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 9b: Outlying Observation: MODWT MRA</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 9a and 9b :::::::::: -->
Evidently, the peak of the ''shark fin'' pattern in the DWT MRA smooth series does not align with the outlying observation that generated it in the original data. In other words, whereas the outlying observation is at time $ t = 64 $, the peak of the smooth series occurs at time $ t = 63 $. This is in contrast to the MODWT MRA smooth series, which clearly aligns its peak with the outlying observation in the original data.<br /><br /><br />
<h3 class="seccol" id="sec3">Variance Decomposition</h3>
Another traditional application of wavelets is variance decomposition. Just as wavelet transforms can decompose a series' signal across scales, they can also decompose a series' variance across scales. In particular, this is a decomposition of the amount of original variation attributed to a given scale. Naturally, the conclusions derived above on transience hold here as well. For instance, if the contribution to overall variation is largest at scale 1, this would indicate that transitory forces contribute most to overall variation. The opposite is true if higher scales are associated with larger contributions to overall variation.<br /><br />
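The basic energy split behind such a decomposition can be sketched with a level-1 Haar MODWT: the series' total energy divides exactly between the scale-1 wavelet coefficients (high-frequency variation) and the scaling coefficients (everything coarser). This is a minimal illustration of the principle, not the unbiased estimator EViews computes:

```python
import numpy as np

# Exact scale-1 energy split for a level-1 Haar MODWT: the energy of
# the series equals wavelet-coefficient energy plus scaling-coefficient
# energy, so each scale's share of variation can be read off directly.
rng = np.random.default_rng(7)
x = rng.standard_normal(64)

w = (x - np.roll(x, 1)) / 2.0           # scale-1 wavelet coefficients
v = (x + np.roll(x, 1)) / 2.0           # scale-1 scaling coefficients

assert np.isclose(np.sum(x**2), np.sum(w**2) + np.sum(v**2))

share_scale1 = np.sum(w**2) / np.sum(x**2)   # high-frequency share
```

Iterating the same split on the scaling coefficients yields the scale-by-scale decomposition discussed above.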
<h4 class="subseccol" id="sec3.1">Example: MODWT Unbiased Variance Decomposition</h4>
To demonstrate the procedure, we will use Japanese real exchange rate data from 1973Q1 to 1988Q4, again extracted from the Pesaran (2007) dataset. The series of interest is called <b class="wfobj">JAPAN_RER</b>. We will produce a scale-by-scale decomposition of variance contributions using the MODWT with a Daubechies filter of length 4. Furthermore, we'll produce 95% confidence intervals using the asymptotic Chi-squared distribution with a band-pass estimate for the EDOF (equivalent degrees of freedom). The band-pass EDOF is preferred here since the sample size is less than 128, and the asymptotic approximation to the EDOF requires a sample size of at least 128 observations for decent results.<br /><br />
From the open series window, proceed in the following steps:
<ol>
<li>Click on <b>View/Wavelet Analysis/Variance Decomposition...</b></li>
<li>Change the <b>CI type</b> dropdown to <b>Asymp. Band-Limited</b>.</li>
<li>From the <b>Decomposition</b> dropdown select <b>Overlap transform - MODWT</b>.</li>
<li>Set the <b>Class</b> dropdown to <b>Daubechies</b>.</li>
<li>Click on <b>OK</b>.</li>
</ol><br />
<!-- :::::::::: FIGURES 11a and 11b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 11a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/vardecomp_ex1_1.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/vardecomp_ex1_1.png" title="Japanese RER: MODWT Variance Decomp. Part 1"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 11b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/vardecomp_ex1_2.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/vardecomp_ex1_2.png" title="Japanese RER: MODWT Variance Decomp. Part 2"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 11a: Japanese RER: MODWT Variance Decomp. Part 1</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 11b: Japanese RER: MODWT Variance Decomp. Part 2</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 11a and 11b :::::::::: -->
The output is a spool object with the spool tree listing the summary, spectrum table, variance distribution across scales, confidence intervals (CIs) across scales, and the cumulative variance and CIs. The spectrum table lists the contribution to overall variance by wavelet coefficients at each scale. In particular, the column titled <b>Variance</b> shows the variance contributed to the total at a given scale. The columns titled <b>Rel. Proport.</b> and <b>Cum. Proport.</b> display, respectively, the proportion of overall variance contributing to the total at a given scale and its cumulative total. Lastly, in case CIs are produced, the last two columns display, respectively, the lower and upper confidence interval values at a given scale.<br /><br />
The first plot is a histogram of variances at each given scale. It is clear that the majority of variation in the <b class="wfobj">JAPAN_RER</b> series comes from higher scales, or lower frequencies. This is indicative of persistent behaviour in the original data, and possibly evidence of a unit root. A quick unit root test on the series will confirm this intuition. The plot below summarizes the output of a unit root test on <b class="wfobj">JAPAN_RER</b>.<br /><br />
<!-- :::::::::: FIGURE 12 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/wavelets/images/japan_rer_ur.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/japan_rer_ur.png" title="Japanese RER: Unit Root Test"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 12: Japanese RER Unit Root Test</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 12 :::::::::: -->
Returning to the wavelet variance decomposition output, the distribution plot is followed by a plot of the variance values along with their 95% confidence intervals at each scale. The final plot displays variances and CIs accumulated across scales.<br /><br /><br />
<h3 class="seccol" id="sec4">Wavelet Thresholding</h3>
A particularly important aspect of empirical work is discerning useful data from noise. In other words, if an observed time series is obscured by the presence of unwanted noise, it is critical to obtain an estimate of this noise and filter it from the observed data in order to retain the useful information, or the signal. Traditionally, this filtration and signal extraction was achieved using Fourier transforms or a number of previously mentioned routines such as the STL decomposition. While the former is typically better suited to stationary data, the latter can accommodate non-stationarities, non-linearities, and seasonalities of arbitrary type. This makes STL an attractive tool in this space and similar (but ultimately different) in function to wavelet thresholding. The following example explores these nuances.<br /><br />
<h4 class="subseccol" id="sec4.1">Example: Thresholding as Signal Extraction</h4>
Given a series of observed data, recall that STL decomposition produces three curves:
<ul>
<li>Trend</li>
<li>Seasonality</li>
<li>Remainder</li>
</ul><br />
The last of these is obtained by subtracting from the original data the first two curves. As an additional byproduct, STL also produces a seasonally adjusted version of the original data which derives by subtracting from the original data the seasonality curve.<br /><br />
In contrast, recall from the theoretical discussion in <a href='http://blog.eviews.com/2020/11/wavelet-analysis-part-i-theoretical.html'>Part I</a> of this series that the principle governing wavelet-based signal extraction, otherwise known as <b>wavelet thresholding</b> or <b>wavelet shrinkage</b>, is to <i>shrink</i> any wavelet coefficients not exceeding some <b>threshold</b> to zero and then exploit the MRA to synthesize the signal of interest using the modified wavelet coefficients. This produces two curves:
<ul>
<li>Signal</li>
<li>Residual</li>
</ul>
where the latter is just the original data minus the signal estimate.<br /><br />
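The threshold-and-synthesize principle can be sketched in a few lines. The following uses a level-1 Haar MODWT with a hard universal threshold and a MAD-based scale estimate; the filter, level, and threshold rule are simplifications chosen for transparency, not the dialog settings used in the example that follows:

```python
import numpy as np

# Hard thresholding sketch: shrink small wavelet coefficients to zero,
# then synthesize the signal from what remains via the level-1 MRA.
rng = np.random.default_rng(3)
x = np.sin(np.arange(128) / 8.0) + 0.3 * rng.standard_normal(128)

w = (x - np.roll(x, 1)) / 2.0                 # MODWT wavelet coefficients
v = (x + np.roll(x, 1)) / 2.0                 # MODWT scaling coefficients

sigma = np.median(np.abs(w)) / 0.6745         # MAD scale estimate
eta = sigma * np.sqrt(2.0 * np.log(x.size))   # universal threshold
w_thr = np.where(np.abs(w) > eta, w, 0.0)     # hard thresholding

d1 = (w_thr - np.roll(w_thr, -1)) / 2.0       # detail from kept coeffs
s1 = (v + np.roll(v, -1)) / 2.0               # smooth
signal = s1 + d1                              # de-noised series
residual = x - signal                         # extracted "noise"
```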
Because wavelet thresholding treats any insignificant transient features as noise, it is very likely that any latent cyclicality would be treated as noise and driven to zero. In this regard, the extracted signal, while perhaps free of cyclical dynamics, would really be so only by technicality, and not by intention. This is in contrast to STL, which derives an explicit estimate of seasonal features and then removes it from the original data to derive the seasonally adjusted curve.<br /><br />
Nevertheless, in many instances, the STL seasonally adjusted curve may behave quite similarly to the signal extracted via wavelet thresholding. To demonstrate this, we'll use French real exchange rate data from 1973Q1 to 1988Q4, extracted from the Pesaran (2007) dataset. The series of interest is called <b class="wfobj">FRANCE_RER</b>. We'll start by performing a MODWT threshold using a Least Asymmetric filter of length 12 and a maximum decomposition level of 1.<br /><br />
Double click on the <b class="wfobj">FRANCE_RER</b> series to open its window and proceed as follows:
<ol>
<li>Click on <b>View/Wavelet Analysis/Thresholding (Denoising)...</b></li>
<li>Change the <b>Decomposition</b> dropdown to <b>Overlap transform - MODWT</b>.</li>
<li>Set the <b>Max scale</b> to <b>1</b>.</li>
<li>Change the <b>Class</b> dropdown to <b>Least Asymmetric</b>.</li>
<li>Set the <b>Length</b> dropdown to <b>12</b>.</li>
<li>Click on <b>OK</b>.</li>
</ol><br />
<!-- :::::::::: FIGURE 14 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/wavelets/images/threshold_ex1_1.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/threshold_ex1_1.png" title="French RER: MODWT Thresholding"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 14: French RER: MODWT Thresholding</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 14 :::::::::: -->
The output is a spool object with the spool tree listing the summary, the de-noised function, and the noise. The table summarizes the thresholding procedure performed. The first plot is the de-noised function (signal) superimposed over the original series for context. The second plot is the noise process extracted from the original series.<br /><br />
Next, let's derive the STL decomposition of the same data. The plots below superimpose the wavelet signal estimate on top of the STL seasonally adjusted curve, as well as the wavelet thresholded noise on top of the STL remainder series.<br /><br />
<!-- :::::::::: FIGURES 15a and 15b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 11a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/threshold_ex1_2.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/threshold_ex1_2.png" title="French RER: STL Seas. Adj. vs. Wavelet Tresh. Signal"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 11b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/wavelets/images/threshold_ex1_3.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/threshold_ex1_3.png" title="French RER: STL Remainder vs. Wavelet Tresh. Noise"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 15a: French RER: STL Seas. Adj. vs. Wavelet Thresh. Signal</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 15b: French RER: STL Remainder vs. Wavelet Thresh. Noise</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 15a and 15b :::::::::: -->
Clearly, the STL seasonally adjusted series is very similar to the wavelet signal curve. However, this is really only because the cyclical components in the underlying data are negligible. This can be confirmed by looking at the magnitude of the STL seasonality curve. Nevertheless, a close inspection of the STL remainder and wavelet threshold noise series reveals noticeable differences. It is these that drive any discrepancies between the STL seasonal adjustment and wavelet threshold signal curves.<br /><br /><br />
<h3 class="seccol" id="sec5">Outlier Detection</h3>
A particularly important and useful application of wavelets is <b>outlier detection</b>. While the subject has received some attention over the years, starting with Greenblatt (1996), we focus here on a rather simple and appealing contribution by Bilen and Huzurbazar (2002). The appeal of their approach is that it doesn't require model estimation, is not restricted to processes generated via ARIMA, and works in the presence of both additive and innovational outliers. The approach does assume that wavelet coefficients are approximately independent and identically distributed normal variates. This is a rather weak assumption since the independence assumption (the more difficult to satisfy) is typically guaranteed by using the DWT. While EViews offers the ability to perform this procedure using a MODWT, it is generally better suited to the orthonormal transform.<br /><br />
Bilen and Huzurbazar (2002) also suggest that Haar is the preferred filter here. This is because it yields coefficients large in magnitude in the presence of jumps or outliers. They also suggest that the transformation be carried out only at the first scale. Nevertheless, EViews does offer the ability to stray from these suggestions.<br /><br />
The overall procedure works on the principle of thresholding and the authors suggest the use of the universal threshold. The idea here is that extreme (outlying) values will register as noticeable spikes in the spectrum. As such, those values would be candidates for outlying observations. In particular, if $ m_{j} $ denotes the number of wavelet coefficients at scale $ \lambda_{j} $, the entire algorithm is summarized (and generalized) as follows:
<ol>
<li>Apply a wavelet transform to the original data up to some scale $ J \leq M $.</li><br />
<li>Specify a threshold value $ \eta $.</li><br />
<li>For each $ j = 1, \ldots, J $:</li><br />
<ol>
<li>Find the set of indices $ S = \cbrace{s_{1}, \ldots, s_{m_{j}}} $ such that $ |W_{i, j}| > \eta $ for $ i = 1, \ldots, m_{j} $.</li><br />
<li>Find the exact location of the outlier among original observations. For instance, if $ s_{i} $ is an index associated with an outlier:</li><br />
<ul>
<li>
If the wavelet transform is the DWT, the original observation associated with that outlier is either $ 2^{j}s_{i} $ or $ (2^{j}s_{i} - 1) $. To discern between the two, let $ \tilde{\mu} $ denote the mean of the original observations, excluding those located at $ 2^{j}s_{i} $ and $ (2^{j}s_{i} - 1) $. That is:
$$ \tilde{\mu} = \frac{1}{T-2}\sum_{t \neq 2^{j}s_{i}\, ,\, (2^{j}s_{i} - 1)}{y_{t}} $$
If $ |y_{2^{j}s_{i}} - \tilde{\mu}| > |y_{2^{j}s_{i} - 1} - \tilde{\mu}| $, the location of the outlier is $ 2^{j}s_{i} $, otherwise, the location of the outlier is $ (2^{j}s_{i} - 1) $.
</li><br />
<li>If the wavelet transform is the MODWT, the outlier is associated with observation $ i $.</li>
</ul>
</ol>
</ol><br />
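The steps above can be sketched for the default configuration: a level-1 Haar DWT, a universal threshold, and a MAD-based estimate of the coefficient standard deviation. Indices are zero-based here, so the candidate pair $ (2s_{i} - 1, 2s_{i}) $ of the algorithm becomes <code>(2*s, 2*s + 1)</code>; variable names are ours, not EViews':

```python
import numpy as np

def detect_outliers_haar(y):
    """Level-1 Haar DWT outlier detection in the spirit of Bilen and
    Huzurbazar (2002): flag coefficients exceeding the universal
    threshold, then pick the candidate observation farther from the
    mean computed excluding both candidates."""
    y = np.asarray(y, dtype=float)
    w = (y[1::2] - y[0::2]) / np.sqrt(2.0)       # level-1 Haar DWT coeffs
    sigma = np.median(np.abs(w)) / 0.6745        # MAD scale estimate
    eta = sigma * np.sqrt(2.0 * np.log(w.size))  # universal threshold
    outliers = []
    for s in np.flatnonzero(np.abs(w) > eta):    # indices with |W_i| > eta
        a, b = 2 * s, 2 * s + 1                  # candidate observations
        mu = (y.sum() - y[a] - y[b]) / (y.size - 2)  # mean excluding both
        outliers.append(a if abs(y[a] - mu) > abs(y[b] - mu) else b)
    return outliers

rng = np.random.default_rng(2)
y = rng.standard_normal(128)
y[64] = 40.0                                     # plant a large outlier
assert 64 in detect_outliers_haar(y)             # the outlier is flagged
```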
<h4 class="subseccol" id="sec5.1">Example: Bilen and Huzurbazar (2002) Outlier Detection</h4>
To demonstrate outlier detection, data is obtained from the <b>US Geological Survey</b> website <a href='https://www.usgs.gov/'>https://www.usgs.gov/</a>. As discussed in Bilen and Huzurbazar (2002), data collected in this database comes from many different sources and is generally notorious for input errors. Here we focus on a monthly dataset, collected at irregular intervals from May 1987 to June 2020, measuring water conductance at the Green River near Greendale, UT. The dataset is identified by site number 09234500.<br /><br />
A quick summary of the series indicates that there is a large drop from typical values (500 to 800 units) in September 1999. The value recorded at this date is roughly 7.4 units. This is an unusually large drop and is almost certainly an outlying observation.<br /><br />
In an attempt to identify the aforementioned outlier, and perhaps uncover others, we use the wavelet outlier detection method described above. We stick with the defaults suggested in the paper and use a DWT transform with a Haar filter, a universal threshold, a mean median absolute deviation estimator for the wavelet coefficient variance, and a maximum decomposition scale set to unity.<br /><br />
To proceed, either download the data from the source, or open the tab <b>Outliers</b> in the workfile provided. The series we're interested in is <b class="wfobj">WATER_CONDUCTANCE</b>. Next, open the series window and proceed as follows:
<ol>
<li>Click on <b>View/Wavelet Analysis/Outlier Detection...</b></li>
<li>Set the <b>Max scale</b> dropdown to <b>1</b>.</li>
<li>Under the <b>Threshold</b> group, set the <b>Method</b> dropdown to <b>Hard</b>.</li>
<li>Under the <b>Wavelet coefficient variance</b> group, set the <b>Method</b> dropdown to <b>Mean Med. Abs. Dev.</b>.</li>
<li>Click on <b>OK</b>.</li>
</ol><br />
<!-- :::::::::: FIGURE 16 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/wavelets/images/outliers_ex1_1.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/outliers_ex1_1.png" title="Water Conductance: Outlier Detection"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 16: Water Conductance: Outlier Detection</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 16 :::::::::: -->
The output is a spool object with the spool tree listing the summary, outlier table, and outlier graphs for each scale (in this case just one). The first of these is a summary of the outlier detection procedure performed. Next is a table listing the exact location of each detected outlier along with its value and its absolute deviations from the series mean and median, respectively. The plot that follows is that of the original series, with red dots identifying outlying observations and dotted vertical lines at those locations for easier identification.<br /><br />
Evidently, the large outlying observation in September 1999 is accurately identified. In addition, there are three other possible outlying observations, identified in September 1988, January 1992, and June 2020.<br /><br /><br />
<h3 class="seccol" id="sec6">Conclusion</h3>
In the first entry of our series on wavelets, we provided a theoretical overview of the most important aspects of wavelet analysis. Here, we have demonstrated how those principles are applied to real and artificial data using the new EViews 12 wavelet engine.<br /><br /><br />
<hr />
<h3 class="seccol" id="sec7">Files</h3>
<ul>
<li><a href="http://www.eviews.com/blog/wavelets/workfiles/wavelets.wf1"><b class="wf">WAVELETS.WF1</b></a></li>
<li><a href="http://www.eviews.com/blog/wavelets/workfiles/wavelets.prg"><b class="wf">WAVELETS.PRG</b></a></li>
</ul><br /><br />
<hr />
<h3 class="seccol" id="sec8">References</h3>
<ol class="bib2xhtml">
<li id="bilen-2002" class="entry">
Bilen C and Huzurbazar S (2002), <i>"Wavelet-based detection of outliers in time series"</i>, Journal of Computational and Graphical Statistics, Vol. 11(2), pp. 311-327. Taylor &amp; Francis.
</li>
<li id="greenblatt-1996" class="entry">
Greenblatt SA (1996), <i>"Wavelets in econometrics"</i>, In Computational Economic Systems, pp. 139-160. Springer.
</li>
<li id="pesaran-2007" class="entry">
Pesaran MH (2007), <i>"A simple panel unit root test in the presence of cross-section dependence"</i>, Journal of Applied Econometrics, Vol. 22(2), pp. 265-312. Wiley Online Library.
</li>
</ol>
</span>
<h3 class="seccol">Wavelet Analysis: Part I (Theoretical Background)</h3>
<!-- Published 2020-11-30 -->
<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 1px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
.seccol {
}
.subseccol {
color: #fa5e5e
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
rbrace: ['{\\left(#1\\right)}', 1],
cbrace: ['{\\left\\{#1\\right\\}}', 1],
sbrace: ['{\\left[#1\\right]}', 1],
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1],
series: ['{\\left\\{#1_{#2}\\right\\}_{#2=#3}^{#4}}', 4],
xsum: ['{\\sum_{#1=#2}^{#3}{#4}}', 4],
var: ['{\\operatorname\{var\}}'],
sign: ['{\\operatorname\{sign\}}'],
diag: ['{\\operatorname\{diag\}}'],
med: ['{\\operatorname\{median\}}'],
vec: ['{\\operatorname\{vec\}}'],
tr: ['{\\operatorname\{tr\}}']
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: Verdana, sans-serif;">
This is the first of two entries devoted to wavelets. Here, we summarize the most important theoretical principles underlying wavelet analysis. This entry should serve as a detailed background reference when using the new wavelet features released in EViews 12. In <a href='http://blog.eviews.com/2020/12/wavelet-analysis-part-ii-applications.html'>Part II</a> we will apply these principles and demonstrate how they are used with the new EViews 12 wavelet
engine.
<a name='more'></a><br /><br />
<h3 class="seccol">Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction to Wavelets</a>
<li><a href="#sec2">Wavelet Transforms</a>
<ul>
<li><a href="#sec2.1">Discrete Wavelet Filters</a>
<li><a href="#sec2.2">Mallat's Pyramid Algorithm</a>
<li><a href="#sec2.3">Boundary Conditions</a>
<li><a href="#sec2.4">Variance Decomposition</a>
<li><a href="#sec2.5">Multiresolution Analysis</a>
</ul>
<li><a href="#sec3">Practical Considerations</a>
<ul>
<li><a href="#sec3.1">Choice of Wavelet Filter</a>
<li><a href="#sec3.2">Handling Boundary Conditions</a>
<li><a href="#sec3.3">Adjusting Non-Dyadic Time Series Lengths</a>
</ul>
<li><a href="#sec4">Wavelet Thresholding</a>
<ul>
<li><a href="#sec4.1">Thresholding Rule</a>
<li><a href="#sec4.2">Optimal Threshold</a>
<li><a href="#sec4.3">Wavelet Coefficient Variance</a>
<li><a href="#sec4.4">Thresholding Implementation</a>
</ul>
<li><a href="#sec5">Conclusion</a>
<li><a href="#sec6">References</a>
</ol><br />
<h3 class="seccol" id="sec1">Introduction to Wavelets</h3>
What characterizes most economic time series are time-varying features such as non-stationarity, volatility, seasonality, and structural discontinuities. Wavelet analysis is a natural framework for analyzing these phenomena without imposing any simplifying assumptions such as stationarity. In particular, wavelet filters can decompose and reconstruct a time series (as well as its correlation structure) across timescales so that constituent elements at one scale are uncorrelated with those at another. This is clearly useful in isolating features which materialize only at certain timescales.<br /><br />
Wavelet analysis is also, in many respects, like Fourier spectral analysis. Both methods can represent a time series signal in a different space by re-expressing a signal as a linear combination of basis functions. In the context of Fourier analysis, these basis functions are sines and cosines. While these basis functions approximate global variation well, they are poorly adapted to capturing local variation, otherwise known as time-variation in time series analysis. To see this, observe that trigonometric basis functions are sinusoids of the form:
$$ R\cos\left(2\pi(\omega t + \phi)\right) $$
where $ R $ is the <b>amplitude</b>, $ \omega $ is the <b>frequency</b> (in cycles per unit time) with associated <b>period</b> $ \frac{1}{\omega} $ (in units of time), and $ \phi $ is the <b>phase</b>. Accordingly, if the time variable $ t $ is shifted and scaled to $ u = \frac{t - a}{b} $, the associated sinusoid becomes:
$$ R\cos\left(2\pi(\omega^{\star} u + \phi^{\star})\right) $$
where $ \omega^{\star} = \omega b $ and $ \phi^{\star} = \phi + \omega a $.<br /><br />
Evidently, the amplitude $ R $ is invariant to shifts in location and scale. Furthermore, notice that if $ b > 1 $, the frequency $ \omega^{\star} $ increases, but time $ u $ decreases, and vice versa. Accordingly, frequency information is gained when time information is lost, and vice versa.<br /><br />
Ultimately, trigonometric functions are ideally adapted to stationary processes characterized by impulses which wane with time, but are poorly adapted to discontinuous, non-linear, and non-stationary processes whose impulses persist and evolve with time. To surmount this fixed time-frequency relationship, a new set of basis functions is needed.<br /><br />
In contrast to Fourier transforms, wavelet transforms rely on a reference basis function called the <b>mother wavelet</b>. The latter is stretched (scaled) and shifted across time to capture time-dependent features. Thus, the wavelet basis functions are localized both in scale and time. In this sense, the wavelet basis function scale is the analogue of frequency in Fourier transforms. The fact that the wavelet basis function is also shifted (translated) across time, implies that wavelet basis functions are similar in spirit to performing a Fourier transform on a moving and overlapping window of subsets of the entire time series signal.<br /><br />
In particular, the mother wavelet function $ \psi(t) $ is any function satisfying:
$$ \int_{-\infty}^{\infty} \psi(x) dx = 0 \qquad\qquad \int_{-\infty}^{\infty} \psi(x)^{2} dx = 1 $$
In other words, wavelets are functions that have mean zero and unit energy. Here, the term <i>energy</i> originates from the signal processing literature and is formalized as $ \int_{-\infty}^{\infty} |f(t)|^{2} dt$ for some function $ f(t) $. In fact, for non-complex, mean-zero functions the concept is interchangeable with the idea of <b>variance</b>.<br /><br />
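These two conditions are straightforward to verify numerically. Below is a minimal Python sketch (not EViews code) that checks them for the Haar mother wavelet via midpoint-rule integration:

```python
# Numerically verify the two defining wavelet conditions (zero mean, unit
# energy) for the Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1).

def haar_psi(t):
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

mean_cond = integrate(haar_psi, -1.0, 2.0)                      # ~ 0
energy_cond = integrate(lambda t: haar_psi(t) ** 2, -1.0, 2.0)  # ~ 1
print(round(mean_cond, 3), round(energy_cond, 3))
```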
From the mother wavelet, the wavelet basis functions are now derived as:
$$ \psi_{a,b}(t) = \frac{1}{\sqrt{b}}\psi\left(\frac{t - a}{b}\right) $$
where $ a $ is the <b>location constant</b>, whereas $ b $ is the <b>scaling factor</b> which corresponds to the notion of frequency in Fourier analysis. Observe further that the analogue of the amplitude $ R $ in Fourier analysis, here captured by the term $ \frac{1}{\sqrt{b}} $, is in fact a function of the scale $ b $. Accordingly, wavelet basis functions will adapt to scale-dependent phenomena much better than their trigonometric counterparts.<br /><br />
Since wavelet basis functions are <i>de facto</i> location and scale transformations of a single function, they are also an ideal tool for <b>multiresolution analysis</b> (MRA) - the ability to analyze a signal at different frequencies with varying resolutions. In fact, MRA is in some sense the inverse of the wavelet transform. It can derive representations of the original time-series data using only those features which are characteristic at a given timescale. For instance, a highly noisy but persistent time series can be decomposed into a portion which represents only the noise (features captured at high frequencies), and a portion which represents only the persistent signal (features captured at low frequencies). Thus, moving along the time domain, MRA allows one to zoom to a desired level of detail such that high (low) frequencies yield good (poor) time resolutions and poor (good) frequency resolutions. Since economic time series often exhibit multiscale features, wavelet techniques can effectively decompose these series into constituent processes associated with different timescales.<br /><br /><br /><br />
<h3 class="seccol" id="sec2">Wavelet Transforms</h3>
In the context of continuous functions, the <b>continuous wavelet transform</b> (CWT) of a time series $ y(t) $ is defined as:
$$ W(a, b) = \int_{-\infty}^{\infty} y(t)\psi_{a,b}(t) \,dt $$
Moreover, the inverse transformation to reconstruct the original process is given as:
$$ y(t) = \int_{-\infty}^{\infty} \int_{0}^{\infty} W(a,b)\psi_{a,b}(t) \,da \,db $$
See Percival and Walden (2000) for a detailed discussion.<br /><br />
Since continuous functions are rarely observed, the CWT is empirically rarely exploited and a discretized analogue known as the <b>discrete wavelet transform</b> (DWT) is used. In its most basic form, the series length, $ T = 2^{M} $ for $ M \geq 0 $, is assumed <b>dyadic</b> (a power of 2), and the DWT manifests as a collection of CWT <i>slices</i> at nodes $ (a, b) \equiv (a_{k}, b_{j}) $ such that $ a_{k} = 2^{j}k $ and $ b_{j} = 2^{j} $ where $ j = 1, \ldots, M $. In other words, the discrete wavelet basis functions assume the form:
$$ \psi_{k,j}(t) = 2^{-j/2}\psi\left( 2^{-j}t - k \right) $$
Unlike the CWT which is highly redundant in both location and scale, the DWT can be designed as an orthonormal transformation. If the location discretization is restricted to the index $ k = 1, \ldots, 2^{-j}T $, at each scale $ \lambda_{j} = 2^{j - 1} $, half the available observations are lost in exchange for <b>orthonormality</b>. This is the classical DWT framework. Alternatively, if the location index is restricted to the full set of available observations with $ k = 1, \ldots, T $, the discretized transform is no longer orthonormal, but does not suffer from observation loss. The latter framework is typically referred to as the <b>maximal overlap discrete wavelet transform</b> (MODWT), and sometimes as the <b>non-decimated</b> DWT. Since the DWT is formally characterized by wavelet filters, we devote some time to those next.<br /><br />
<h4 class="subseccol" id="sec2.1">Discrete Wavelet Filters</h4>
Formally, the DWT is characterized via $h = \rbrace{h_{0}, \ldots, h_{L-1}}$ and $g = \rbrace{ g_{0}, \ldots, g_{L-1} }$ -- the wavelet (high pass) and scaling (low pass) filters of length $L$, respectively, for some $ L \geq 1 $. Recall that the low and high pass filters are defined in the context of <b>frequency response functions</b>, otherwise known as <b>transfer functions</b>. The latter are Fourier transforms of impulse response functions. Since the impulse response function describes, in the time domain, the evolution (response) of a time series signal to a given stimulus (impulse), the transfer function describes, in the frequency domain, the response of a time series signal to a given impulse in the frequency domain. In this regard, when the magnitude of the transfer function, otherwise known as the <b>gain function</b>, is large at low frequencies and small at high frequencies, the filter associated with that transfer function is said to be a <b>low-pass filter</b>. Otherwise, when the gain function is small at low frequencies but high at high frequencies, the transfer function is associated with a <b>high-pass</b> filter.<br /><br />
Like traditional time series filters which are used to extract features (e.g. trends, seasonalities, business cycles, noise, etc.), wavelet filters perform a similar role. They are designed to capture low and high frequencies, and have a particular length. This length governs how much of the original series information is used to extract low and high frequency phenomena. This is very similar to the role of the autoregressive (AR) order in traditional time series models, where higher AR orders imply more historical observations influence the present.<br /><br />
The simplest and shortest wavelet filter is of length $ L = 2 $ and is called the <b>Haar</b> wavelet. Formally, it is characterized by its high-pass filter definition:
\begin{align*}
h_{l} =
\begin{cases}
\frac{1}{\sqrt{2}} \quad \text{if} \quad l = 0\\
\frac{-1}{\sqrt{2}} \quad \text{if} \quad l = 1
\end{cases}
\end{align*}
This is a sequence of rescaled rectangular functions and is therefore ideally suited to analyzing signals with sudden and discontinuous changes. In this regard, it is particularly well suited to outlier detection. Unfortunately, this filter is typically too simple for most other applications.<br /><br />
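In code, the Haar pair is tiny. The sketch below (plain Python) defines the high-pass filter above, derives the matching low-pass (scaling) filter via the quadrature mirror relation $ g_{l} = (-1)^{l+1} h_{L-1-l} $ (the standard relation in Percival and Walden (2000), not stated explicitly here), and checks the defining properties:

```python
import math

# Haar high-pass (wavelet) filter, L = 2, as defined in the text.
h = [1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)]
L = len(h)

# Low-pass (scaling) filter via the quadrature mirror relation.
g = [(-1) ** (l + 1) * h[L - 1 - l] for l in range(L)]

sum_h = sum(h)                             # zero mean: ~ 0
energy_h = sum(x * x for x in h)           # unit energy: ~ 1
dot_hg = sum(a * b for a, b in zip(h, g))  # orthogonal to g: ~ 0
print(g, sum_h, energy_h, dot_hg)
```

For Haar, the relation recovers the familiar averaging filter $ g = (1/\sqrt{2}, 1/\sqrt{2}) $.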
To help mitigate the limitations of the Haar filter, Daubechies (1992) introduced a family of filters (known as <b>daublets</b>) of even length that are indexed by the polynomial degree they are able to capture -- or rather, by their number of vanishing moments. Thus, the Haar filter, which is of length 2, can only capture constants and linear functions. The Daubechies wavelet filter of length 4 can capture everything from a constant to a cubic function, and so on. Accordingly, greater filter lengths are associated with greater smoothness. Unlike the Haar filter, which has a closed form solution in the time domain, the Daubechies family of wavelet filters has a closed form solution only in the frequency domain.<br /><br />
Unfortunately, Daubechies filters are typically not symmetric. If a more symmetric version of the daublet filters is required, then the class known as <b>least asymmetric</b>, or <b>symmlets</b>, is used. The latter define a family of wavelet filters which are as close to symmetric as possible.<br /><br />
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/wavelets/images/wavelet_haar.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/wavelet_haar.png" title="Haar Wavelet"
width="540" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: Haar Wavelet</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/wavelets/images/wavelet_d8.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/wavelet_d8.png" title="Daublet (L=8) Wavelet"
width="540" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: Daubechies - Daublet (L=8) Wavelet</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/wavelets/images/wavelet_la8.png"><img height="auto"
src="http://www.eviews.com/blog/wavelets/images/wavelet_la8.png" title="Symmlet (L=8) Wavelet"
width="540" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: Least Asymmetric - Symmlet (L=8) Wavelet</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
<h4 class="subseccol" id="sec2.2">Mallat's Pyramid Algorithm</h4>
In practice, DWT coefficients are derived through the <b>pyramid algorithm</b> of Mallat (1989). In the case of the classical DWT with $T=2^{M}$, let $\mathbf{y} = \series{y}{t}{1}{T}$ and define $\mathbf{W} = \sbrace{\mathbf{W}_{1}, \ldots, \mathbf{W}_{M}, \mathbf{V}_{M}}^{\top}$ as the matrix of DWT coefficients. Here, $\mathbf{W}_{j}$ is a vector of wavelet coefficients of length $T/2^{j}$ and is associated with changes on a scale of length $\lambda_{j} = 2^{j-1}$. Moreover, $\mathbf{V}_{M}$ is a vector of scaling coefficients of length $T/2^{M}$ and is associated with averages on a scale of length $\lambda_{M} = 2^{M-1}$. $\mathbf{W}$ now follows from $\mathbf{W} = \mathcal{W}\mathbf{y}$ where $\mathcal{W}$ is some $T\times T$ orthonormal matrix generating the DWT coefficients. The algorithm can now be formalized as follows.<br /><br />
If $\mathbf{W}_{j} = \rbrace{W_{1,j} \ldots W_{T/2^{j},j}}^{\top}$ and $\mathbf{V}_{j} = \rbrace{V_{1,j} \ldots V_{T/2^{j},j}}^{\top}$, the $j^{th}$ iteration of the algorithm convolves an input signal with filters $h$ and $g$ respectively to derive the $j^{th}$ level DWT matrix $\sbrace{\mathbf{W}_{1}, \ldots, \mathbf{W}_{j}, \mathbf{V}_{j}}^{\top}$. Explicitly, the convolution is formalized as:
\begin{align*}
W_{t,1} &= \xsum{l}{0}{L-1}{h_{l}y_{2t-l\hspace{-5pt}\mod T}} && V_{t,1} = \xsum{l}{0}{L-1}{g_{l} y_{2t-l\hspace{-5pt}\mod T}} && j=1\\
W_{t,j} &= \xsum{l}{0}{L-1}{h_{l} V_{2t-l\hspace{-5pt}\mod T,j-1}} && V_{t,j} = \xsum{l}{0}{L-1}{g_{l} V_{2t-l\hspace{-5pt}\mod T,j-1}} && j=2,\ldots,M
\end{align*}
where $t=1,\ldots,T/2^{j}$.
Each iteration therefore convolves the scaling coefficients from the preceding iteration, namely $V_{t,j-1}$, with both the high and low pass filters; the input signal in the first iteration is $y_{t}$ itself. The algorithm continues until the $M^{th}$ iteration, although it can be stopped earlier.<br /><br />
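As a concrete illustration, the recursion above can be sketched in a few lines of Python using the Haar filter pair (a toy stand-in for the EViews wavelet engine; with zero-based lists, $ y_{2t-l \bmod T} $ for $ t = 1, \ldots, T/2 $ becomes <code>y[(2*t + 1 - l) % T]</code> for <code>t = 0, ..., T//2 - 1</code>):

```python
import math

def dwt_level(v, h, g):
    """One pyramid iteration: convolve with h and g, downsample by 2."""
    T, L = len(v), len(h)
    W = [sum(h[l] * v[(2 * t + 1 - l) % T] for l in range(L)) for t in range(T // 2)]
    V = [sum(g[l] * v[(2 * t + 1 - l) % T] for l in range(L)) for t in range(T // 2)]
    return W, V

def dwt(y, h, g):
    """Full pyramid: returns [W_1, ..., W_M] and V_M for dyadic len(y)."""
    details, v = [], list(y)
    while len(v) > 1:
        W, v = dwt_level(v, h, g)
        details.append(W)
    return details, v

h = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # Haar high-pass
g = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # Haar low-pass

y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]  # T = 8 = 2^3
details, smooth = dwt(y, h, g)
print([len(w) for w in details], smooth)  # coefficient counts halve: 4, 2, 1
```

Each level halves the number of coefficients, exactly as the $ T/2^{j} $ lengths above prescribe.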
In effect, at each scale, the DWT algorithm partitions the frequency spectrum into equal subsets -- the low and high frequencies. At the first scale, low-frequency phenomena of the original signal $ \mathbf{y} $ are captured by $ \mathbf{V}_{1} $, whereas high-frequency phenomena are captured by $ \mathbf{W}_{1} $. At scale 2, the same procedure is performed not on the original time series signal, but on the low-frequency components $ \mathbf{V}_{1} $. This in turn generates $ \mathbf{V}_{2} $, which is in a sense those phenomena that would be captured in the first quarter of the frequency spectrum, as well as $ \mathbf{W}_{2} $ -- the high-frequency components at scale 2, or those phenomena that would be captured in the second quarter of the frequency range. This continues at finer and finer levels as we increase scale. In this regard, increasing scale can isolate increasingly more persistent (lower frequency) features of the original time-series signal, with the wavelet coefficients $ \mathbf{W}_{j} $ capturing the remaining, cumulated, "noisy" features.<br /><br />
<h4 class="subseccol" id="sec2.3">Boundary Conditions</h4>
It's important to note that both the DWT and the MODWT make use of <b>circular filtering</b>. When a filtering operation reaches the beginning or end of an input series, otherwise known as the <b>boundaries</b>, the filter treats the input time series as periodic with period $ T $. In other words, we assume that $ y_{T-1}, y_{T-2}, \ldots $ are useful surrogates for the unobserved values $ y_{-1}, y_{-2}, \ldots $. The wavelet coefficients affected by this assumption are known as <b>boundary coefficients</b>. Note that the number of boundary coefficients depends on the filter length $ L $ (increasing with it) but not on the input series length $ T $. In particular, the number of boundary coefficients for the DWT and MODWT, respectively, is given by:
\begin{align*}
\kappa_{\text{DWT}, j} &\equiv L_{j}^{\prime}\\
\kappa_{\text{MODWT}, j} &\equiv \min \cbrace{L_{j}, T}
\end{align*}
where $ L_{j}^{\prime} = \left\lceil (L - 2)\rbrace{1 - \frac{1}{2^{j}}} \right\rceil $ and $ L_{j} = (2^{j} - 1)(L - 1) + 1 $.<br /><br />
Furthermore, both DWT and MODWT boundary coefficients will appear at the beginning of $ \mathbf{W}_{j} $ and $ \mathbf{V}_{j} $. Refer to Percival and Walden (2000) for further details.<br /><br />
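The counts are simple to compute. The sketch below evaluates both expressions, using the standard Percival and Walden (2000) forms $ L_{j}^{\prime} = \lceil (L-2)(1 - 2^{-j}) \rceil $ and $ L_{j} = (2^{j} - 1)(L - 1) + 1 $ (illustrative Python, not an EViews routine):

```python
import math

def n_boundary_dwt(L, j):
    """kappa_DWT,j = L'_j = ceil((L - 2) * (1 - 1/2**j))."""
    return math.ceil((L - 2) * (1.0 - 1.0 / 2 ** j))

def n_boundary_modwt(L, j, T):
    """kappa_MODWT,j = min{L_j, T}, with L_j = (2**j - 1)*(L - 1) + 1."""
    return min((2 ** j - 1) * (L - 1) + 1, T)

# Haar (L = 2) produces no DWT boundary coefficients at any level:
print([n_boundary_dwt(2, j) for j in (1, 2, 3)])         # [0, 0, 0]
# An L = 8 filter affects more MODWT coefficients as the level grows:
print([n_boundary_modwt(8, j, 512) for j in (1, 2, 3)])  # [8, 22, 50]
```

This makes concrete the trade-off noted later: longer filters expose more coefficients to boundary effects.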
<h4 class="subseccol" id="sec2.4">Variance Decomposition</h4>
The orthonormality of the DWT generating matrix $\mathcal{W}$ has important implications. First, $\mathcal{W}^{\top}\mathcal{W} = I_{T}$, an identity matrix of dimension $T$. More importantly, $\norm{\mathbf{y}}^{2} = \norm{\mathbf{W}}^{2}$. To see this, recall that $\mathbf{y} = \mathcal{W}^{\top}\mathbf{W}$ and $\norm{\mathbf{y}}^{2} = \mathbf{y}^{\top}\mathbf{y}$. The DWT is therefore an energy (variance) preserving transformation. Coupled with this preservation of energy is also the decomposition of energy on a scale by scale basis. The latter formalizes as:
\begin{align}
\norm{\mathbf{y}}^{2} = \xsum{j}{1}{M}{\norm{\mathbf{W}_{j}}^{2}} + \norm{\mathbf{V}_{M}}^{2} \label{eq2.5.1}
\end{align}
where $\norm{\mathbf{W}_{j}}^{2} = \xsum{t}{1}{T/2^{j}}{W^{2}_{t,j}}$ and $\norm{\mathbf{V}_{M}}^{2} = \xsum{t}{1}{T/2^{M}}{V^{2}_{t,M}}$. Thus, $\norm{\mathbf{W}_{j}}^{2}$ quantifies the energy of $ y_{t} $ accounted for at scale $\lambda_{j}$. This decomposition is known as the <b>wavelet power spectrum</b> (WPS) and is arguably the most insightful of the properties of the DWT.<br /><br />
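This scale-by-scale energy preservation is easy to check numerically. The self-contained Python sketch below runs a Haar DWT on a short made-up series and confirms that $ \norm{\mathbf{y}}^{2} = \sum_{j}\norm{\mathbf{W}_{j}}^{2} + \norm{\mathbf{V}_{M}}^{2} $ (illustrative only, not the EViews engine):

```python
import math

def haar_dwt(y):
    """Classical Haar DWT of a dyadic-length list; returns details + smooth."""
    s = 1.0 / math.sqrt(2.0)
    details, v = [], list(y)
    while len(v) > 1:
        details.append([s * (v[2 * t + 1] - v[2 * t]) for t in range(len(v) // 2)])
        v = [s * (v[2 * t + 1] + v[2 * t]) for t in range(len(v) // 2)]
    return details, v

def energy(x):
    return sum(e * e for e in x)

y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
details, smooth = haar_dwt(y)
lhs = energy(y)
rhs = sum(energy(w) for w in details) + energy(smooth)
print(lhs, rhs)  # ~816 for both
```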
The WPS bears resemblance to the <b>spectral density function</b> (SDF) used in Fourier analysis. Whereas the SDF decomposes the variance of an input series across frequencies, wavelet analysis decomposes the variance of an input series across scales $ \lambda_{j} $. One of the advantages of the WPS over the SDF is that the latter requires an estimate of the input series mean, whereas the former does not. In particular, note that the total variance in $ \mathbf{y} $ can be decomposed as:
$$ \xsum{j}{1}{\infty}{\nu^{2}(\lambda_{j})} = \var(\mathbf{y}) $$
where $ \nu^{2}(\lambda_{j}) $ is the contribution to $ \var(\mathbf{y}) $ due to scale $ \lambda_{j} $ and is estimated as:
$$ \hat{\nu}^{2}(\lambda_{j}) \equiv \frac{1}{T} \xsum{t}{1}{T}{W_{t,j}^{2}} $$
Note that $ \hat{\nu}^{2}(\lambda_{j}) $ is the energy of $ y_{t} $ at scale $ \lambda_{j} $ divided by the number of observations. Unfortunately, this estimator is biased due to the presence of boundary coefficients. To derive an unbiased estimate, boundary coefficients should be dropped from consideration. Accordingly, an unbiased estimate of variance contributed at scale $ \lambda_{j} $ is given by:
$$ \tilde{\nu}^{2}(\lambda_{j}) \equiv \frac{1}{M_{j}} \xsum{t}{\kappa_{j} + 1}{T}{W_{t,j}^{2}}$$
where $ M_{j} = T - \kappa_{j}$ and $ \kappa_{j} \equiv L_{j}^{\prime} $ when wavelet coefficients are derived using the DWT, whereas $ \kappa_{j} \equiv L_{j} $ in case wavelet coefficients derive from the MODWT.<br /><br />
It is also possible to derive confidence intervals for the contribution to the overall variance at each scale. In particular, dealing with unbiased estimators $ \tilde{\nu}(\lambda_{j}) $ and a level of significance $ \alpha \in (0,1) $, a confidence interval for $ \nu(\lambda_{j}) $ with coverage $ 1 - 2\alpha $ is given by:
\begin{align*}
\sbrace{\tilde{\nu}^{2}(\lambda_{j}) - \Phi^{-1}(1 - \alpha) \rbrace{\frac{2A_{j}}{M_{j}}}^{1/2} \quad ,\quad \tilde{\nu}^{2}(\lambda_{j}) + \Phi^{-1}(1 - \alpha) \rbrace{\frac{2A_{j}}{M_{j}}}^{1/2}}
\end{align*}
Above, $ A_{j} $ is the integral of the squared spectral density function of the wavelet coefficients $ \mathbf{W}_{j} $, excluding any boundary coefficients. As shown in Percival and Walden (2000), $ A_{j} $ can be estimated from the empirical autocovariances of $ \mathbf{W}_{j} $, again excluding boundary coefficients. In other words, letting
$$ \hat{s}_{j,\tau} = \frac{1}{M_{j}}\xsum{t}{\kappa_{j} + 1}{T - |\tau|}{W_{t,j}W_{t + |\tau|,j}} \,, \quad 0 \leq |\tau| \leq M_{j} - 1 $$
the estimate is given by $ \hat{A}_{j} = \frac{1}{2}\hat{s}_{j,0}^{2} + \xsum{\tau}{1}{M_{j} - 1}{\hat{s}_{j,\tau}^{2}} $.
Unfortunately, as argued in Priestley (1981), there is no condition that prevents the lower bound of the confidence interval above from becoming negative. Accordingly, Percival and Walden (2000) suggest the approximation:
$$ \frac{\eta \tilde{\nu}^{2}(\lambda_{j})}{\nu^{2}(\lambda_{j})} \stackrel{d}{=} \chi^{2}_{\eta} $$
where $ \eta $ is known as the <b>equivalent degrees of freedom</b> (EDOF) and is formalized as:
$$ \eta = \frac{2 E\rbrace{\tilde{\nu}^{2}(\lambda_{j})}^{2}}{\var \rbrace{\tilde{\nu}^{2}(\lambda_{j})}} $$
The confidence interval of interest with coverage $ 1 - 2\alpha $ can now be stated as:
\begin{align*}
\sbrace{\frac{\eta \tilde{\nu}^{2}(\lambda_{j})}{Q_{\eta}(1 - \alpha)} \,,\, \frac{\eta \tilde{\nu}^{2}(\lambda_{j})}{Q_{\eta}(\alpha)}}
\end{align*}
where $ Q_{\eta}(\alpha) $ denotes the $ \alpha $-quantile of the $ \chi^{2}_{\eta} $ distribution.<br /><br />
Remaining is the issue of EDOF estimation. Percival and Walden (2000) offer two suggestions:
\begin{align*}
\eta_{1} \equiv \frac{M_{j}\tilde{\nu}^{4}(\lambda_{j})}{\hat{A}_{j}}\\
\eta_{2} \equiv \max \cbrace{2^{-j}M_{j} \, , \, 1}
\end{align*}
The first estimate above relies on large sample theory and in practice requires a sample of at least $ T = 128 $ to yield a decent approximation. The second assumes that the SDF of the wavelet coefficients at scale $ \lambda_{j} $ is that of a band-pass filter. See Percival and Walden (2000) for details.<br /><br />
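Both estimates are simple arithmetic once the scale-$ j $ quantities are in hand. The sketch below evaluates them with hypothetical values for $ M_{j} $, $ \tilde{\nu}^{2}(\lambda_{j}) $, and $ \hat{A}_{j} $ (the numbers are made up purely for illustration):

```python
def edof1(M_j, nu2_tilde, A_j):
    """Large-sample estimate: eta_1 = M_j * (nu~^2)^2 / A_j."""
    return M_j * nu2_tilde ** 2 / A_j

def edof2(M_j, j):
    """Band-pass approximation: eta_2 = max{M_j / 2**j, 1}."""
    return max(M_j / 2 ** j, 1.0)

# Hypothetical scale-3 quantities:
print(edof1(M_j=120, nu2_tilde=0.8, A_j=3.2))  # ~24
print(edof2(M_j=120, j=3))                     # 15.0
```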
<h4 class="subseccol" id="sec2.5">Multiresolution Analysis</h4>
Similar to Fourier, spline, and linear approximations, a principal feature of the DWT is the ability to approximate an input series as a function of wavelet basis functions. In wavelet theory this is known as <b>multiresolution analysis</b> (MRA) and refers to the approximation of an input series at each scale (and up to all scales) $ \lambda_{j} $.<br /><br />
To formalize matters, recall that $ \mathbf{W} = \mathcal{W}\mathbf{y} $ and partition the rows of $ \mathcal{W} $ commensurate with the row partition of $ \mathbf{W} $ into $ \mathbf{W}_{1}, \ldots, \mathbf{W}_{M} $ and $ \mathbf{V}_{M} $. In other words, let $ \mathcal{W} = \sbrace{\mathcal{W}_{1}, \ldots, \mathcal{W}_{M}, \mathcal{V}_{M}}^{\top} $, where $ \mathcal{W}_{j} $ has dimension $ 2^{-j}T \times T $ and $ \mathcal{V}_{M} $ has dimension $ 2^{-M}T \times T $. Then, note that for any $ m \in \cbrace{1, \ldots, M} $:
\begin{align*}
\mathbf{y} &= \mathcal{W}^{\top}\mathbf{W}\\
&= \xsum{j}{1}{m}{\mathcal{W}_{j}^{\top}\mathbf{W}_{j}} + \mathcal{V}_{m}^{\top}\mathbf{V}_{m}\\
&= \xsum{j}{1}{m}{\mathcal{D}_{j}} + \mathcal{S}_{m}
\end{align*}
where $ \mathcal{D}_{j} = \mathcal{W}^{\top}_{j} \mathbf{W}_{j} $ and $ \mathcal{S}_{m} = \mathcal{V}^{\top}_{m} \mathbf{V}_{m} $ are $ T- $ dimensional vectors, respectively called the $ j^{\text{th}} $ level <b>detail</b> and $ m^{\text{th}} $ level <b>smooth</b> series. Furthermore, since the high-pass (low-pass) coefficients are associated with changes (averages) at scale $ \lambda_{j} $, the detail and smooth series are associated with changes and averages at scale $ \lambda_{j} $, respectively, in the input series $ \mathbf{y} $.<br /><br />
The MRA is typically used to derive approximations for the original series using its lower and upper frequency components. Since upper frequency components are associated with transient features and are captured by the wavelet coefficients, the detail series will in fact extract those features of the original series which are typically associated with "noise". Alternatively, since lower frequency components are associated with perpetual features and are captured by the scaling coefficients, the smooth series will in fact extract those features of the original series which are typically associated with the "signal".<br /><br />
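The additive identity $ \mathbf{y} = \sum_{j} \mathcal{D}_{j} + \mathcal{S}_{M} $ can be verified numerically. The Python sketch below (Haar filters, made-up data; not the EViews implementation) reconstructs each detail and the smooth by inverting the pyramid with all other coefficient vectors zeroed, then checks that they sum back to $ \mathbf{y} $:

```python
import math

S = 1.0 / math.sqrt(2.0)  # Haar filter constant

def analyze(y):
    """Haar DWT: list of wavelet coefficient vectors plus final smooth."""
    details, v = [], list(y)
    while len(v) > 1:
        details.append([S * (v[2*t+1] - v[2*t]) for t in range(len(v) // 2)])
        v = [S * (v[2*t+1] + v[2*t]) for t in range(len(v) // 2)]
    return details, v

def synthesize(details, smooth):
    """Inverse Haar pyramid: rebuild the length-T series from coefficients."""
    v = list(smooth)
    for W in reversed(details):
        v = [x for w, s_ in zip(W, v) for x in (S * (s_ - w), S * (s_ + w))]
    return v

y = [1.0, 3.0, 2.0, 6.0, 5.0, 4.0, 8.0, 7.0]
details, smooth = analyze(y)
M = len(details)

# j-th detail: invert keeping only W_j; smooth: invert keeping only V_M.
D = [synthesize([W if i == j else [0.0] * len(W) for i, W in enumerate(details)],
                [0.0] * len(smooth)) for j in range(M)]
S_M = synthesize([[0.0] * len(W) for W in details], smooth)

recon = [sum(parts) for parts in zip(*D, S_M)]
print([round(r, 10) for r in recon])  # matches y
```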
It's worth noting that because wavelet filtering can result in boundary coefficients, the detail and smooth series will also have observations affected by them. The affected observations are given by:
\begin{align*}
\text{DWT} &\quad
t =
\begin{cases}
1, \ldots, 2^{j}L_{j}^{\prime} &\quad \text{lower portion}\\
T - \rbrace{L_{j} + 1 - 2^{j}} + 1, \ldots, T &\quad \text{upper portion}
\end{cases}\\
\\
\text{MODWT} &\quad
t =
\begin{cases}
1, \ldots, L_{j} &\quad \text{lower portion}\\
T - L_{j} + 1, \ldots, T &\quad \text{upper portion}
\end{cases}
\end{align*}
<br /><br />
<h3 class="seccol" id="sec3">Practical Considerations</h3>
The exposition above introduces basic theory underlying wavelet analysis. Nevertheless, there are several practical (empirical) considerations which should be addressed. We focus here on three in particular:
<ul>
<li>Wavelet filter selection</li>
<li>Handling boundary conditions</li>
<li>Non-dyadic series length adjustments</li>
</ul><br />
<h4 class="subseccol" id="sec3.1">Choice of Wavelet Filter</h4>
The wavelet filter is typically chosen to mimic the data to which it is applied. Shorter filters don't approximate the ideal band-pass filter well, but longer ones do. On the other hand, if the data derives from piecewise constant functions, the Haar wavelet or other short wavelets may be more appropriate, whereas if the underlying data is smooth, longer filters are preferable. In this regard, it's important to note that longer filters expose more coefficients to boundary condition effects than shorter ones. Accordingly, the rule of thumb is to use the shortest filter that gives reasonable results. Furthermore, since the MODWT is not orthogonal and its wavelet coefficients are correlated, the choice of wavelet filter is not as vital as in the case of the orthogonal DWT. Nevertheless, if alignment in time is important (i.e. zero-phase filters), the least asymmetric family of filters may be a good choice.<br /><br />
<h4 class="subseccol" id="sec3.2">Handling Boundary Conditions</h4>
As previously mentioned, wavelet filters exhibit boundary conditions due to circular recycling of observations. Although this may be an appropriate assumption for some series, such as those naturally exhibiting cyclical effects, it is not appropriate in all circumstances. In this regard, another popular approach is to reflect the original series to generate a series of length $ 2T $. In other words, wavelet filtering proceeds on observations $ y_{1}, \ldots, y_{T}, y_{T}, y_{T-1}, \ldots, y_{1} $. In either case, any proper wavelet analysis ought, at the very least, to quantify how many wavelet coefficients are affected by boundary conditions.<br /><br />
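In code, the reflection rule is a one-liner (a sketch; EViews exposes this as a boundary option in its wavelet dialogs):

```python
def reflect(y):
    """Return y_1, ..., y_T, y_T, ..., y_1 (length 2T) for reflection filtering."""
    return list(y) + list(reversed(y))

y = [1.0, 2.0, 3.0, 4.0]
yr = reflect(y)
print(yr)  # [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
```

Circular filtering applied to the reflected series then wraps smoothly at both ends, since the first and last values are duplicated rather than spliced together.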
<h4 class="subseccol" id="sec3.3">Adjusting Non-Dyadic Time Series Lengths</h4>
Recall that the DWT requires an input series of dyadic length. Naturally, this condition is rarely satisfied in practice. In this regard, there are two broad strategies. Either shorten the input series to dyadic length at the expense of losing observations, or "pad" the input series with additional observations to achieve dyadic length. In the context of the latter strategy, although the choice of padding values is ultimately arbitrary, there are three popular choices, none of which has proven uniformly superior:
<ul>
<li>Pad with zeros</li>
<li>Pad with mean</li>
<li>Pad with median</li>
</ul><br /><br />
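A sketch of the three padding choices in Python (the helper name <code>pad_to_dyadic</code> is ours for illustration, not an EViews function):

```python
def pad_to_dyadic(y, method="mean"):
    """Pad y on the right until len(y) is a power of 2."""
    T = len(y)
    M = 1
    while M < T:
        M *= 2
    if method == "zero":
        fill = 0.0
    elif method == "mean":
        fill = sum(y) / T
    elif method == "median":
        s = sorted(y)
        fill = s[T // 2] if T % 2 else 0.5 * (s[T // 2 - 1] + s[T // 2])
    else:
        raise ValueError(method)
    return list(y) + [fill] * (M - T)

y = [1.0, 2.0, 3.0, 4.0, 10.0]     # T = 5, next dyadic length is 8
print(pad_to_dyadic(y, "zero"))    # pads with 0.0
print(pad_to_dyadic(y, "mean"))    # pads with 4.0
print(pad_to_dyadic(y, "median"))  # pads with 3.0
```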
<h3 class="seccol" id="sec4">Wavelet Thresholding</h3>
A key objective in any empirical work is to discriminate noise from useful information. In this regard, suppose that the observed time series is $ y_{t} = x_{t} + \epsilon_{t} $, where $ x_{t} $ is an unknown signal of interest obscured by the presence of unwanted noise $ \epsilon_{t} $. Traditionally, signal discernment was achieved using discrete Fourier transforms. Naturally, this assumes that any signal is an infinite superposition of sinusoidal functions; a strong assumption in empirical econometrics, where most data exhibits unit roots, jumps, kinks, and various other non-linearities.<br /><br />
The principle behind wavelet-based signal extraction, otherwise known as <b>wavelet shrinkage</b>, is to <i>shrink</i> any wavelet coefficients not exceeding some <b>threshold</b> to zero and then exploit the MRA to synthesize the signal of interest using the modified wavelet coefficients. In other words, only those wavelet coefficients associated with very pronounced spectra are retained with the additional benefit of deriving a very sparse wavelet matrix.<br /><br />
To formalize the idea, let $ \mathbf{x} = \series{x}{t}{1}{T} $ and $ \mathbf{\epsilon} = \series{\epsilon}{t}{1}{T} $. Next, recall that the DWT can be represented as $ T\times T $ orthonormal matrix $ \mathcal{W} $, yielding:
$$ \mathbf{z} \equiv \mathcal{W}\mathbf{y} = \mathcal{W}\mathbf{x} + \mathcal{W}\mathbf{\epsilon} $$
where, by the orthonormality of $ \mathcal{W} $, $ \mathcal{W}\mathbf{\epsilon} \sim N(0, \sigma^{2}_{\epsilon}) $. The idea now is to shrink to zero any coefficients not surpassing a threshold.<br /><br />
<h4 class="subseccol" id="sec4.1">Thresholding Rule</h4>
While there are several thresholding rules, by far, the two most popular are:
<ul>
<li><b>Hard Thresholding Rule</b> ("kill/keep" strategy), formalized as:
$$
\delta_{\eta}^{H}(x) =
\begin{cases}
x \quad \text{if } |x| > \eta\\
0 \quad \text{otherwise}
\end{cases}
$$
</li>
<li>
<b>Soft Thresholding Rule</b>, formalized as:
$$ \delta_{\eta}^{S}(x) = \sign(x)\max\cbrace{0 \,,\, |x| - \eta} $$
</li>
</ul>
where $ \eta $ is the threshold limit.<br /><br />
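Both rules are straightforward to implement; the sketch below is illustrative Python, not the EViews implementation:

```python
import numpy as np

def hard_threshold(x, eta):
    """Hard rule: keep coefficients with |x| > eta, kill the rest."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > eta, x, 0.0)

def soft_threshold(x, eta):
    """Soft rule: shrink surviving coefficients toward zero by eta."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - eta, 0.0)
```

Note that the hard rule leaves surviving coefficients untouched, while the soft rule also shrinks them by $ \eta $, yielding a continuous shrinkage function.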
<h4 class="subseccol" id="sec4.2">Optimal Threshold</h4>
The threshold value $ \eta $ is key to wavelet shrinkage. In particular, optimal thresholding is achieved when $ \eta = \sigma_{\epsilon} $ where $ \sigma_{\epsilon} $ is the standard deviation of the noise process $ \mathbf{\epsilon} $. In this regard, several threshold strategies have emerged over the years.
<ul>
<li>
<b>Universal Threshold</b>, proposed in Donoho and Johnstone (1994), and formalized as:
$$ \eta^{\text{U}} = \hat{\sigma}_{\epsilon} \sqrt{2\log(T)} $$
where $ \hat{\sigma}_{\epsilon} $ is estimated using wavelet coefficients only at scale $ \lambda_{1} $, regardless of what scale is under consideration. When this threshold rule is coupled with soft thresholding, the combination is commonly referred to as <b>VisuShrink</b>.<br /><br />
</li>
<li>
<b>Adaptive Universal Threshold</b> is identical to the universal threshold above, but estimates $ \hat{\sigma}_{\epsilon} $ using those wavelet coefficients associated with the scale under consideration. In other words:
$$ \eta^{\text{AU}} = \hat{\sigma}_{\epsilon, j} \sqrt{2\log(T)} $$
where $ \sigma_{\epsilon, j} $ is the standard deviation of the wavelet coefficients at scale $ \lambda_{j} $.<br /><br />
</li>
<li>
<b>Minimax Estimation</b>, proposed in Donoho and Johnstone (1994), is formalized as the solution to:
$$ \inf_{\hat{\mathbf{x}}}\sup_{\mathbf{x}} R(\hat{\mathbf{x}}, \mathbf{x}) $$
Unfortunately, a closed-form solution is not available, although tabulated values exist. When this threshold is coupled with soft thresholding, the combination is commonly referred to as <b>RiskShrink</b>.<br /><br />
</li>
<li>
<b>Stein's Unbiased Risk Estimate</b> (SURE), formalized as the solution to:
$$ \min_{\hat{\mathbf{\mu}}} \norm{\mathbf{\mu} - \hat{\mathbf{\mu}}}^{2} $$
where $ \mathbf{\mu} = (\mu_{1}, \ldots, \mu_{s})^{\top} $ and $ \mu_{k} $ is the mean of some variable of interest $ q_{k} \sim N(\mu_{k}, 1) $, for $ k = 1, \ldots, s $. In the framework of wavelet coefficients, $ q_{k} $ would represent the standardized wavelet coefficients at a given scale.<br /><br />
Furthermore, while the optimal threshold $ \eta $ based on this rule depends on the thresholding rule used, the solution may not be unique, in which case the SURE threshold is taken as the minimum such $ \eta $. In the case of the soft thresholding rule, the solution was proposed in Donoho and Johnstone (1995); for the hard thresholding rule, the solution was proposed in Jansen (2010).<br /><br />
</li>
<li>
<b>False Discovery Rate</b> (FDR), proposed in Abramovich and Benjamini (1995), determines the threshold value through a multiple hypotheses testing problem. The procedure is summarized in the following algorithm:<br /><br />
<ol>
<li>
For each $ W_{t,j} \in \mathbf{W}_{j} $ consider the hypothesis $ H_{t,j}: W_{t,j} = 0 $ and its associated two-sided $ p- $value:
$$ p_{t,j} = 2\rbrace{1 - \Phi\rbrace{\frac{|W_{t,j}|}{\sigma_{\epsilon, j}}}} $$
where, as before, $ \sigma_{\epsilon, j} $ is the standard deviation of the wavelet coefficients at scale $ \lambda_{j} $ and $ \Phi(\cdot) $ is the standard Gaussian CDF.<br /><br />
</li>
<li>
Sort the $ p_{t,j} $ in ascending order so that:
$$ p_{(1)} \leq p_{(2)} \leq \ldots \leq p_{(m_{j})} $$
where $ m_{j} $ denotes the cardinality (number of elements) in $ \mathbf{W}_{j} $. For instance, when $ \mathbf{W}_{j} $ are derived from a DWT, then $ m_{j} = T/2^{j} $.<br /><br />
</li>
<li>
Let $ \alpha $ define the significance level of the hypothesis tests and let $ i^{\star} $ denote the largest $ i \in \cbrace{1, \ldots, m_{j}} $ such that $ p_{(i)} \leq (\frac{i}{m_{j}})\alpha $. For this $ i^{\star} $, the quantity:
$$ \eta^{\text{FDR}}_{j} = \sigma_{\epsilon, j}\Phi^{-1}\rbrace{1 - \frac{p_{(i^{\star})}}{2}} $$
is the optimal threshold for wavelet coefficients at scale $ \lambda_{j} $.<br />
</li>
</ol>
</li>
</ul>
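As a concrete illustration, the universal (VisuShrink) threshold can be computed as follows, here pairing it with the Gaussian median estimate of $ \hat{\sigma}_{\epsilon} $ discussed in the next section (an illustrative Python sketch, not EViews code):

```python
import numpy as np

def universal_threshold(w_finest, T):
    """Donoho-Johnstone universal threshold: eta = sigma_hat * sqrt(2 log T).

    sigma_hat is estimated from the finest-scale (lambda_1) wavelet
    coefficients only, via the Gaussian median rule.
    """
    w_finest = np.asarray(w_finest, dtype=float)
    sigma_hat = np.median(np.abs(w_finest)) / 0.6745
    return sigma_hat * np.sqrt(2.0 * np.log(T))
```

The adaptive variant simply applies the same formula scale by scale, re-estimating $ \hat{\sigma}_{\epsilon, j} $ from the coefficients at each scale $ \lambda_{j} $.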
For further details, see Donoho, Johnstone, et al. (1998), Gençay, Selçuk, and Whitcher (2001), and Percival and Walden (2000).<br /><br />
<h4 class="subseccol" id="sec4.3">Wavelet Coefficient Variance</h4>
Before summarizing the entire threshold procedure, there remains the issue of how to estimate the variance of the wavelet coefficients, $ \sigma^{2}_{\epsilon} $. Since the observed data $ \mathbf{y} $ are assumed to be obscured by the noise process $ \mathbf{\epsilon} $, the usual sample estimator of the variance is extremely sensitive to outlying observations. Accordingly, let $ \mu_{j} $ and $ \zeta_{j} $ denote the mean and median, respectively, of the wavelet coefficients $ \mathbf{W}_{j} $ at scale $ \lambda_{j} $, and let $ m_{j} $ denote its cardinality (total number of coefficients at said scale). Several robust estimators have been proposed in the literature:
<ul>
<li>
<b>Mean Absolute Deviation</b> formalized as:
$$ \hat{\sigma}_{\epsilon, j} = \frac{1}{m_{j}}\xsum{i}{1}{m_{j}}{|W_{i, j} -\mu_{j}|} $$<br /><br />
</li>
<li>
<b>Median Absolute Deviation</b> formalized as:
$$ \hat{\sigma}_{\epsilon, j} = \med\rbrace{|W_{1, j} -\zeta_{j}|, \ldots, |W_{m_{j}, j} -\zeta_{j}|} $$<br /><br />
</li>
<li>
<b>Mean Median Absolute Deviation</b> formalized as:
$$ \hat{\sigma}_{\epsilon, j} = \frac{1}{m_{j}}\xsum{i}{1}{m_{j}}{|W_{i, j} -\zeta_{j}|} $$<br /><br />
</li>
<li>
<b>Median (Gaussian)</b> formalized as:
$$ \hat{\sigma}_{\epsilon, j} = \frac{\med\rbrace{|W_{1, j}|, \ldots, |W_{m_{j}, j}|}}{0.6745} $$<br /><br />
</li>
</ul>
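These estimators are each a one-liner; the following is an illustrative Python sketch (not the EViews implementation):

```python
import numpy as np

def mean_abs_deviation(w):
    """Mean absolute deviation about the scale mean."""
    w = np.asarray(w, dtype=float)
    return np.mean(np.abs(w - w.mean()))

def median_abs_deviation(w):
    """Median absolute deviation about the scale median."""
    w = np.asarray(w, dtype=float)
    return np.median(np.abs(w - np.median(w)))

def mean_median_abs_deviation(w):
    """Mean absolute deviation about the scale median."""
    w = np.asarray(w, dtype=float)
    return np.mean(np.abs(w - np.median(w)))

def median_gaussian(w):
    """Median of absolute coefficients, rescaled by 0.6745 (the 0.75
    quantile of the standard normal) for consistency under Gaussian noise."""
    w = np.asarray(w, dtype=float)
    return np.median(np.abs(w)) / 0.6745
```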
<h4 class="subseccol" id="sec4.4">Thresholding Implementation</h4>
The previous sections were devoted to describing thresholding rules and optimal threshold values. Here the focus is on summarizing thresholding implementations.<br /><br />
Effectively all wavelet thresholding procedures follow the algorithm below:
<ol>
<li>
Compute a partial wavelet transform of the original data up to some scale $ J^{\star} < J $, yielding the wavelet and scaling coefficients $ \mathbf{W}_{1}, \ldots, \mathbf{W}_{J^{\star}}, \mathbf{V}_{J^{\star}} $.<br /><br />
</li>
<li>
Select an optimal threshold $ \eta $ from one of the methods discussed earlier.<br /><br />
</li>
<li>
Threshold the coefficients at each scale $ \lambda_{j} $ for $ j \in \cbrace{1, \ldots, J^{\star}} $ using the threshold value selected in Step 2 and some thresholding rule (hard or soft). This generates a set of modified (thresholded) wavelet coefficients $ \mathbf{W}^{\text{(T)}}_{1}, \ldots, \mathbf{W}^{\text{(T)}}_{J^{\star}} $. Observe that the scaling coefficients $ \mathbf{V}_{J^{\star}} $ are <b>not</b> thresholded.<br /><br />
</li>
<li>
Use MRA with the thresholded coefficients to reconstruct the signal (original data) as follows:
\begin{align*}
\hat{\mathbf{y}} &= \xsum{j}{1}{J^{\star}}{\mathcal{W}^{\top}\mathbf{W}^{\text{(T)}}_{j}} + \mathcal{V}^{\top}\mathbf{V}_{J^{\star}}\\
&= \xsum{j}{1}{J^{\star}}{\mathcal{D}^{\text{(T)}}_{j}} + \mathcal{S}_{J^{\star}}
\end{align*}<br /><br />
</li>
</ol>
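To fix ideas, the four steps can be run end-to-end with a toy Haar transform. This is a minimal illustrative Python sketch (real applications would use longer filters such as Daubechies wavelets; EViews 12 provides these tools natively):

```python
import numpy as np

def haar_dwt(y, J_star):
    """Partial Haar DWT: returns details W_1..W_Jstar and smooths V_Jstar."""
    v = np.asarray(y, dtype=float)
    W = []
    for _ in range(J_star):
        w = (v[1::2] - v[0::2]) / np.sqrt(2.0)   # detail (wavelet) coefficients
        v = (v[1::2] + v[0::2]) / np.sqrt(2.0)   # smooth (scaling) coefficients
        W.append(w)
    return W, v

def haar_idwt(W, v):
    """Invert the partial Haar DWT (synthesis / MRA reconstruction)."""
    for w in reversed(W):
        out = np.empty(2 * len(v))
        out[0::2] = (v - w) / np.sqrt(2.0)
        out[1::2] = (v + w) / np.sqrt(2.0)
        v = out
    return v

def denoise(y, J_star, eta):
    """Steps 1-4: transform, hard-threshold the details only, reconstruct."""
    W, v = haar_dwt(y, J_star)                       # step 1
    W_t = [np.where(np.abs(w) > eta, w, 0.0) for w in W]  # step 3 (V untouched)
    return haar_idwt(W_t, v)                         # step 4
```

With $ \eta = 0 $ the data are reconstructed exactly; with a very large $ \eta $ only the smooth component survives.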
<h3 class="seccol" id="sec5">Conclusion</h3>
In this first entry of our series on wavelets, we provided a theoretical overview of the most important aspects in wavelet analysis. In <a href='http://blog.eviews.com/2020/12/wavelet-analysis-part-ii-applications.html'>Part II</a>, we will see how to apply these concepts by using the new wavelet features released with EViews 12.<br /><br /><br />
<hr />
<h3 class="seccol" id="sec6">References</h3>
<ol class="bib2xhtml">
<li id="abramovich-1995">
Abramovich F and Benjamini Y (1995), <i>"Thresholding of wavelet coefficients as multiple hypotheses testing procedure"</i>, In Wavelets and Statistics, pp. 5-14. Springer.
</li>
<li id="daubechies-1992">
Daubechies I (1992), <i>"Ten Lectures on Wavelets"</i>, CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM.
</li>
<li id="donoho-1994">
Donoho DL and Johnstone IM (1994), <i>"Ideal spatial adaptation by wavelet shrinkage"</i>, Biometrika. Vol. 81(3), pp. 425-455. Oxford University Press.
</li>
<li id="donoho-1995">
Donoho DL and Johnstone IM (1995), <i>"Adapting to unknown smoothness via wavelet shrinkage"</i>, Journal of the American Statistical Association. Vol. 90(432), pp. 1200-1224. Taylor & Francis Group.
</li>
<li id="donoho-1998">
Donoho DL, Johnstone IM and others (1998), <i>"Minimax estimation via wavelet shrinkage"</i>, The Annals of Statistics. Vol. 26(3), pp. 879-921. Institute of Mathematical Statistics.
</li>
<li id="gencay-2001">
Gençay R, Selçuk F and Whitcher BJ (2001), <i>"An introduction to wavelets and other filtering methods in finance and economics"</i> Academic Press.
</li>
<li id="jansen-2010">
Jansen M (2010), <i>"Minimum risk methods in the estimation of unknown sparsity"</i>, Technical report.
</li>
<li id="mallat-1989">
Mallat S (1989), <i>"A theory for multiresolution signal decomposition: The wavelet representation"</i>, Pattern Analysis and Machine Intelligence, IEEE Transactions on. Vol. 11(7), pp. 674-693.
</li>
<li id="percival-2000">
Percival D and Walden A (2000), <i>"Wavelet methods for time series analysis"</i> Vol. 4, Cambridge University Press.
</li>
<li id="priestley-1981">
Priestley MB (1981), <i>"Spectral analysis and time series: probability and mathematical statistics"</i> Academic Press.
</li>
</ol>
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com0tag:blogger.com,1999:blog-6883247404678549489.post-6385533183570778612020-07-16T09:48:00.003-07:002020-07-17T07:32:26.402-07:00Time Series Methods for Modelling the Spread of Epidemics<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 1px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: "{\\left(}",
rb: "{\\right)}",
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1]
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: 'verdana', sans-serif;">
<i>Author and guest post by Eren Ocakverdi</i><br /><br />
This blog piece intends to introduce two new add-ins (i.e. <a href='http://www.eviews.com/Addins/seirmodel.aipz'>SEIRMODEL</a> and <a href='http://www.eviews.com/Addins/tsepigrowth.aipz'>TSEPIGROWTH</a>) to EViews users’ toolbox and help close the gap between epidemiological models and time series methods from a practitioner’s point of view.
<a name='more'></a><br /><br />
<h3>Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction</a>
<li><a href="#sec2">Susceptible-Exposed-Infected-Recovered (SEIR) model</a>
<li><a href="#sec3">Observational Models</a>
<li><a href="#sec4">Application to COVID-19 Data from Turkey</a>
<li><a href="#sec5">Files</a>
<li><a href="#sec6">References</a>
</ol><br />
<h3 id="sec1">Introduction</h3>
In mathematical epidemiology, the spread of infectious diseases is usually described through compartmental models rather than observational time series models, since analytical derivation of their dynamics is quite straightforward. These are structural models that divide the population into several states and then define the equations governing the transition from one state to another. In other words, <i>state space</i> models.<br /><br />
<h3 id="sec2">Susceptible-Exposed-Infected-Recovered (SEIR) model</h3>
I have written an add-in (<a href='http://www.eviews.com/Addins/seirmodel.aipz'>SEIRMODEL</a>) for interested EViews users who want to carry out their own analyses and gain basic insight into the systemic nature of an epidemic. The add-in implements a deterministic version of the SEIR model, which does not take into account vital dynamics such as birth and death. Still, it offers a simplified framework for those who are unfamiliar with these concepts.<br /><br />
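For intuition, the deterministic SEIR dynamics can be sketched with a simple Euler discretization. The Python snippet below is purely illustrative; the parameter names and calibration are hypothetical, not the add-in's actual inputs:

```python
import numpy as np

def simulate_seir(N, beta, sigma, gamma, E0, I0, days, dt=0.1):
    """Euler discretization of deterministic SEIR without vital dynamics.

    beta  : transmission rate
    sigma : rate of leaving the exposed state (1 / incubation period)
    gamma : recovery rate (1 / infectious period)
    Returns one (S, E, I, R) row per time step.
    """
    S, E, I, R = float(N - E0 - I0), float(E0), float(I0), 0.0
    path = [(S, E, I, R)]
    for _ in range(int(round(days / dt))):
        new_exposed   = beta * S * I / N * dt   # S -> E
        new_infectious = sigma * E * dt         # E -> I
        new_recovered = gamma * I * dt          # I -> R
        S -= new_exposed
        E += new_exposed - new_infectious
        I += new_infectious - new_recovered
        R += new_recovered
        path.append((S, E, I, R))
    return np.array(path)

# Illustrative calibration: basic reproduction number R0 = beta/gamma = 5
path = simulate_seir(N=1_000_000, beta=0.5, sigma=0.2, gamma=0.1,
                     E0=10, I0=1, days=200)
```

Because every flow simply moves individuals between compartments, the total $ S + E + I + R $ is conserved at the population size throughout the simulation.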
In order to run simulations, users need to provide required inputs (e.g. population size, calibration parameters, initial conditions etc.), details of which can be found in the documentation file that comes with the add-in:<br /><br />
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/tsepigrowth/seir_dialog.png"><img height="auto"
src="http://www.eviews.com/blog/tsepigrowth/seir_dialog.png" title="SEIR Add-In Dialog"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: SEIR Add-In Dialog</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
The default output is a chart showing the evolution of compartments/states during the spread of the epidemic. You can also save these series for further analysis.<br/><br/>
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/tsepigrowth/seir_output.png"><img height="auto"
src="http://www.eviews.com/blog/tsepigrowth/seir_output.png" title="SEIR Add-In: Output"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: SEIR Add-In Output</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
<h3 id="sec3">Observational Models</h3>
Structural modelling of epidemics becomes increasingly complex when heterogeneity in the population, mobility issues, interactions, etc. are considered in the computations. Functions fitted to observed data for calibration purposes are mostly nonlinear, which can further complicate estimation. Harvey and Kattuman (2020) recently proposed useful observational time series methods, particularly for generalized logistic and Gompertz growth curves. I have written an add-in (<a href='http://www.eviews.com/Addins/tsepigrowth.aipz'>TSEPIGROWTH</a>) that implements the methods outlined in the paper.<br/><br/>
Suppose we wanted to fit these nonlinear curves to the number of infected individuals from the simulation of our earlier SEIR model:<br /><br />
<!-- :::::::::: FIGURES 3a and 3b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 3a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/tsepigrowth/seir_logistic.png"><img height="auto"
src="http://www.eviews.com/blog/tsepigrowth/seir_logistic.png" title="SEIR: Generalized Logistic Fit"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 3b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/tsepigrowth/seir_gompertz.png"><img height="auto"
src="http://www.eviews.com/blog/tsepigrowth/seir_gompertz.png" title="SEIR: Gompertz Growth Curve Fit"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3a: SEIR: Generalized Logistic Fit</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 3b: SEIR: Gompertz Growth Curve Fit</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 3a and 3b :::::::::: -->
Above, c(4) denotes the growth rate parameter. At this point I would also suggest EViews users to try the <a href="http://www.eviews.com/Addins/GBASS.aipz">GBASS</a> add-in, which incorporates the generalized BASS model developed for modelling how new products (or new viruses for that matter!) get adopted into a population.<br /><br />
If we wanted to take the other avenue offered by Harvey and Kattuman (2020) and estimate these parameters via observational methods, then we could simply run the add-in:<br /><br />
<!-- :::::::::: FIGURE 4 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/tsepigrowth/tsepigrowth_dialog.png"><img height="auto"
src="http://www.eviews.com/blog/tsepigrowth/tsepigrowth_dialog.png" title="TSEPIGROWTH Add-In: Dialog"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4: TSEPIGROWTH Add-In Dialog</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 4 :::::::::: -->
Output from the state space specification of these models are as follows:<br /><br />
<!-- :::::::::: FIGURES 5a and 5b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 5a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/tsepigrowth/tsepigrowth_logistic_ss.png"><img height="auto"
src="http://www.eviews.com/blog/tsepigrowth/tsepigrowth_logistic_ss.png" title="TSEPIGROWTH: Generalized Logistic SS Model"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 5b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/tsepigrowth/tsepigrowth_gompertz_ss.png"><img height="auto"
src="http://www.eviews.com/blog/tsepigrowth/tsepigrowth_gompertz_ss.png" title="TSEPIGROWTH: Gompertz Growth Curve SS Model"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5a: TSEPIGROWTH: Generalized Logistic SS Model</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 5b: TSEPIGROWTH: Gompertz Growth Curve SS Model</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 5a and 5b :::::::::: -->
Here, the final value of the state variable <i>CHANGE</i> corresponds to the growth rate parameter and is more or less close to that of the fitted nonlinear curves.<br/><br/>
<h3 id="sec4">Application to COVID-19 Data From Turkey</h3>
Examples above may be important or useful from a pedagogical point of view, but we need to try these models on actual data to gain more insight from a practical perspective. Naturally, COVID-19 data would be the most recent and most appropriate place to start. Users can visit the <a href='http://blog.eviews.com/2020/03/mapping-covid-19.html'>previous blog post</a> to learn how to fetch COVID-19 data from various sources. Here, I’ll use another data source provided by the WHO.<br /><br />
First, we fit a Gompertz curve to the level and make forecasts until the end of the year. Next, we do the same exercise with the observational counterparts of the Gompertz model, which focus on estimation of the growth rate.<br /><br />
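For reference, one common parameterization of the Gompertz curve is $ y(t) = K e^{-b e^{-ct}} $, with saturation level $ K $, growth-rate parameter $ c $, and inflection (peak daily growth) at $ t^{\star} = \ln(b)/c $, where the level equals $ K/e $. A small illustrative Python sketch (not the add-in's code, and the symbols are this parameterization's, not necessarily the paper's):

```python
import numpy as np

def gompertz(t, K, b, c):
    """Gompertz growth curve: the level saturates at K, and the growth
    rate of the log level, b*c*exp(-c*t), decays exponentially at rate c."""
    return K * np.exp(-b * np.exp(-c * t))

def gompertz_inflection(b, c):
    """Time at which daily growth peaks; the level there equals K / e."""
    return np.log(b) / c
```

The early inflection point of a fitted curve is exactly what drives the low saturation level of the nonlinear forecast discussed below.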
The chart below visually compares the fitted values of growth:<br /><br />
<!-- :::::::::: FIGURE 6 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/tsepigrowth/grfit.png"><img height="auto"
src="http://www.eviews.com/blog/tsepigrowth/grfit.png" title="Gompertz Fit Curves"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 6: Gompertz Fit Curves</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 6 :::::::::: -->
The next plot displays the forecasted values for the level:
<!-- :::::::::: FIGURE 7 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/tsepigrowth/grfcast.png"><img height="auto"
src="http://www.eviews.com/blog/tsepigrowth/grfcast.png" title="Gompertz Forecast Curves"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 7: Gompertz Forecast Curves</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 7 :::::::::: -->
These forecasts indicate different saturation levels, of which that of the nonlinear curve is the lowest, mainly because the inflection point of the fitted nonlinear curve implies levelling off at an earlier date. The first observational model has a deterministic trend, but performs better since it focuses on the growth rate. There is an obvious change in trend at the beginning of June, when Turkey announced the first phase of COVID-19 restriction easing, marking the start of the normalization process. Observational models allow us to model this change explicitly as a slope intervention:
<!-- :::::::::: FIGURE 8 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/tsepigrowth/policyss.png"><img height="auto"
src="http://www.eviews.com/blog/tsepigrowth/policyss.png" title="Policy Intervention SS Model"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 8: Policy Intervention SS Model</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 8 :::::::::: -->
The coefficient <i>C(3)</i> verifies that the growth rate has risen significantly as of June. The dynamic version of the observational Gompertz model fits a flexible trend to the data, so it adapts to changes in growth rates without any need for explicit modelling of the intervention. It also allows analysis of the impact of a policy/intervention from a counterfactual perspective. The plot below compares the out-of-sample forecasts of the dynamic model before and after the normalization period. The shift in the forecasted level of total cases is obvious!
<!-- :::::::::: FIGURE 9 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/tsepigrowth/policygrfcast.png"><img height="auto"
src="http://www.eviews.com/blog/tsepigrowth/policygrfcast.png" title="Policy Intervention Out of Sample Forecast"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 9: Policy Intervention Out of Sample Forecast</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 9 :::::::::: -->
<h3 id="sec5">Files</h3>
<ul>
<li><a href="http://www.eviews.com/blog/tsepigrowth/tsepigrowth_blog.prg">tsepigrowth_blog.prg</a>
</ul>
<br /><br />
<hr />
<h3 id="sec6">References</h3>
<ol class="bib2xhtml">
<li><a name="harvey-2020"></a>Harvey, A. C. and Kattuman, P.:
Time Series Models Based on Growth Curves with Applications to Forecasting Coronavirus
<cite>Covid Economics: Vetted and Real-Time Papers</cite>, 24(1) 126–157, 2020.
</li>
</ol>
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com3tag:blogger.com,1999:blog-6883247404678549489.post-33436183732368941762020-04-01T06:44:00.000-07:002020-04-01T06:44:12.331-07:00Mapping COVID-19: Follow-up<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 1px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: "{\\left(}",
rb: "{\\right)}",
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1]
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: 'verdana', sans-serif;">
As a follow up to our <a href="http://blog.eviews.com/2020/03/mapping-covid-19.html">previous blog entry</a> describing how to import Covid-19 data into EViews and produce some maps/graphs of the data, this post will produce a couple more graphs similar to ones we've seen become popular across social media in recent days.
<a name='more'></a><br /><br />
<h3>Table of Contents</h3>
<ol>
<li><a href="#sec1">Deaths Since First Death</a>
<li><a href="#sec2">One Week Difference</a>
</ol><br />
<h3 id="sec1">Deaths Since First Death</h3>
The first is a graph showing the 3 day moving average of the number of deaths per day since the first death was recorded in a country, for countries with a current number of deaths greater than 160:<br /><br />
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/3dma.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/3dma.png"
title="3-Day moving average" width="480" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: 3-Day moving average</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
The graph shows that for most countries the number of deaths (plotted on a log scale, so that the slope approximates the growth rate) is still increasing, but at a slowing rate. The code to produce this graph, including importing the death data from Johns Hopkins, is:<br /><br />
<pre style="overflow:auto">
<font color="green">'import the death data from Johns Hopkins</font>
%url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"
<font color="green">'load up the url as a new page</font>
pageload(page=temp) {%url}
<font color="green">'stack the page into a 2d panel</font>
pagestack(page=stack) _? @ *? *
<font color="green">'do some renaming and make the date series</font>
rename country_region country
rename province_state province
rename _ deaths
series date = @dateval(var01, "MM_DD_YYYY")
<font color="green">'structure the page </font>
pagestruct province country @date(date)
<font color="green">'delete the original page</font>
pagedelete temp
<font color="green">'create the panel page</font>
pagecreate(id, page=panel) country @date @srcpage stack
<font color="green">'copy the deaths series to the panel page</font>
copy(c=sum) stack\deaths * @src @date country @dest @date country
pagedelete stack
<font color="green">'contract the page to only include countries with greater than 160 deaths</font>
pagecontract if @maxsby(deaths,country)>160
<font color="green">'create a series containing the number of days since the first death was recorded in each country. This series is equal to 0 if the number of deaths on a date is equal to the minimum number of deaths for that country (nearly always 0, but for China, the data starts after the first recorded death), and then counts up by one for dates after the minimum.</font>
series days = @recode(deaths=@minsby(deaths,country), 0, days(-1)+1)
<font color="green">'contract the page so that days before the second recorded death in each country are removed</font>
pagecontract if days>0
<font color="green">'restructure the page to be based on this day count rather than actual dates</font>
pagestruct(freq=u) @date(days) country
<font color="green">'set sample to be first 45 days</font>
smpl 1 45
<font color="green">'make a graph of the 3 day moving average of deaths</font>
freeze(d_graph) @movav(log(deaths),3).line(m, panel=c)
d_graph.addtext(t, just(c)) Deaths Since First Death\n(3 day moving average, log scale)
d_graph.addtext(br) Days
d_graph.addtext(l) log(deaths)
d_graph.legend columns(5)
d_graph.legend position(-0.6,3.72)
show d_graph
</pre>
<h3 id="sec2">One Week Difference</h3>
The second graph takes an interesting approach, plotting the one-week difference in the number of new confirmed cases of COVID-19 against the total number of confirmed cases for each country, with both shown on log scales. We have only included countries with more than 140 deaths, and have highlighted just three countries – China, South Korea and the US.<br /><br />
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/weekdiff.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/weekdiff.png"
title="One week difference" width="480" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: One week difference</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
The code to generate this graph is:<br /><br />
<pre style="overflow:auto">
<font color="green">'names of the three topics/files</font>
%topics = "confirmed deaths recovered"
<font color="green">'loop through the topics</font>
for %topic {%topics}
<font color="green">'build the url by taking the base url and then adding the topic in the middle</font>
%url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_" + %topic + "_global.csv"
<font color="green">'load up the url as a new page</font>
pageload(page=temp) {%url}
<font color="green">'stack the page into a 2d panel</font>
pagestack(page=stack_{%topic}) _? @ *? *
<font color="green">'do some renaming and make the date series</font>
rename country_region country
rename province_state province
rename _ {%topic}
series date = @dateval(var01, "MM_DD_YYYY")
<font color="green">'structure the page</font>
pagestruct province country @date(date)
<font color="green">'delete the original page</font>
pagedelete temp
next
<font color="green">'create the panel page</font>
pagecreate(id, page=panel) country @date @srcpage stack_{%topic}
<font color="green">'loop through the topics copying each from the 2D panel</font>
for %topic {%topics}
copy(c=sum) stack_{%topic}\{%topic} * @src @date country @dest @date country
pagedelete stack_{%topic}
next
<font color="green">'contract the page to only include countries with more than 140 deaths</font>
pagecontract if @maxsby(deaths, country)>140
<font color="green">'make a group, called DATA, containing confirmed cases and the one week difference in confirmed cases</font>
group data confirmed confirmed-confirmed(-7)
<font color="green">'set the sample to remove periods with fewer than 50 cases</font>
smpl if confirmed > 50
<font color="green">'produce a panel plot of confirmed against 7 day difference in confirmed</font>
freeze(c_graph) data.xyline(panel=c)
<font color="green">' Add titles</font>
c_graph.addtext(t) "COVID-19: New vs. Total Cases\n(Countries with >140 deaths)"
c_graph.addtext(bc, just(c)) "Total Confirmed Cases\n(log scale)"
c_graph.addtext(l, just(c))"New Confirmed Cases (in the past week)\n(log scale)"
c_graph.setelem(1) legend("")
<font color="green">' Adjust axis to use logs</font>
c_graph.axis(b) log
c_graph.axis(l) log
<font color="green">' Adjust lines - remove lines after this if you want to show all countries</font>
c_graph.legend -display
for !i = 1 to @rows(@uniquevals(country))
c_graph.setelem(!i) linewidth(.75) linecolor(@rgb(192,192,192))
next
c_graph.setelem(8) linecolor(@rgb(128,64,0))
c_graph.setelem(3) linecolor(@rgb(0,64,128))
c_graph.setelem(15) linecolor(@rgb(0,128,0))
<font color="green">'add some text</font>
c_graph.addtext(3.29, 1.92, font(Calibri,10)) "S. Korea"
c_graph.addtext(4.87, 2.35, font(Calibri,10)) "China"
c_graph.addtext(5.31, 0.23, font(Calibri,10)) "United States"
show c_graph
</pre>
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com8tag:blogger.com,1999:blog-6883247404678549489.post-11407133675307775742020-03-30T17:28:00.001-07:002020-04-01T07:55:06.046-07:00Mapping COVID-19<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 1px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: "{\\left(}",
rb: "{\\right)}",
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1]
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: "verdana" sans-serif">
With the world currently experiencing the Covid-19 crisis, many of our users are working remotely (aside: for details on how to use EViews at home, visit our <a href="http://www.eviews.com/covid">Covid licensing page</a>), anxious to follow data on how the virus is spreading across parts of the world. There are many sources of information on Covid-19, and we thought we’d demonstrate how to fetch some of these sources directly into EViews, and then display some graphics of the data. (Please visit our <a href="http://blog.eviews.com/2020/04/mapping-covid-19-follow-up.html">follow up post</a> for a few more graph examples).
<a name='more'></a><br /><br />
<h3>Table of Contents</h3>
<ol>
<li><a href="#sec1">Johns Hopkins Data</a>
<li><a href="#sec2">European Centre for Disease Prevention and Control Data</a>
<li><a href="#sec3">New York Times US County Data</a>
<li><a href="#sec4">Sneak Peeks</a>
</ol><br />
<h3 id="sec1">Johns Hopkins Data</h3>
To begin we'll retrieve data from the Covid-19 Time Series collection from <a href="https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series">Johns Hopkins Whiting School of Engineering Center for Systems Science and Engineering</a>. These data are organized into three CSV files: one containing confirmed cases, one containing deaths, and one containing recoveries, at both country and state/province levels. In each file the first column contains the state/province name (where applicable), the second the country name, the third and fourth the average latitude and longitude, and the remaining columns the daily values.<br /><br />
There are a number of different approaches that could be used to import these data into an EViews workfile. We’ll demonstrate an approach that will stack the data into a single panel workfile. We’ll start with importing the confirmed cases data. EViews is able to directly open CSV files over the web using the <b>File->Open->Foreign Data as Workfile</b> menu item:<br /><br />
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/jhopenpath.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/jhopenpath.png"
title="JH open path" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1: JH open path</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
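For those who prefer the command line, the same CSV can be opened directly with <b>wfopen</b> (the URL below is the confirmed-cases file, built the same way as in the program at the end of this section):<br /><br />
<pre style="overflow:auto">
wfopen(page=temp) https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
</pre>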
Which results in the following workfile:<br /><br />
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/jhwf.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/jhwf.png"
title="JH workfile" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: JH workfile</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
Each day of data has been imported into its own series, with the name of the series being the date. There are also series containing the country/region name and the province/state name, as well as latitude and longitude.<br /><br />
To create a panel, we’ll want to stack these date series into a single series, which we can do simply with the <b>Proc->Reshape Current Page->Stack in New Page…</b><br /><br />
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/jhstackdialog.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/jhstackdialog.png"
title="JH stack data dialog" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3: JH stack data dialog</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
Since all of the series we wish to stack have a similar naming structure – they all start with an “_” – we can instruct EViews to stack using “_?” as the identifier, where ? is a wildcard. This results in the following stacked workfile page:<br /><br />
<!-- :::::::::: FIGURE 4 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/jhstackwf.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/jhstackwf.png"
title="JH stack data workfile" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4: JH stack data workfile</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 4 :::::::::: -->
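The stacking dialog corresponds to a single <b>pagestack</b> command, using the same “_?” wildcard pattern as the identifier (this is the form used in the full program below; the page name <b>stacked</b> here is just an example):<br /><br />
<pre style="overflow:auto">
pagestack(page=stacked) _? @ *? *
</pre>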
This is close to what we want; we simply need to tidy up some of the variable names and instruct EViews to structure the page as a true panel. The date information has been imported into the alpha series VAR01, which we can convert into a true date series with:<br /><br />
<pre style="overflow:auto">
series date = @dateval(var01, "MM_DD_YYYY")
</pre>
The actual cases data is stored in the series currently named "_", which we can rename to something more meaningful with:<br /><br />
<pre style="overflow:auto">
rename _ cases
</pre>
Finally, we can structure the page as a panel by clicking on <b>Proc->Structure/Resize Current Page</b>, selecting Dated Panel as the structure type and filling in the cross-section and date information:<br /><br />
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/jhstructuredialog.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/jhstructuredialog.png"
title="JH workfile restructure" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5: JH workfile restructure</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
When asked if we wish to remove blank values, we select no. We now have a 2-dimensional panel, with two sets of cross-sectional identifiers – one for province/state and the other for country:<br /><br />
<!-- :::::::::: FIGURE 6 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/jh3dpanel.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/jh3dpanel.png"
title="JH 2D Panel" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 6: JH 2D Panel</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 6 :::::::::: -->
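The restructure dialog is equivalent to the <b>pagestruct</b> command (as used in the full program below), listing the two cross-section identifiers followed by the date series:<br /><br />
<pre style="overflow:auto">
pagestruct province country @date(date)
</pre>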
If we want to sum up the state level data to create a traditional panel with just country and time, we can do so by creating a new panel page based upon the indices of this page. Click on the <b>New Page</b> tab at the bottom of the workfile and select <b>Specify by Identifier Series</b>. In the resulting dialog we enter the country series as the cross-section identifier we wish to keep:<br /><br />
<!-- :::::::::: FIGURE 6 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/jhpagebyid.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/jhpagebyid.png"
title="JH page by ID" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 6: JH page by ID</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 6 :::::::::: -->
This results in a standard panel page. We can then copy the cases series from our 2D panel page to the new panel page with standard copy and paste, making sure to change the Contraction method to Sum in the Paste Special dialog:<br /><br />
<!-- :::::::::: FIGURE 7 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/jhpastedialog.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/jhpastedialog.png"
title="JH paste dialog" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 7: JH paste dialog</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 7 :::::::::: -->
<!-- :::::::::: FIGURE 8 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/jhpanelwf.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/jhpanelwf.png"
title="JH panel workfile" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 8: JH panel workfile</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 8 :::::::::: -->
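By command, the new page and the sum-contracted copy take two lines (a sketch – the page name <b>stacked</b> used here is illustrative; the full program below uses pages named stack_confirmed, stack_deaths and stack_recovered):<br /><br />
<pre style="overflow:auto">
pagecreate(id, page=panel) country @date @srcpage stacked
copy(c=sum) stacked\cases * @src @date country @dest @date country
</pre>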
With the data in a standard panel workfile, all of the standard EViews tools are now available. We can view a graph of the cases by country by opening the cases series, clicking on <b>View->Graph</b>, and then selecting <b>Individual cross sections</b> as the <b>Panel option</b>.<br /><br />
<!-- :::::::::: FIGURE 9 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/jhallcxgraph.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/jhallcxgraph.png"
title="JH graph of all cross-sections" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 9: JH graph of all cross-sections</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 9 :::::::::: -->
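The same view can also be produced by command (a sketch – <b>panel=individual</b> corresponds to the Individual cross sections option chosen in the dialog):<br /><br />
<pre style="overflow:auto">
cases.line(panel=individual)
</pre>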
This graph may be a little unwieldy, so we can reduce the number of cross-sections down to, say, only countries that have, thus far, experienced more than 10,000 cases by using the smpl command:<br /><br />
<pre style="overflow:auto">
smpl if @maxsby(cases, country_region)>10000
</pre>
<!-- :::::::::: FIGURE 9 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/jhmaxsbygraph.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/jhmaxsbygraph.png"
title="JH cross-sections with more than 10000 cases" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 9: JH cross-sections with more than 10000 cases</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 9 :::::::::: -->
Of course, all of this could have been done in an EViews program, and it could be automated to combine all three data files, ending up with a panel containing cases, deaths and recoveries. The following EViews code produces such a panel:<br /><br />
<pre style="overflow:auto">
<font color="green">'close all existing workfiles</font>
close @wf
<font color="green">'names of the three topics/files</font>
%topics = "confirmed deaths recovered"
<font color="green">'loop through the topics</font>
for %topic {%topics}
<font color="green">'build the url by taking the base url and then adding the topic in the middle</font>
%url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_" + %topic + "_global.csv"
<font color="green">'load up the url as a new page</font>
pageload(page=temp) {%url}
<font color="green">'stack the page into a 3d panel</font>
pagestack(page=stack_{%topic}) _? @ *? *
<font color="green">'do some renaming and make the date series</font>
rename country_region country
rename province_state province
rename _ {%topic}
series date = @dateval(var01, "MM_DD_YYYY")
<font color="green">'structure the page</font>
pagestruct province country @date(date)
<font color="green">'delete the original page</font>
pagedelete temp
next
<font color="green">'create the 2D panel page</font>
pagecreate(id, page=panel) country @date @srcpage stack_{%topic}
<font color="green">'loop through the topics copying each from the 3D panel into the 2D panel</font>
for %topic {%topics}
copy(c=sum) stack_{%topic}\{%topic} * @src @date country @dest @date country
pagedelete stack_{%topic}
next
</pre>
<h3 id="sec2">European Centre for Disease Prevention and Control Data</h3>
The second repository we'll use is data provided by the <a href="https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide">ECDC's Covid-19 Data site</a>. They provide easy-to-use data for each country, along with population data. Importing these data into EViews is trivial – you can open the XLSX file directly using the <b>File->Open->Foreign Data as Workfile</b> dialog and entering the URL of the XLSX file in the <b>File name</b> box:<br /><br />
<!-- :::::::::: FIGURE 10 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/ecdcopenpath.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/ecdcopenpath.png"
title="ECDC open path" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 10: ECDC open path</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 10 :::::::::: -->
The resulting workfile will look like this:<br /><br />
<!-- :::::::::: FIGURE 11 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/ecdcwf.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/ecdcwf.png"
title="ECDC workfile" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 11: ECDC workfile</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 11 :::::::::: -->
All we need to do is structure it as a panel, which we can do by clicking on <b>Proc->Structure/Resize Current Page</b> and then entering the cross-section and date identifiers (we also choose to keep an unbalanced panel by unchecking the <b>Balance between starts & ends</b> box).<br /><br />
<!-- :::::::::: FIGURE 12 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/ecdcstructuredialog.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/ecdcstructuredialog.png"
title="ECDC structure WF dialog" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 12: ECDC structure WF dialog</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 12 :::::::::: -->
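These dialog steps correspond to the following lines from the program at the end of this section (the rename and <b>pagecontract</b> first drop rows without an ISO country code):<br /><br />
<pre style="overflow:auto">
rename countryterritorycode iso3
pagecontract if iso3<>""
pagestruct(bal=m) iso3 @date(daterep)
</pre>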
The result is an EViews panel workfile:<br /><br />
<!-- :::::::::: FIGURE 13 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/ecdcseries.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/ecdcseries.png"
title="ECDC series" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 13: ECDC series</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 13 :::::::::: -->
The data provided by the ECDC contain the number of new cases and deaths each day, whereas most presentations of Covid-19 data use the total number of cases and deaths per country. We can create the totals with the <b>@cumsum</b> function, which produces the cumulative sum, resetting to zero at the start of each cross-section.<br /><br />
<pre style="overflow:auto">
series ccases = @cumsum(cases)
series cdeaths = @cumsum(deaths)
</pre>
With this panel we can perform standard panel data analysis, or produce graphs (see the Johns Hopkins examples above). However, since the ECDC have included standard <a href="https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes">ISO country codes</a> for the countries, we can also tie the data to a geomap.<br /><br />
We found a simple <a href="http://thematicmapping.org/downloads/world_borders.php">shapefile</a> of the world <a href="http://thematicmapping.org/downloads/world_borders.php">online</a>, and downloaded it to our computer. In EViews we then click on <b>Object->New Object->GeoMap</b> to create a new geomap, and then drag the <b>.prj</b> file we downloaded onto the geomap.<br /><br />
In the properties box that appears, we tie the countries defined in the shapefile to the identifiers in the workfile. Since the shapefile uses ISO codes, and we have those in the <b>countryterritorycode</b> series, we can use those to map the workfile to the shapefile:<br /><br />
<!-- :::::::::: FIGURE 14 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/geomapprops.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/geomapprops.png"
title="Geomap properties" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 14: Geomap properties</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 14 :::::::::: -->
Which results in the following global geomap:<br /><br />
<!-- :::::::::: FIGURE 15 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/geomapglobal.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/geomapglobal.png"
title="Global geomap" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 15: Global geomap</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 15 :::::::::: -->
We can use the <b>Label:</b> dropdown to remove the country labels to give a clearer view of the map (note this feature is a recent addition, you may need to update your copy of EViews to see the <b>None</b> option).<br /><br />
To add some color information to the map we click on <b>Properties</b> and then the <b>Color</b> tab. We'll add two custom color settings – a gradient fill to show differences in the number of cases, and a single solid color for countries with a large number of cases:<br /><br />
<!-- :::::::::: FIGURES 16a and 16b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 16a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/covid19/images/ecdcgeomaprange.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/ecdcgeomaprange.png"
title="ECDC geomap color range"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 16b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/covid19/images/ecdcgeomapthresh.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/ecdcgeomapthresh.png"
title="ECDC geomap color threshold"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 16a: ECDC geomap color range</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 16b: ECDC geomap color threshold</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 16a and 16b :::::::::: -->
We then enter <b>ccases</b> as the coloring series, which results in the following map:<br /><br />
<!-- :::::::::: FIGURE 17 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/ecdcgeomap.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/ecdcgeomap.png"
title="ECDC geomap" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 17: ECDC geomap</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 17 :::::::::: -->
Again, this could all be done programmatically with the following program (note that the ranges used for coloring will need to be changed as the virus becomes more widespread):<br /><br />
<pre style="overflow:auto">
<font color="green">'download data</font>
wfopen https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide.xlsx
rename countryterritorycode iso3
pagecontract if iso3<>""
pagestruct(bal=m) iso3 @date(daterep)
<font color="green">'make cumulative data</font>
series ccases = @cumsum(cases)
series cdeaths = @cumsum(deaths)
<font color="green">'make geomap for cases</font>
geomap cases_map
cases_map.load ".\World Map\TM_WORLD_BORDERS_SIMPL-0.3.prj"
cases_map.link iso3 iso3
cases_map.options -legend
cases_map.setlabel none
cases_map.setfillcolor(t=custom) mapser(ccases) naclr(@RGB(255,255,255)) range(lim(0,12000,cboth), rangeclr(@grad(@RGB(255,255,255),@RGB(0,0,255))), outclr(@trans,@trans), name("Range")) thresh(12000, below(@trans), above(@RGB(0,0,255)), name("Threshold"))
<font color="green">'make geomaps for deaths</font>
geomap deaths_map
deaths_map.load ".\World Map\TM_WORLD_BORDERS_SIMPL-0.3.prj"
deaths_map.link iso3 iso3
deaths_map.options -legend
deaths_map.setlabel none
deaths_map.setfillcolor(t=custom) mapser(cdeaths) naclr(@RGB(255,255,255)) range(lim(1,500,cboth), rangeclr(@grad(@RGB(255,128,128),@RGB(128,64,64))), outclr(@trans,@trans), name("Range")) thresh(500,cleft,below(@trans),above(@RGB(128,0,0)),name("Threshold"))
</pre>
<h3 id="sec3">New York Times US County Data</h3>
The final data repository we will look at is the <a href="https://github.com/nytimes/covid-19-data/blob/master/us-counties.csv">New York Times</a> data for the United States at county level. These data are also trivial to import into EViews – again, you can just enter the URL for the CSV file to open it. Rather than walking through the UI steps, we'll simply post the two lines of code required to import and structure as a panel:<br /><br />
<pre style="overflow:auto">
<font color="green">'retrieve data from NY Times github</font>
wfopen(page=covid) https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv
<font color="green">'structure as a panel based on date and FIPS ID</font>
pagestruct(dropna) fips @date(date)
</pre>
Note that the New York Times have conveniently provided the <a href="https://en.wikipedia.org/wiki/FIPS_county_code">FIPS code</a> for each county, which means we can also produce some geomaps. We've downloaded a US county map from the <a href="https://dataverse.tdl.org/dataset.xhtml?persistentId=doi:10.18738/T8/CPTP8C">Texas Data Repository</a>, and then linked the <b>FIPS</b> series in the workfile with the <b>FIPS_BEA</b> attribute of the map:<br /><br />
<!-- :::::::::: FIGURE 17 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/images/geomapfipsprops.png"><img height="auto"
src="http://www.eviews.com/blog/covid19/images/geomapfipsprops.png"
title="Geomap FIPS properties" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 17: Geomap FIPS properties</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 17 :::::::::: -->
The full code to produce such a map is:<br /><br />
<pre style="overflow:auto">
<font color="green">'retrieve data from NY Times github</font>
wfopen(page=covid) https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv
<font color="green">'structure as a panel based on date and FIPS ID</font>
pagestruct(dropna) fips @date(date)
<font color="green">'set displaynames for use in geomaps</font>
cases.displayname Confirmed Cases
deaths.displayname Deaths
<font color="green">'make geomap</font>
geomap cases_map
cases_map.load ".\Us County Map\CountiesBEA.prj"
cases_map.link fips_bea fips
cases_map.options -legend
cases_map.setlabel none
cases_map.setfillcolor(t=custom) mapser(cases) naclr(@RGB(255,255,255)) range(lim(1,200,cboth), rangeclr(@grad(@RGB(204,204,255),@RGB(0,0,255))), outclr(@trans,@trans), name("Range")) thresh(200, below(@trans), above(@RGB(0,0,255)), name("Threshold"))
</pre>
<h3 id="sec4">Sneak Peeks</h3>
One of the features our engineering team have been working on for the next major release of EViews is the ability to produce animated graphs and geomaps (the keen-eyed amongst you may have noticed the <b>Animate</b> button in a few of our screenshots). Whilst this feature is still some way from release, the Covid-19 data provide an interesting test case, and we thought we'd share some of the results.<br /><br />
<!-- :::::::::: ANIMATION 1 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/animations/cases_map.gif"><img height="auto"
src="http://www.eviews.com/blog/covid19/animations/cases_map.gif"
title="US counties cases evolution (wait for it...)" width="680" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Animation 1: US counties cases evolution</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: ANIMATION 1 :::::::::: -->
<!-- :::::::::: ANIMATION 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/covid19/animations/cases_map.gif">
<video width="680" controls>
<source src= "http://www.eviews.com/blog/covid19/animations/graph01.mp4" type="video/mp4"
title="Confirmed cases">
</video> </a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Animation 2: Confirmed cases</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: ANIMATION 2 :::::::::: -->
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com9tag:blogger.com,1999:blog-6883247404678549489.post-86065470278518103992020-02-25T07:58:00.001-08:002020-03-04T09:27:00.165-08:00Beveridge-Nelson Filter<style>
table {
border: 0px solid black;
border-collapse: separate;
border-spacing: 10px;
}
td {
border: 1px solid black;
}
.nb {
border: 0px solid black;
}
.step {
counter-reset: section;
list-style-type: none;
}
.step li::before {
counter-increment: section;
content: "Step "counter(section) ": ";
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: "{\\left(}",
rb: "{\\right)}",
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1]
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML"
type="text/javascript">
</script>
<span style="font-family: "verdana" sans-serif">
<i>Authors and guest post by Benjamin Wong (Monash University) and Davaajargal Luvsannyam (The Bank of Mongolia)</i><br /><br />
Analysis of macroeconomic time series often involves decomposing a series into trend and cycle components. In this
blog post, we describe the Kamber, Morley, and Wong (2018) Beveridge-Nelson (BN) filter and the associated EViews
add-in.
<a name='more'></a><br /><br />
<h3>Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction</a>
<li><a href="#sec2">The BN Decomposition</a>
<li><a href="#sec3">The BN Filter</a>
<li><a href="#sec4">Why Use the BN Filter</a>
<li><a href="#sec5">BN Filter Implementation</a>
<li><a href="#sec6">Conclusion</a>
<li><a href="#sec7">Files</a>
<li><a href="#sec8">References</a>
</ol><br />
<h3 id="sec1">Introduction</h3>
In this blog entry, we will discuss the Beveridge-Nelson (BN) filter - the Kamber, Morley, and Wong (2018)
modification of the well-known Beveridge and Nelson (1981) decomposition. In particular, we will discuss
the application of both procedures to estimating the <i>output gap</i>, which the US Bureau of Economic
Analysis (BEA) and the Congressional Budget Office (CBO) define as the proportional deviation of the real
actual <i>gross domestic product</i> (GDP) from the real potential GDP.<br /><br />
The analysis to follow will use quarterly data from the post-World War II period, 1947Q1 to 2019Q3, downloaded
from the FRED database. In this regard, we begin by creating a new quarterly workfile as follows:
<ol>
<li>From the main EViews window, click on <b>File/New/Workfile...</b>.
<li>Under <b>Frequency</b> select <b>Quarterly</b>.
<li>Set the <b>Start date</b> to <i>1947Q1</i> and set the <b>End date</b> to <i>2019Q3</i>.
<li>Hit <b>OK</b>.
</ol>
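Equivalently, these four steps can be carried out with a single command:<br /><br />
<pre style="overflow:auto">
wfcreate q 1947q1 2019q3
</pre>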
Next, we fetch the GDP data as follows:
<ol>
<li>From the main EViews window, click on <b>File/Open/Database...</b>.
<li>From the <b>Database/File Type</b> dropdown, select <b>FRED Database</b>.
<li>Hit <b>OK</b>.
<li>From the FRED database window, click on the <b>Browse</b> button.
<li>Next, click on <b>All Series Search</b> and in the <b>Search for</b> box, type <i>GDPC1</i>. (This is the
real actual seasonally adjusted GDP.)
<li>Drag the series over to the workfile to make it available for analysis.
<li>Again, in the <b>Search for</b> box, type <i>GDPPOT</i>. (This is the real potential seasonally unadjusted
GDP estimated by the CBO)
<li>Drag the series over to the workfile to make it available for analysis.
<li>Close the FRED windows as they are no longer needed.
</ol>
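Alternatively, both series can be fetched by command (a sketch – this assumes your EViews installation has direct access to the FRED database):<br /><br />
<pre style="overflow:auto">
fetch(d=fred) gdpc1 gdppot
</pre>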
<!-- :::::::::: FIGURES 1a and 1b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 1a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/bnfilter/fredbrowse.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/fredbrowse.png" title="FRED Browse"
width="180" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 1b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/bnfilter/fredsearch.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/fredsearch.png" title="FRED Search"
width="180" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 1a: FRED Browse </small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 1b: FRED Search</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 1a and 1b :::::::::: -->
Next, rename the series <b>GDPC1</b> to <b>GDP</b> by issuing the following command:
<pre>
rename gdpc1 gdp
</pre>
To provide some perspective, we now derive the estimate of the output gap implied by the CBO's potential
GDP series. In particular, the CBO implied estimate of the output gap is defined using the formula:
$$ CBOGAP = 100\left(\frac{GDP - GDPPOT}{GDPPOT}\right) $$
For reference, we will create this series in EViews and call it <b>CBOGAP</b>. This is done by
issuing the following command:
<pre>
series cbogap = 100*(gdp-gdppot)/gdppot
</pre>
We also plot <b>CBOGAP</b> below:
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/bnfilter/gap.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/gap.png" title=" CBO implied estimate of the output gap"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 2: CBO implied estimate of the output gap</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
<h3 id="sec2">BN Decomposition</h3>
Recall here that for any time series $ y_{t} $, the BN decomposition determines a trend process $ \tau_{t} $
and a cycle process $ c_{t} $, such that $ y_{t} = \tau_{t} + c_{t} $. In this regard, the trend component
$ \tau_{t} $ is the deviation of the long-horizon conditional forecast of $ y_{t} $ from its deterministic drift
$ \mu $. In other words:
$$ \tau_{t} = \lim_{h\rightarrow \infty} E_{t}\left(y_{t+h} - h\mu\right) \quad \text{where} \quad \mu = E(\Delta
y_{t}) $$
On the other hand, the cyclical component is the deviation of the underlying process from its long-horizon forecast.
Intuitively, when $ y_{t} $ represents the GDP of some economy, the cycle process $ c_{t} = y_{t} - \tau_{t}$ is
interpreted as the <i>output gap</i>.<br /><br />
In practice, in order to capture the autocovariance structure of $ \Delta y_{t} $, the BN decomposition starts by
first fitting an autoregressive moving-average (ARMA) model to $ \Delta y_{t} $ and then proceeds to derive $ \tau_{t}
$ and $ c_{t} $. For instance, when the model of choice is AR(1), the BN decomposition derives from the following
steps:<br /><br />
<ol class="step">
<li>Fit an AR(1) model to $ \Delta y_{t} $:
$$ \Delta y_{t} = \widehat{\alpha} + \widehat{\phi}\Delta y_{t-1} + \widehat{\epsilon}_{t} $$
<li> Estimate the deterministic drift as the unconditional mean process:
$$ \widehat{\mu} = \frac{\widehat{\alpha}}{1 - \widehat{\phi}} $$
<li> Estimate the BN trend process:
$$ \widehat{\tau}_{t} = \left(y_{t} + \left(\frac{\widehat{\phi}}{1 - \widehat{\phi}}\right) \Delta
y_{t}\right) - \left(\frac{\widehat{\phi}}{1 - \widehat{\phi}}\right) \widehat{\mu}$$
<li> Estimate the BN cycle component:
$$ \widehat{c}_{t} = y_{t} - \widehat{\tau}_{t} $$
</ol><br />
As an illustrative example, consider the BN decomposition of US quarterly real GDP. To conform with the
Kamber, Morley, and Wong (2018) paper, we will also transform the raw US real GDP as 100 times its logarithm.
In this regard, we generate a new EViews series object <b>LOGGDP</b> by issuing the following command:
<pre>
series loggdp = 100 * log(gdp)
</pre>
Finally, following the four steps outlined above, we derive the BN decomposition in EViews as follows:
<pre>
series dy = d(loggdp)
equation ar1.ls dy c dy(-1) 'Step 1
scalar mu = c(1)/(1-c(2)) 'Step 2
series bntrend = loggdp + (dy - mu)*c(2)/(1 - c(2)) 'Step 3
series bncycle = loggdp - bntrend 'Step 4
</pre>
The BN trend and cycle series are displayed in Figures 3a and 3b below.<br /><br />
<!-- :::::::::: FIGURES 3a and 3b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 3a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/bnfilter/bntrend.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/bntrend.png" title="BN Trend"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 3b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/bnfilter/bncycle.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/bncycle.png" title="BN Cycle"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 3a: BN Trend</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 3b: BN Cycle</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 3a and 3b :::::::::: -->
To see how the BN decomposition estimate of the output gap compares to the CBO implied estimate of the output gap,
we plot both series on the same graph.<br /><br />
<!-- :::::::::: FIGURE 4 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/bnfilter/bncvsgap.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/bncvsgap.png"
title="BN Cycle vs CBO implied output gap estimate" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 4: BN Cycle vs CBO implied output gap estimate</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 4 :::::::::: -->
Evidently, the BN cycle series lacks persistence (very noisy), lacks amplitude (low variance), and in general,
does not exhibit the characteristics found in the CBO implied estimate of the output gap, <b>CBOGAP</b>.<br /><br />
<h3 id="sec3">The BN Filter</h3>
First, to explain why the BN estimate of output gap lacks the persistence of its true counterpart, recall the formula
for the BN cycle component for an AR(1) model:
$$ c_{t} = y_{t} - \tau_{t} = -\frac{\phi}{1-\phi}(\Delta y_{t} - \mu)$$
Clearly, when $ \phi $ is small, $ \Delta y_{t} $ is not very persistent. Since $ c_{t} $ is only as persistent as
$ \Delta y_{t} $, the cycle component itself lacks the persistence one expects of the true output gap series.<br /><br />
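This AR(1) cycle formula can be checked directly in EViews. A minimal sketch, assuming the equation <b>AR1</b>, the series <b>DY</b>, and the scalar <b>MU</b> created in the BN decomposition section are still in the workfile:
<pre>
series bncycle_ar1 = -(c(2)/(1 - c(2)))*(dy - mu) 'AR(1) BN cycle formula
</pre>
Up to rounding, <b>BNCYCLE_AR1</b> should coincide with the <b>BNCYCLE</b> series computed earlier in Step 4.<br /><br />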
Next, to explain why $ c_{t} $ lacks the expected amplitude, define the signal-to-noise ratio $ \delta $ for any time
series as the ratio of the variance of trend shocks relative to the overall forecast error variance. In other words:
$$ \delta \equiv \frac{\sigma^{2}_{\Delta \tau}}{\sigma^{2}_{\epsilon}} = \psi(1)^{2} $$
which follows since $ \Delta\tau_{t} = \psi(1)\epsilon_{t} $ and $ \psi(1) = \lim_{h\rightarrow \infty} \frac{\partial
y_{t+h}}{\partial \epsilon_{t}} $. Intuitively, $ \psi(1) $ is the <i>long-run multiplier</i> that captures the
permanent effect of the forecast error on the long-horizon conditional expectation of $ y_{t} $. Quite generally, as
demonstrated in Kamber, Morley, and Wong (2018), for any AR(p) model:
\begin{align}
\Delta y_{t} = c + \sum_{k=1}^{p}\phi_{k}\Delta y_{t-k} + \epsilon_{t} \label{eq1}
\end{align}
the signal-to-noise ratio is given by the relation
\begin{align}
\delta = \frac{1}{(1-\phi(1))^{2}} \quad \text{where} \quad \phi(1) = \phi_{1} + \ldots + \phi_{p}\label{eq2}
\end{align}
In particular, when the forecasting model is AR(1), as was the case in the BN decomposition above, the signal-to-noise
ratio is simply $ \delta = \frac{1}{(1-\phi)^{2}} $ and in the case of the US GDP growth process, it is
$ \delta = \frac{1}{(1-0.36)^{2}} = 2.44$. In other words, BN trend shocks exhibit higher volatility than
quarter-to-quarter forecast errors, and the signal-to-noise ratio is relatively high. In fact, a freely estimated
AR$ (p) $ model of output growth typically yields $ 0 < \phi(1) < 1 $, which implies that $ \delta > 1 $. In other words, the trend will be more volatile than the cycle, which is at odds with the view that cycle shocks (the output gap) should account for the majority of the forecast error variance.<br /><br />
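As a sketch, the implied signal-to-noise ratio can be computed from the AR(1) estimates obtained earlier (here <b>C(2)</b> holds $ \widehat{\phi} $ from the equation <b>AR1</b>):
<pre>
scalar phisum = c(2) 'sum of AR coefficients (just phi-hat for an AR(1))
scalar delta = 1/(1 - phisum)^2 'signal-to-noise ratio, equation (2)
</pre>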
To correct for the aforementioned shortcomings of the BN decomposition, Kamber, Morley, and Wong (2018) exploit
the relationship between the signal-to-noise ratio and the AR coefficients in equation \eqref{eq2}. In particular,
they note that equation \eqref{eq2} implies that:
\begin{align}
\phi(1) = 1 - \frac{1}{\sqrt{\delta}}
\end{align}
In this regard, the idea underlying the BN filter is to fix a specific value to the signal-to-noise ratio, say
$ \delta = \bar{\delta} $. Subsequently, the BN decomposition is derived from an AR model, the AR coefficients
of which are forced to sum to $ \bar{\phi}(1) \equiv 1 - \frac{1}{\sqrt{\bar{\delta}}} $. In other words, the
BN decomposition is derived while imposing a particular signal-to-noise ratio.<br /><br />
It is important to note here that estimation of the BN decomposition under a particular signal-to-noise ratio
restriction is in fact straightforward and does not require complicated non-linear routines. To see this, observe
that equation \eqref{eq1} can be rewritten as:
\begin{align}
\Delta y_{t} = c + \rho \Delta y_{t-1} + \sum_{k=1}^{p-1}\phi^{\star}_{k}\Delta^{2} y_{t-k} + \epsilon_{t} \label{eq3}
\end{align}
where $ \rho = \phi_{1} + \ldots + \phi_{p} $ and $ \phi^{\star}_{k} = -\left(\phi_{k+1} + \ldots +
\phi_{p}\right) $. Then, imposing the restriction $ \rho = \bar{\rho} \equiv \bar{\phi}(1) $ reduces the
regression in \eqref{eq3} to:
\begin{align}
\Delta y_{t} - \bar{\rho} \Delta y_{t-1} = c + \sum_{k=1}^{p-1}\phi^{\star}_{k}\Delta^{2} y_{t-k} + \epsilon_{t} \label{eq4}
\end{align}
In other words, $ \bar{\rho}\Delta y_{t-1} $ is brought to the left hand side and the regressand in the regression
\eqref{eq4} becomes $ \Delta \bar{y}_{t} \equiv \Delta y_{t} - \bar{\rho} \Delta y_{t-1} $.<br /><br />
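To illustrate, the restricted regression in equation \eqref{eq4} can be estimated by ordinary least squares. The sketch below assumes the series <b>DY</b> created earlier and uses a hypothetical AR(4) specification with $ \bar{\delta} = 0.25 $:
<pre>
scalar deltabar = 0.25
scalar rhobar = 1 - 1/@sqrt(deltabar) 'implied sum of AR coefficients
series dybar = dy - rhobar*dy(-1) 'restricted regressand
series d2y = d(dy) 'second difference of loggdp
equation bnf_eq.ls dybar c d2y(-1) d2y(-2) d2y(-3) 'p-1 = 3 lags
</pre>
Note that no non-linear estimation is required: the restriction is imposed simply by transforming the dependent variable.<br /><br />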
<h3 id="sec4">Why Use the BN Filter?</h3>
Before we demonstrate the BN Filter add-in, we quickly outline two reasons why the BN filter might be a reasonable
approach, particularly when estimating the output gap.
<ol>
<li>When analyzing GDP growth, standard ARMA model selection often favours low order AR variants, which, as
discussed earlier, produce high signal-to-noise ratios.
<li>Unlike alternative low signal-to-noise ratio procedures such as deterministic quadratic detrending, the
Hodrick-Prescott (HP) filter, and the bandpass (BP) filter, which often require a large number of estimation
revisions as new data come in and are typically unreliable in real time (see Orphanides
and Van Norden (2002)), Kamber, Morley and Wong (2018) argue that the BN filter exhibits better out-of-sample
performance and generally requires fewer estimation revisions.<br /><br />
</ol>
To drive this latter point home, we demonstrate the impact of ex-post estimation of the output gap using the HP
filter. In particular, we first estimate the output gap (the cycle component) of the <b>LOGGDP</b> series for the
period 1947Q1 to 2008Q3 and call it <b>HPCYCLE</b>, and then again for the period 1947Q1 to 2019Q3 and call it
<b>HPCYCLE_EXPOST</b>.<br /><br />
To estimate the HP filter cycle component for the period 1947Q1 to 2008Q3, we first set the sample accordingly by
issuing the command:
<pre>
smpl @first 2008Q3
</pre>
Next, we estimate the HP filter cycle series as follows:
<ol>
<li>From the workfile, double click on the series <b>LOGGDP</b> to open the series.
<li>In the series window, click on <b>Proc/Hodrick-Prescott Filter...</b>
<li>In the <b>Cycle series</b> text box, type <i>hpcycle</i>.
<li>Hit <b>OK</b>.
</ol>
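Alternatively, the same cycle series can be obtained from the command line with the <b>hpf</b> series proc (the trend series name <b>HPTREND</b> below is arbitrary; the default smoothing parameter for quarterly data is $ \lambda = 1600 $):
<pre>
loggdp.hpf hptrend @ hpcycle
</pre>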
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/bnfilter/hpfilter.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/hpfilter.png" title="HP Filter"
width="180" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 5: HP Filter</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
The steps are repeated for the sample period 1947Q1 to 2019Q3. A plot of both cycle series on the same graph is
presented below.<br /><br />
<!-- :::::::::: FIGURE 6 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/bnfilter/hpcycleexpost.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/hpcycleexpost.png"
title="HP Cycle vs HP Cycle Ex Post" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 6: HP Cycle vs HP Cycle Ex Post</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 6 :::::::::: -->
Evidently, the ex-post HP filter estimate of the output gap diverges from its shorter-sample counterpart
starting around 2006Q1. It is precisely this drawback that, as we will see, is far less pronounced in BN filter
estimates.<br /><br />
<h3 id="sec5">BN Filter Implementation</h3>
To implement the BN Filter, we need to download and install the add-in from the EViews website. The latter
can be found at <a href="https://www.eviews.com/Addins/BNFilter.aipz">https://www.eviews.com/Addins/BNFilter.aipz</a>.
We can also do this from inside EViews itself:
<ol>
<li>From the main EViews window, click on <b>Add-ins/Download Add-ins...</b>
<li>Click on the BNFilter add-in.
<li>Click on <b>Install</b>.
</ol>
<!-- :::::::::: FIGURE 7 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/bnfilter/addin.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/addin.png" title="Install Add-in"
width="180" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 7: Install Add-in</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 7 :::::::::: -->
Finally, we demonstrate how to apply the BN Filter add-in using an AR(12) model. To do so, proceed as
follows:
<ol>
<li>From the workfile window, double click on <b>LOGGDP</b> to open the spreadsheet view of the series.
<li>To access the BN filter dialog, click on <b>Proc/Add-ins/BN Filter</b>
<li>Stick with the defaults and hit <b>OK</b>.
</ol><br />
<!-- :::::::::: FIGURE 8 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/bnfilter/bnfilter.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/bnfilter.png" title="BN Filter Dialog"
width="180" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 8: BN Filter Dialog</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 8 :::::::::: -->
The signal-to-noise ratio, while not specified above, is chosen using the Kamber, Morley, and Wong (2018) automatic
selection procedure, which balances the trade-off between fit and amplitude. For the US, this procedure typically yields
a signal-to-noise ratio of about 0.25, implying that trend shocks account for roughly a quarter of the overall forecast
error variance. Below, we show the BN Filter cycle series both alone and in comparison to the CBO implied estimate of the output gap <b>CBOGAP</b>.<br /><br />
<!-- :::::::::: FIGURES 9a and 9b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 9a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/bnfilter/bnfcycle.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/bnfcycle.png" title="BN Filter Cycle"
width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 9b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/bnfilter/bnfcvsgap.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/bnfcvsgap.png"
title="BN Filter Cycle vs. CBO implied output gap estimate" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 9a: BN Filter Cycle</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 9b: BN Filter Cycle vs CBO implied output gap estimate</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 9a and 9b :::::::::: -->
We also plot a comparison of the BN Filter cycle series with the HP filtered cycle.<br /><br />
<!-- :::::::::: FIGURE 10 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/bnfilter/bnfcvshpc.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/bnfcvshpc.png" title="BN Filter Cycle vs HP Filter Cycle"
width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 10: BN Filter Cycle vs HP Filter Cycle</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 10 :::::::::: -->
As we can see, the BN filter estimate of the US output gap using an AR(12) model resembles what we would
expect for an output gap with a low signal-to-noise ratio. The amplitude is reasonably large, business cycles
are visible, and the troughs line up with the recessions dated by the NBER. Moreover, the amplitude of the BN Filter output gap is comparable to that of both the HP filter cycle and the CBO implied estimate, in contrast to what we saw in Figure 4.<br /><br />
The BN filter add-in also accommodates the ability to incorporate knowledge about structural breaks. In
particular, we will use 2006Q1 as a structural break which is consistent with the date found by a Bai and Perron
(2003) test, used by Kamber, Morley and Wong (2018), and is consistent with independent work by Eo and Morley
(2019). The following steps demonstrate the outcome:
<ol>
<li>From the workfile window, double click on <b>LOGGDP</b> to open the spreadsheet view of the series.
<li>To access the BN filter dialog, click on <b>Proc/Add-ins/BN Filter</b>
<li>Select the <b>Structural Break</b> box.
<li>In the <b>Date of structural break</b> text box, enter <i>2006Q1</i>.
<li>Hit <b>OK</b>.
</ol><br />
<!-- :::::::::: FIGURE 11 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/bnfilter/bnfcyclesb.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/bnfcyclesb.png"
title="BN Filter Cycle (Structural Break)" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 11: BN Filter Cycle (Structural Break)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 11 :::::::::: -->
Now we see a more positive output gap post-2006 as the structural break accounts for the fact that the average
GDP growth rate has fallen.<br /><br />
Suppose, however, that we were ignorant of the actual date of the break. This might well be the case in practice,
as it can take a decade or more of data before a structural break date can be empirically identified. A possible
option in this case is to compute the average growth rate over a rolling window; here we use a backward window
of 40 quarters, the idea being that if there were breaks, they would eventually be reflected in the windowed
mean. To do so, we proceed as follows:
<ol>
<li>From the workfile window, double click on <b>LOGGDP</b> to open the spreadsheet view of the series.
<li>To access the BN filter dialog, click on <b>Proc/Add-ins/BN Filter</b>
<li>Select the <b>Dynamic mean adjustment</b> box.
<li>Hit <b>OK</b>.
</ol><br />
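For intuition, the backward 40-quarter mean that the dynamic mean adjustment relies on can be sketched manually with the <b>@movav</b> function, which averages the current and previous 39 observations:
<pre>
series mu_rolling = @movav(dy, 40) 'rolling 40-quarter mean growth rate
</pre>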
<!-- :::::::::: FIGURES 12a and 12b :::::::::: -->
<center>
<table>
<tr>
<td>
<!-- :::::::::: FIGURE 12a :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/bnfilter/bnfcycledma.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/bnfcycledma.png"
title="BN Filter Cycle (Dynamic Mean Adjustment)" width="360" /></a><br />
</center>
</td>
<td>
<!-- :::::::::: FIGURE 12b :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/bnfilter/bnfcsbvsedma.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/bnfcsbvsedma.png"
title="BN Filter Cycle (Known vs Unknown Structural Break)" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 12a: BN Filter Cycle (Dynamic Mean Adjustment)</small>
</center>
</td>
<td class="nb">
<center>
<small>Figure 12b: BN Filter Cycle (Known vs Unknown Structural Break)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURES 12a and 12b :::::::::: -->
Evidently, the estimated output gap looks similar to the one estimated with an explicit structural break in
2006Q1. In general, this suggests that using a backward window to adjust for the mean growth rate might be a
useful real-time strategy for dealing with breaks.<br /><br />
Users are not constrained to the automatic option, which balances the trade-off between fit and amplitude: the BN filter add-in also allows users to specify a desired signal-to-noise ratio. For instance, the following example compares setting the signal-to-noise ratio $ \delta $ to 0.05 (so that trend shocks account for only 5% of the forecast error variance) against the value of 0.25 obtained earlier by leaving $ \delta $ unspecified. To do so, we proceed as follows:
<ol>
<li>From the workfile window, double click on <b>LOGGDP</b> to open the spreadsheet view of the series.
<li>To access the BN filter dialog, click on <b>Proc/Add-ins/BN Filter</b>
<li>Instead of the automatic selection, choose to specify the signal-to-noise ratio and enter <i>0.05</i>.
<li>Hit <b>OK</b>.
</ol><br />
The plot below summarizes the exercise.<br /><br />
<!-- :::::::::: FIGURE 13 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/bnfilter/bnfc25vs5.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/bnfc25vs5.png"
title="BN Filter Cycle (delta = 0.25 vs delta = 0.05)" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 13: BN Filter Cycle (delta = 0.25 vs delta = 0.05)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 13 :::::::::: -->
Unsurprisingly, specifying $ \delta = 0.05 $ results in an output gap with a larger amplitude than the default: a smaller proportion of the forecast error variance is attributed to the trend, so a larger proportion is attributed to the cycle.<br /><br />
Finally, we come back to the issue of revision. As mentioned earlier, as long as the AR forecasting model is
stable, the BN filter should produce output gap estimates that are revised far less than the heavily revised
HP filter estimates. Here, we show the output gap estimated using the BN filter with data up to
2008Q3, and again ex-post with data up to 2019Q3. Clearly, the output gap is hardly revised, which addresses a
key critique of Orphanides and Van Norden (2002).<br /><br />
<!-- :::::::::: FIGURE 14 :::::::::: -->
<center>
<table>
<tr>
<td>
<center>
<a href="http://www.eviews.com/blog/bnfilter/bnfcexpost.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/bnfcexpost.png"
title="BN Filter Cycle (Ex-Post)" width="360" /></a><br />
</center>
</td>
</tr>
<tr>
<td class="nb">
<center>
<small>Figure 14: BN Filter Cycle (Ex-Post)</small>
</center>
</td>
</tr>
</table>
<br />
</center>
<!-- :::::::::: FIGURE 14 :::::::::: -->
<h3 id="sec6">Conclusion</h3>
In this blog post we have outlined the BN filter add-in associated with the work of Kamber, Morley and Wong (2018).
We hope that the ease of use of the add-in, together with the useful properties of the BN Filter, will
encourage practitioners to explore the procedure in their own work.<br /><br />
<h3 id="sec7">Files</h3>
<ul>
<li><a href="http://www.eviews.com/blog/bnfilter/bnfilter_blog.prg">bnfilter_blog.prg</a>
</ul>
<br /><br />
<hr />
<h3 id="sec8">References</h3>
<ol class="bib2xhtml">
<li><a name="bai-2003"></a>Bai J. and Perron P.:
Computation and analysis of multiple structural change models
<cite>Journal of Applied Econometrics</cite>, 18(1) 1–22, 2003.
</li>
<li><a name="beveridge-1981"></a>Beveridge S. and Nelson C. R.:
A new approach to decomposition of economic time series into permanent and transitory components with
particular attention to measurement of the business cycle
<cite>Journal of Monetary Economics</cite>, 7(2) 151–174, 1981.
</li>
<li><a name="eo-2019"></a>Eo Y. and Morley J.:
Why has the US economy stagnated since the Great Recession?
<cite>University of Sydney Working Papers 2017-14</cite>, 2019.
</li>
<li><a name="kamber-2018"></a>Kamber G., Morley J., and Wong B.:
Intuitive and reliable estimates of the output gap from a Beveridge-Nelson filter
<cite>The Review of Economics and Statistics</cite>, 100(3) 550–566, 2018.
</li>
<li><a name="orphanides-2002"></a>Orphanides A and Van Norden S.:
The unreliability of output-gap estimates in real time
<cite>The Review of Economics and Statistics</cite>, 84(4) 569–583, 2002.
</li>
<li><a name="watson-1986"></a>Watson M.:
Univariate detrending methods with stochastic trends
<cite>Journal of Monetary Economics</cite>, 18(1) 49–75, 1986.
</li>
</ol>
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com23tag:blogger.com,1999:blog-6883247404678549489.post-68080614447683089002019-12-04T09:39:00.000-08:002019-12-04T09:39:38.953-08:00Sign and Zero Restricted VAR Add-In<style>
table,
th,
td {
border: 1px solid black;
border-collapse: collapse;
}
th {
padding: 5px;
text-align: center;
}
td {
padding: 5px;
text-align: left;
}
</style>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: "{\\left(}",
rb: "{\\right)}",
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1]
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript">
</script>
<span style="font-family: 'Verdana', sans-serif">
<i>Authors and guest post by Davaajargal Luvsannyam and Ulziikhutag Munkhtsetseg</i><br /><br />
In our previous <a href="http://blog.eviews.com/2019/10/sign-restricted-var-add-in.html">blog entry</a>, we discussed the sign restricted VAR (SRVAR) add-in for EViews. Here, we discuss imposing additional zero restrictions on the impact period of the impulse response function (IRF) using the ARW and SRVAR add-ins in tandem.<a name='more'></a><br /><br />
<h3>Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction</a>
<li><a href="#sec2">Orthogonal Reduced-Form Parameterization</a>
<li><a href="#sec3">ARW Algorithms</a>
<li><a href="#sec4">ARW EViews Add-in</a>
<li><a href="#sec5">Conclusion</a>
<li><a href="#sec6">References</a>
</ol><br />
<h3 id="sec1">Introduction</h3>
Note that it is certainly possible to impose both sign and exclusion restrictions. For example, Mountford and Uhlig (2009) are motivated by the idea that fiscal policy shocks are identified as orthogonal to both monetary policy and business cycle shocks, and use a penalty function approach (PFA) to impose zero restrictions. (For details on the PFA, please see our <a href="http://blog.eviews.com/2019/10/sign-restricted-var-add-in.html">SRVAR blog entry</a>.) They also considered anticipated government revenue shocks in which government revenue is restricted to rise one year following some impulse. Furthermore, Beaudry, Nam, and Wang (2011) estimate a structural VAR model including total factor productivity, stock prices, real consumption, real federal funds rate and hours worked. They use the PFA to show that a positive optimism shock causes an increase in both consumption and hours worked. Recently, Arias, Rubio-Ramirez, and Waggoner (2018), henceforth ARW, developed algorithms to independently draw from a family of conjugate posterior distributions over the structural parameterization when sign and zero restrictions are used to identify SRVARs. They showed the dangers of using the PFA when implementing sign and zero restrictions together to identify structural VARs (SVARs).<br /><br />
<h3 id="sec2">Orthogonal Reduced-Form Parameterization</h3>
ARW focus on two SVAR parameterizations. In addition to the classical structural parameterization, they show that SVARs can also be written as a product of a reduced-form parameters and a set of orthogonal matrices. This is called the <i>orthogonal reduced-form parameterization</i>, henceforth, ORF. The algorithms ARW propose draw from a conjugate posterior distribution over the ORF and then transform said draws into a structural parameterization. In particular, they use the normal-inverse-Wishart distribution as the prior conjugate distribution, and develop a change of variable theory that characterizes the induced family of densities over the structural parameterization. This theory shows that a uniform-normal-inverse-Wishart density over the ORF parameterization induces a normal-generalized-normal density over the structural parameterization.<br /><br />
To motivate their contribution, ARW first show that existing algorithms for SVARs identified only by sign restrictions, conditional on a sign restriction using the change of variable theory, operate on independent draws from the normal-generalized-normal distribution over the structural parameterization. These algorithms independently draw from the uniform-normal-inverse-Wishart distribution over the ORF parameterization and only accept draws that impose a sign restriction.<br /><br />
Next, ARW generalize these algorithms to also consider zero restrictions. The key to this generalization is that, conditional on the reduced-form parameters, the class of zero restrictions on the structural parameters maps to linear restrictions on the orthogonal matrices. The resulting generalization independently draws from the normal-inverse-Wishart distribution over the reduced-form parameters and from the set of orthogonal matrices such that the zero restrictions hold. Conditional on the zero restrictions, they show that this generalization does not induce a distribution over the structural parameterization from the family of normal-generalized-normal distributions. They therefore derive the induced distribution and construct an importance sampler that, conditional on the sign and zero restrictions, independently draws from normal-generalized-normal distributions over the structural parameterization.<br /><br />
To formalize these ideas, consider the SVAR with the general form:
\begin{align}
Y_t^{\prime} A_{0} = \sum_{i=1}^{p} Y_{t-i}^{\prime}A_{i} + c + \epsilon_t^{\prime}, \quad t=1, \ldots, T \label{eq1}
\end{align}
where $ Y_t $ is an $ n\times 1 $ vector of endogenous variables, $ A_i $ are parameter matrices of size of $ n\times n $ with $ A_{0} $ invertible, $ c $ is a $ 1\times n $ vector of parameters, $ \epsilon_t $ is an $ n\times 1 $ vector of exogenous structural shocks, $ p $ is the lag length, and $ T $ is the sample size.<br /><br />
We can also summarize equation \eqref{eq1} as follows:
\begin{align}
Y_{t}^{\prime}A_{0} = X_{t}^{\prime}A_{+} + \epsilon_{t}^{\prime} \label{eq2}
\end{align}
where $ A_{+}^{\prime} = \left[A_{1}^{\prime}, \ldots, A_{p}^{\prime}, c^{\prime}\right]$ and $ X_{t}^{\prime} = \left[Y_{t-1}^{\prime}, \ldots, Y_{t-p}^{\prime}, 1\right] $.<br /><br />
The reduced form can now be written as:
\begin{align}
Y_{t}^{\prime} = X_{t}^{\prime}B + u_{t}^{\prime} \label{eq3}
\end{align}
where $ B = A_{+}A_{0}^{-1}, u_{t}^{\prime} = \epsilon_{t}^{\prime}A_{0}^{-1} $, and $ E(u_{t}u_{t}^{\prime}) = \Sigma = \left(A_{0}A_{0}^{\prime}\right)^{-1} $. Naturally, $ B $ and $ \Sigma $ are the reduced form parameters.<br /><br />
We can further write equation \eqref{eq3} as the orthogonal reduced-form parameterization
\begin{align}
Y_{t}^{\prime} = X_{t}^{\prime}B + \epsilon_{t}^{\prime}Q^{\prime}h(\Sigma) \label{eq4}
\end{align}
where the $ n\times n $ matrix $ h(\Sigma) $ is the Cholesky decomposition of covariance matrix $ \Sigma $.<br /><br />
Given equations \eqref{eq2} and \eqref{eq4}, in addition to the Cholesky decomposition $ h $, we can define a mapping between $ \left(A_{0}, A_{+}\right) $ and $ (B, \Sigma, Q) $ by:
\begin{align}
f_{h}\left(A_{0}, A_{+}\right) = \left(A_{+}A_{0}^{-1}, \left(A_{0}A_{0}^{\prime}\right)^{-1}, h\left(\left(A_{0}A_{0}^{\prime}\right)^{-1}\right)A_{0}\right) \label{eq5}
\end{align}
where the first element of the triad on the right corresponds to $ B $, the second to $ \Sigma $, and the third to $ Q $.<br /><br />
Note further that the function $ f_{h} $ is invertible with inverse defined by:
\begin{align}
f_{h}^{-1} (B,\Sigma, Q) = \left(h(\Sigma)^{-1}Q, Bh(\Sigma)^{-1}Q\right) \label{eq6}
\end{align}
where the first term on the right corresponds to $ A_{0} $ and the second to $ A_{+} $.<br /><br />
Thus, the ORF parameterization makes clear how the structural parameters depend on the reduced form parameters and orthogonal matrices.<br /><br />
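As a sketch, the inverse mapping in equation \eqref{eq6} can be written in EViews matrix language. This assumes matrices <b>B</b>, <b>SIGMA</b>, and an orthogonal matrix <b>Q</b> already exist in the workfile, and takes $ h(\Sigma) $ to be the upper-triangular factor with $ h(\Sigma)^{\prime}h(\Sigma) = \Sigma $ (EViews' <b>@cholesky</b> returns the lower-triangular factor, so we transpose it):
<pre>
matrix h = @transpose(@cholesky(sigma)) 'upper-triangular Cholesky factor
matrix a0 = @inverse(h)*q 'A0 = h(Sigma)^(-1) Q
matrix aplus = b*a0 'A+ = B h(Sigma)^(-1) Q
</pre>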
<h3 id="sec3">ARW Algorithms</h3>
Although ARW propose three different algorithms, the most important is the third, which draws from a distribution over the ORF parameterization conditional on the sign and zero restrictions and then transforms the draws into the structural parameterization. Since Algorithm 3 depends on Algorithm 2, we present both here and recommend that readers consult the supplementary materials of ARW (2018) for further details.<br /><br />
<h4>Algorithm 2</h4>
Let $ Z_j $ define the zero restriction matrix on the $ j^{\text{th}} $ structural shock, and let $ z_{j} $ denote the number of zero restrictions associated with the $ j^{\text{th}} $ structural shock. Then:
<ol>
<li>Draw $ (B, \Sigma) $ independently from the normal-inverse-Wishart distribution.
<li>For $ j \in \{1, \ldots, n\} $ draw $ X_{j} \in \mathbf{R}^{n+1-j-z_{j}} $ independently from a standard normal distribution and set $ W_{j} = X_{j} / ||X_{j}||$.
<li>Define $ Q = [q_{1}, \ldots, q_{n}] $ recursively as $ q_{j} = K_{j}W_{j} $, where $ K_{j} $ is any matrix whose columns form an orthonormal basis for the null space of the $ (j-1+z_{j})\times n $ matrix
\begin{align}
M_{j} = \left[q_{1}, \ldots, q_{j-1}, \left(Z_{j}F\left(f_{h}^{-1}(B, \Sigma, I_{n})\right)\right)^{\prime}\right]^{\prime}
\end{align}
<li>Set $ (A_{0},A_{+}) = f_{h}^{-1}(B,\Sigma,Q) $.<br /><br />
</ol>
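Steps 2 and 3 of Algorithm 2 can be sketched in NumPy. Here <code>F</code> stands in for the stacked matrix $ F\left(f_{h}^{-1}(B,\Sigma,I_{n})\right) $ to which the zero restrictions apply; since building it requires the model's IRFs, the sketch simply takes it as a precomputed array (a hypothetical placeholder):

```python
import numpy as np

def null_basis(M, n):
    # Orthonormal basis (as columns) for the null space of the r x n matrix M.
    if M.shape[0] == 0:
        return np.eye(n)
    _, s, Vt = np.linalg.svd(M, full_matrices=True)
    rank = int(np.sum(s > 1e-10))
    return Vt[rank:].T

def draw_Q(F, Zs, rng):
    # Build Q = [q_1, ..., q_n] column by column; each q_j is a random
    # unit vector in the null space of [q_1, ..., q_{j-1}, Z_j F].
    n = F.shape[1]
    qs = []
    for Zj in Zs:
        M = np.vstack(qs + [Zj @ F])
        K = null_basis(M, n)
        x = rng.standard_normal(K.shape[1])
        qs.append(K @ (x / np.linalg.norm(x)))
    return np.column_stack(qs)

rng = np.random.default_rng(1)
n = 4
F = rng.standard_normal((n, n))   # placeholder for F(f_h^{-1}(B, Sigma, I_n))
# Two zero restrictions on the first shock (rows 2 and 3 of F), none elsewhere:
Zs = [np.eye(n)[1:3], np.empty((0, n)), np.empty((0, n)), np.empty((0, n))]
Q = draw_Q(F, Zs, rng)
```

By construction $ Q $ is orthogonal and its first column satisfies $ Z_{1}F q_{1} = 0 $.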
<h4>Algorithm 3</h4>
Let $ \mathcal{Z} $ denote the set of all structural parameters that satisfy the zero restrictions, and define $ v_{(g\circ f_{h})|\mathcal{Z}} $ as the volume element of the mapping $ g \circ f_{h} $ restricted to $ \mathcal{Z} $. Then:
<ol>
<li>Use Algorithm 2 to independently draw $ (A_{0}, A_{+}) $.
<li>If $ (A_{0}, A_{+}) $ satisfies the sign restrictions, set its importance weight to
$$ \frac{|\det(A_{0})|^{-(2n+m+1)}}{v_{(g\circ f_{h})|\mathcal{Z}}(A_{0}, A_{+})} $$
otherwise, set its importance weight to zero.
<li>Return to Step 1 until the required number of draws has been obtained.
<li>Re-sample with replacement using the importance weights.<br /><br />
</ol>
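The importance weight itself involves the volume element $ v_{(g\circ f_{h})|\mathcal{Z}} $, which is nontrivial to compute, so the following sketch takes the weights as given and illustrates only the weighting and resampling stages (draws that violate the sign restrictions receive weight zero and are never selected):

```python
import numpy as np

def importance_resample(draws, weights, rng):
    # Resample with replacement, with probability proportional
    # to the importance weights.
    w = np.asarray(weights, dtype=float)
    idx = rng.choice(len(draws), size=len(draws), p=w / w.sum())
    return [draws[i] for i in idx]

rng = np.random.default_rng(0)
draws = ["d0", "d1", "d2", "d3", "d4"]
weights = [0.0, 0.8, 1.5, 0.0, 0.7]   # zero = sign restrictions violated
resampled = importance_resample(draws, weights, rng)
```

The resampled set has the same size as the original, and only draws with positive weight appear in it.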
<h3 id="sec4">ARW EViews Add-in</h3>
Now we turn to the implementation of the ARW add-in. First, we need to download and install the add-in, available from the EViews website at <a href="https://www.eviews.com/Addins/arw.aipz">https://www.eviews.com/Addins/arw.aipz</a>. Alternatively, we can install it from inside EViews itself: click on <b>Add-ins</b> in the main menu, then on <b>Download Add-ins...</b>. From there, locate the <i>ARW</i> add-in and click on <b>Install</b>.<br /><br />
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/arw/addin_download.png"><img
height="auto" src="http://www.eviews.com/blog/arw/addin_download.png"
title="Add-ins Download" width="360" /></a><br />
<small>Figure 1: Add-in installation</small><br /><br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
After installing, we open the workfile <i>data.WF1</i>, which can be found in the installation folder, typically located in <b>[Windows User Folder]/Documents/EViews Addins/ARW</b>.<br /><br />
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/arw/workfile.png"><img
height="auto" src="http://www.eviews.com/blog/arw/workfile.png"
title="ARW (2018) Data" width="360" /></a><br />
<small>Figure 2: ARW (2018) Data</small><br /><br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
We now replicate Figure 1 and Table 3 from ARW. In EViews, this can be done as follows.<br /><br />
<ol>
<li>Click on the <b>Add-ins</b> menu item in the main EViews menu, and click on <b>Sign restricted VAR</b>.
<li>Under <b>Endogenous variables</b> enter <i>tfp stock cons ffr hour</i>.
<li>Check the <b>Include constant</b> option.
<li>Under <b>Number of lags</b>, enter <i>4</i>.
<li>In the <b>Sign restriction vector</b> textbox enter <i>+2</i>.
<li>Under <b>Sign restriction method</b> check <i>Penalty</i>.
<li>In the <b>Number of horizons</b> textbox enter <i>40</i>.</li>
<li>Under <b>Zero restriction</b> textbox enter <i>tfp</i>.
<li>Check the <b>Variance decomposition</b> box.
<li>Hit <b>OK</b>.<br /><br />
</ol>
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/arw/pfa.png"><img
height="auto" src="http://www.eviews.com/blog/arw/pfa.png"
title="SRVAR Add-in (PFA)" width="360" /></a><br />
<small>Figure 3: SRVAR Add-in (PFA)</small><br /><br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
The steps above produce the following output (Panel A of Figure 1 of ARW):<br /><br />
<!-- :::::::::: FIGURE 4 :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/arw/panela.png"><img
height="auto" src="http://www.eviews.com/blog/arw/panela.png"
title="PFA Output" width="360" /></a><br />
<small>Figure 4: PFA Output</small><br /><br />
</center>
<!-- :::::::::: FIGURE 4 :::::::::: -->
Next, we invoke the ARW add-in and proceed with the ARW Algorithm 3.<br /><br />
<ol>
<li>Click on the <b>Add-ins</b> menu item in the main EViews menu, and click on <b>Sign and zero restricted VAR</b>.
<li>Under <b>Endogenous variables</b> enter <i>tfp stock cons ffr hour</i>.
<li>Check the <b>Include constant</b> option.
<li>Under <b>Number of lags</b>, enter <i>4</i>.
<li>In the <b>Sign restriction vector</b> textbox enter <i>+stock</i>.
<li>In the <b>Zero restrictions</b> textbox enter <i>tfp</i>.
<li>Under <b>Number of steps</b> enter <i>40</i>.
<li>Check the <b>Variance decomposition</b> box.
<li>Hit <b>OK</b>.<br /><br />
</ol>
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/arw/isampler.png"><img
height="auto" src="http://www.eviews.com/blog/arw/isampler.png"
title="ARW Add-in (Importance Sampler)" width="360" /></a><br />
<small>Figure 5: ARW Add-in (Importance Sampler)</small><br /><br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
The steps above produce the following output (Panel B of Figure 1 of ARW):<br /><br />
<!-- :::::::::: FIGURE 6 :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/arw/panelb.png"><img
height="auto" src="http://www.eviews.com/blog/arw/panelb.png"
title="Importance Sampler Output" width="360" /></a><br />
<small>Figure 6: Importance Sampler Output</small><br /><br />
</center>
<!-- :::::::::: FIGURE 6 :::::::::: -->
Figures 4 and 6 above illustrate the IRFs obtained using the PFA and the importance sampler, respectively. In the case of the former, we see the IRFs with probability bands for adjusted TFP, stock prices, consumption, the real interest rate, and hours worked under the PFA. Examining the probability bands around the IRFs suggests that optimism shocks boost consumption and hours worked, as the corresponding bands exclude zero for at least 20 quarters.<br /><br />
Alternatively, the IRFs of the same variables obtained using the importance sampler tell a different story. For consumption and hours worked, the probability bands are wider and contain zero. Furthermore, the corresponding point-wise median IRFs are closer to zero than those obtained using the PFA. In other words, the PFA exaggerates the effects of optimism shocks on stock prices, consumption, and hours worked by generating much narrower bands and larger point-wise median IRFs. In this regard, the PFA of Uhlig (2005) effectively imposes additional identification restrictions when implementing sign and zero restrictions.<br /><br />
To further summarize the results, we present the table below which gives the specifics of the output figures above.<br /><br />
<center>
<table style="width:100%">
<tr>
<th></th>
<th colspan="3">Penalty Function Approach</th>
<th colspan="3">Importance Sampler</th>
</tr>
<tr>
<th></th>
<th>16%</th>
<th>Median</th>
<th>84%</th>
<th>16%</th>
<th>Median</th>
<th>84%</th>
</tr>
<tr>
<td>Adjusted TFP</td>
<td>0.07</td>
<td><b>0.17</b></td>
<td>0.29</td>
<td>0.03</td>
<td><b>0.11</b></td>
<td>0.23</td>
</tr>
<tr>
<td>Stock Prices</td>
<td>0.54</td>
<td><b>0.72</b></td>
<td>0.84</td>
<td>0.05</td>
<td><b>0.29</b></td>
<td>0.57</td>
</tr>
<tr>
<td>Consumption</td>
<td>0.13</td>
<td><b>0.27</b></td>
<td>0.43</td>
<td>0.03</td>
<td><b>0.17</b></td>
<td>0.50</td>
</tr>
<tr>
<td>Real Interest Rate</td>
<td>0.07</td>
<td><b>0.14</b></td>
<td>0.23</td>
<td>0.08</td>
<td><b>0.20</b></td>
<td>0.39</td>
</tr>
<tr>
<td>Hours Worked</td>
<td>0.20</td>
<td><b>0.31</b></td>
<td>0.45</td>
<td>0.04</td>
<td><b>0.18</b></td>
<td>0.56</td>
</tr>
</table>
<small>Table I: Forecast Error Variance Decomposition (FEVD)</small><br /><br />
</center>
Table I shows the contribution of optimism shocks to the Forecast Error Variance Decomposition (FEVD) using the PFA and the importance sampler for the chosen horizon of 40 periods, with 68 percent equal-tailed probability intervals. Under the PFA, the share of the FEVD of consumption and hours worked attributable to optimism shocks is 27 and 31 percent, respectively. Moreover, the contribution of optimism shocks to the FEVD of stock prices is 72 percent under the PFA, in contrast to 29 percent under the importance sampler. Note that for most variables, optimism shocks contribute less to the FEVD under the importance sampler, and the probability intervals for the FEVD are wider than those obtained under the PFA.<br /><br />
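For reference, FEVD shares of the kind reported in Table I follow the standard formula: the contribution of shock $ j $ to the $ H $-step forecast error variance of variable $ i $ is $ \sum_{h=0}^{H-1} \theta_{ij,h}^{2} $ divided by the total across all shocks. A generic NumPy sketch (not the add-in's internal code):

```python
import numpy as np

def fevd_shares(irfs):
    # irfs[h, i, j]: response of variable i to structural shock j at horizon h.
    # Returns an n x n matrix of shares: row i gives the fraction of the
    # H-step-ahead forecast error variance of variable i due to each shock.
    mse_parts = (irfs ** 2).sum(axis=0)             # cumulate over horizons
    return mse_parts / mse_parts.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
shares = fevd_shares(rng.standard_normal((40, 5, 5)))  # H = 40, n = 5
```

Each row of the resulting matrix sums to one, so the entries can be read directly as variance shares.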
<h3 id="sec5">Conclusion</h3>
In this blog entry we presented the ARW add-in for EViews. The add-in is based on the work of ARW (2018) and generates impulse response functions using the importance sampler, which accommodates both sign and zero restrictions in the VAR model.<br /><br />
<hr />
<h3 id="sec6">References</h3>
<ol class="bib2xhtml">
<li><a name="arias-2018"></a>Arias J., Rubio-Ramirez J., and Waggoner D.:
Inference Based on SVARs Identified with Sign and Zero Restrictions: Theory and Applications.
<cite>Econometrica</cite>, 86:685–720, 2018.
</li>
<li><a name="beaudry-2011"></a>Beaudry P., Nam D., and Wang J.:
Do mood swings drive business cycles and is it rational?
<cite>NBER Working Paper 17651</cite>, 2011.
</li>
<li><a name="mountford-2009"></a>Mountford A. and Uhlig H.:
What are the effects of fiscal policy shocks?
<cite>Journal of Applied Econometrics</cite>, 24:960–992, 2009.
</li>
<li><a name="uhlig-2005"></a>Uhlig H.:
What are the effects of monetary policy on output? Results from an agnostic identification procedure.
<cite>Journal of Monetary Economics</cite>, 52(2):381–419, 2005.
</li>
</ol>
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com5tag:blogger.com,1999:blog-6883247404678549489.post-44432315972472118622019-11-06T10:23:00.000-08:002019-11-06T13:02:01.360-08:00Dealing with the log of zero in regression models<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
TeX: {
equationNumbers: { autoNumber: "AMS" },
extensions: ["AMSmath.js"],
Macros: {
lb: "{\\left(}",
rb: "{\\right)}",
bu: ['{\\underline{#1}}', 1],
ba: ['{\\overline{#1}}', 1],
norm: ['{\\lVert#1\\rVert}', 1]
}
}
});
</script>
<script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript">
</script>
<span style="font-family: 'verdana', sans-serif">
<i>Author and guest post by Eren Ocakverdi</i><br /><br />
The title of this blog piece is a verbatim excerpt from the Bellego and Pape (2019) paper suggested by Professor David E. Giles in his <a href="https://davegiles.blogspot.com/2019/10/october-reading.html">October reading list</a>. (Editor's note: Professor Giles has recently announced the end of his blog - it is a fantastic resource and will be missed!). The topic is immediately familiar to practitioners who occasionally encounter the difficulty in applied work. In this regard, it is reassuring that the frustration is being addressed and that there is indeed an ongoing quest for the <i>silver bullet</i>.<a name='more'></a><br /><br />
<h3>Table of Contents</h3>
<ol>
<li><a href="#sec1">Introduction</a>
<li><a href="#sec2">A Novel Approach</a>
<li><a href="#sec3">Files</a>
<li><a href="#sec4">References</a>
</ol><br />
<h3 id="sec1">Introduction</h3>
Consider the following data generating process where the dependent variable may contain zeros:
$$ \log(y_i) = \alpha + x_i^\prime \beta + \epsilon_i \quad \text{with} \quad E(\epsilon_i)=0 $$
The most common remedy for the <i>logarithm of zero</i> problem among practitioners is to add a common (observation-independent) positive constant to the dependent variable. In other words, to work with the model:
$$ \log(y_i + \Delta) = \alpha + x_i^\prime \beta + \omega_i $$
where $ \Delta $ is the corrective constant.<br /><br />
In the aforementioned paper, the authors use Monte Carlo simulations to demonstrate that the bias incurred by this correction is not necessarily negligible for small values of $ \Delta $, and in fact, may be substantial.<br /><br />
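The bias is easy to reproduce with a small Monte Carlo sketch (a hypothetical DGP chosen for illustration, not the paper's exact simulation design):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
x = rng.standard_normal(n)
# True model: log(y) = 0 + 1*x + eps; rounding creates exact zeros in y.
y = np.round(np.exp(1.0 * x + rng.standard_normal(n)))

X = np.column_stack([np.ones(n), x])
slopes = {}
for delta in (1.0, 10.0):
    b = np.linalg.lstsq(X, np.log(y + delta), rcond=None)[0]
    slopes[delta] = b[1]   # true slope is 1; estimates shrink as delta grows
```

The larger the corrective constant, the further the estimated slope drifts from its true value, consistent with Figure 1 below.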
<!-- :::::::::: FIGURE 1 :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/log_of_zero/bias.png"><img
height="auto" src="http://www.eviews.com/blog/log_of_zero/bias.png"
title="Estimation bias" width="360" /></a><br />
<small>Figure 1: Estimation bias as a function of $ \Delta $ </small><br /><br />
</center>
<!-- :::::::::: FIGURE 1 :::::::::: -->
In order to handle the zeros in model variables, the paper offers a new (complementary) solution that:
<ol>
<li>Does not generate computational bias by arbitrary normalization. </li>
<li>Does not generate correlation between the error term and regressors. </li>
<li>Does not require the deletion of observations.</li>
<li>Does not require the estimation of a supplementary parameter.</li>
<li>Does not require addition of a discretionary constant.</li><br /><br />
</ol>
<h3 id="sec2">A Novel Approach</h3>
Bellego and Pape (2019) suggest that instead of adding a common positive constant $ \Delta $, one ought to add an optimal, observation-dependent positive value $ \Delta_{i} $. This strategy results in the following model, which is estimated via GMM:
$$ \log(y_i + \Delta_{i}) = \alpha + x_i^\prime \beta + \eta_{i} $$
where $ \Delta_i = \exp(x_i^\prime \beta) $ and $ \eta_i = \log(1 + \exp(\alpha + \epsilon_i)) $.<br /><br />
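For comparison, the PPML benchmark used alongside the proposed solution estimates the same parameters from the moment condition $ E\left[(y_i - \exp(x_i^\prime\theta))x_i\right] = 0 $, which is well defined when $ y_i = 0 $. A rough sketch with a hypothetical DGP (not the paper's simulation code):

```python
import numpy as np
from scipy.optimize import fsolve

rng = np.random.default_rng(7)
n = 5000
x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
# Hypothetical DGP: y is zero with probability 0.3, else log-normal,
# so E[y|x] = const * exp(x1 + x2) with true slopes of 1.
y = rng.binomial(1, 0.7, n) * np.exp(x1 + x2 + 0.5 * rng.standard_normal(n))

X = np.column_stack([np.ones(n), x1, x2])

def moments(theta):
    # PPML first-order conditions: X'(y - exp(X theta)) = 0.
    return X.T @ (y - np.exp(X @ theta))

theta_hat = fsolve(moments, np.zeros(3))
```

Despite roughly 30 percent of the observations being exact zeros, the slope estimates are close to their true value of one.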
Since the details are available in the original paper, here I’d like to replicate the simulation exercise in which the authors illustrate their method and compare it with other approaches. (The tables below can be replicated in EViews by running the program file <i>loglinear.prg</i>.)<br /><br />
<!-- :::::::::: FIGURE 2 :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/log_of_zero/table1.png"><img
height="auto" src="http://www.eviews.com/blog/log_of_zero/table1.png"
title="OLS estimation" width="360" /></a><br />
<small>Figure 2: Output of OLS estimation (with $ \Delta = 1 $)</small><br /><br />
</center>
<!-- :::::::::: FIGURE 2 :::::::::: -->
<!-- :::::::::: FIGURE 3 :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/log_of_zero/table2.png"><img
height="auto" src="http://www.eviews.com/blog/log_of_zero/table2.png"
title="PPML estimation" width="360" /></a><br />
<small>Figure 3: Output of Pseudo Poisson Maximum Likelihood (PPML) estimation</small><br /><br />
</center>
<!-- :::::::::: FIGURE 3 :::::::::: -->
<!-- :::::::::: FIGURE 4 :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/log_of_zero/table3.png"><img
height="auto" src="http://www.eviews.com/blog/log_of_zero/table3.png"
title="GMM estimation" width="360" /></a><br />
<small>Figure 4: Output of proposed solution (GMM estimation)</small><br /><br />
</center>
<!-- :::::::::: FIGURE 4 :::::::::: -->
Simulation results show that both the PPML and the GMM solutions recover the true parameter values (i.e. $ \alpha = 0 $, $ \beta_{1} = \beta_{2} = 1 $), whereas the OLS results are biased due to the addition of a common constant to all data points. Although $ \alpha $ is not identified in the proposed solution, the authors suggest OLS estimation to obtain the coefficient:<br /><br />
<!-- :::::::::: FIGURE 5 :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/log_of_zero/table4.png"><img
height="auto" src="http://www.eviews.com/blog/log_of_zero/table4.png"
title="OLS estimation of alpha" width="360" /></a><br />
<small>Figure 5: OLS estimation of alpha parameter: $ \log(\exp(\eta_i)-1)=\alpha+\epsilon_i $</small><br /><br />
</center>
<!-- :::::::::: FIGURE 5 :::::::::: -->
When zeros are observed in both the dependent and independent variables, the authors suggest a functional coefficient model of the form:
$$ \log(y_i) = \alpha + \mathbb{1}_{x_i > 0}\times\log(x_i)\beta_{x_i>0}+\mathbb{1}_{x_i=0}\times\beta_{x_i=0}+\epsilon_i $$
Again, a simulation exercise is carried out to compare the estimated coefficients with different methods. (The tables below can be reproduced in EViews by running the program <i>loglog.prg</i>.)<br /><br />
<!-- :::::::::: FIGURE 6 :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/log_of_zero/table5.png"><img
height="auto" src="http://www.eviews.com/blog/log_of_zero/table5.png"
title="OLS estimation" width="360" /></a><br />
<small>Figure 6: OLS estimation</small><br /><br />
</center>
<!-- :::::::::: FIGURE 6 :::::::::: -->
<!-- :::::::::: FIGURE 7 :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/log_of_zero/table6.png"><img
height="auto" src="http://www.eviews.com/blog/log_of_zero/table6.png"
title="PPML estimation" width="360" /></a><br />
<small>Figure 7: PPML estimation</small><br /><br />
</center>
<!-- :::::::::: FIGURE 7 :::::::::: -->
<!-- :::::::::: FIGURE 8 :::::::::: -->
<center>
<a href="http://www.eviews.com/blog/log_of_zero/table7.png"><img
height="auto" src="http://www.eviews.com/blog/log_of_zero/table7.png"
title="GMM estimation" width="360" /></a><br />
<small>Figure 8: GMM estimation</small><br /><br />
</center>
<!-- :::::::::: FIGURE 8 :::::::::: -->
Simulation results show that the suggested (flexible) formulation of the $ \beta $ coefficients works well for all estimation methods ($ \alpha=0 $ and $ \beta = 1.5 $).<br /><br />
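The dummy-interaction regression above can be sketched compactly in NumPy (a hypothetical DGP built to match the flexible specification; the actual replication programs are in EViews):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
x = np.round(np.exp(rng.standard_normal(n)))   # regressor with exact zeros
pos = x > 0
logx = np.zeros(n)
logx[pos] = np.log(x[pos])                     # logx is zero where x = 0

# Hypothetical DGP: alpha = 0, beta_{x>0} = 1.5, beta_{x=0} = 0.5.
y_log = 1.5 * logx + 0.5 * (~pos) + rng.standard_normal(n)

# Regressors: intercept, 1_{x>0} * log(x), and the zero dummy 1_{x=0}.
X = np.column_stack([np.ones(n), logx, (~pos).astype(float)])
alpha_hat, beta_pos, beta_zero = np.linalg.lstsq(X, y_log, rcond=None)[0]
```

OLS on the interacted regressors recovers all three coefficients, mirroring the simulation result quoted above.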
<hr />
<h3 id="sec3">Files</h3>
<ol>
<li><a href="http://www.eviews.com/blog/log_of_zero/deltasimul.prg">deltasimul.prg</a>
<li><a href="http://www.eviews.com/blog/log_of_zero/loglinear.prg">loglinear.prg</a>
<li><a href="http://www.eviews.com/blog/log_of_zero/loglog.prg">loglog.prg</a>
</ol><br />
<hr />
<h3 id="sec4">References</h3>
<ol class="bib2xhtml">
<!-- Authors: Bellego and Pape (2019) -->
<li><a name="bellego_pape-2019"></a>Bellego, C. and L-D. Pape.
Dealing with the log of zero in regression models.
<cite>CREST: Working Paper</cite>, No:2019-13, 2019.</li>
</ol>
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com1