<h2>Time Series Methods for Modelling the Spread of Epidemics</h2> <small>July 16, 2020</small> <style> table { border: 0px solid black; border-collapse: separate; border-spacing: 10px; } td { border: 1px solid black; } .nb { border: 0px solid black; } .step { counter-reset: section; list-style-type: none; } .step li::before { counter-increment: section; content: "Step "counter(section) ": "; } </style> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ tex2jax: { inlineMath: [ ['$','$'], ["\$","\$"] ], displayMath: [ ['$$','$$'], ["\$","\$"] ], }, TeX: { equationNumbers: { autoNumber: "AMS" }, extensions: ["AMSmath.js"], Macros: { lb: "{\\left(}", rb: "{\\right)}", bu: ['{\\underline{#1}}', 1], ba: ['{\\overline{#1}}', 1], norm: ['{\\lVert#1\\rVert}', 1] } } }); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"> </script> <span style="font-family: &quot;verdana&quot; sans-serif"> <i>A guest post by Eren Ocakverdi</i><br /><br /> This blog piece introduces two new add-ins (<a href='http://www.eviews.com/Addins/seirmodel.aipz'>SEIRMODEL</a> and <a href='http://www.eviews.com/Addins/tsepigrowth.aipz'>TSEPIGROWTH</a>) to EViews users’ toolbox, and aims to help close the gap between epidemiological models and time series methods from a practitioner’s point of view. 
<a name='more'></a><br /><br /> <h3>Table of Contents</h3> <ol> <li><a href="#sec1">Introduction</a> <li><a href="#sec2">Susceptible-Exposed-Infected-Recovered (SEIR) model</a> <li><a href="#sec3">Observational Models</a> <li><a href="#sec4">Application to COVID-19 Data from Turkey</a> <li><a href="#sec5">Files</a> <li><a href="#sec6">References</a> </ol><br /> <h3 id="sec1">Introduction</h3> In mathematical epidemiology, the spread of infectious diseases is usually described through compartmental models rather than observational time series models, since the analytical derivation of their dynamics is quite straightforward. These are essentially structural models that divide the population into several states and then define the equations that govern the transitions from one state to another. In other words, they are <i>state space</i> models.<br /><br /> <h3 id="sec2">Susceptible-Exposed-Infected-Recovered (SEIR) model</h3> I have written an add-in (<a href='http://www.eviews.com/Addins/seirmodel.aipz'>SEIRMODEL</a>) for interested EViews users who want to carry out their own analyses and gain basic insight into the systemic nature of an epidemic. The add-in implements a deterministic version of the SEIR model, which does not take into account vital dynamics such as birth and death. Still, it offers a simple framework for those who are not familiar with these concepts.<br /><br /> In order to run simulations, users need to provide the required inputs (e.g. 
population size, calibration parameters, initial conditions, etc.), details of which can be found in the documentation file that comes with the add-in:<br /><br /> <!-- :::::::::: FIGURE 1 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/tsepigrowth/seir_dialog.png"><img height="auto" src="http://www.eviews.com/blog/tsepigrowth/seir_dialog.png" title="SEIR Add-In Dialog" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 1: SEIR Add-In Dialog</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 1 :::::::::: --> The default output is a chart showing the evolution of the compartments/states during the spread of the epidemic. You can also save these series for further analysis.<br/><br/> <!-- :::::::::: FIGURE 2 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/tsepigrowth/seir_output.png"><img height="auto" src="http://www.eviews.com/blog/tsepigrowth/seir_output.png" title="SEIR Add-In: Output" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 2: SEIR Add-In Output</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 2 :::::::::: --> <h3 id="sec3">Observational Models</h3> Structural modelling of epidemics becomes increasingly complex when heterogeneity in the population, mobility, interactions, etc. are brought into the computations. The functions fitted to observed data for calibration purposes are mostly nonlinear, which can further complicate the estimation process. Harvey and Kattuman (2020) recently proposed useful observational time series methods, particularly for generalized logistic and Gompertz growth curves. 
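To fix ideas, the Gompertz curve has a closed form whose growth rate declines exponentially over time, which is precisely the feature the observational methods exploit. Below is a minimal Python sketch (not EViews code from the add-in); the parameter values <code>K</code>, <code>a</code> and <code>r</code> are illustrative assumptions, not estimates from this post:

```python
import math

def gompertz(t, saturation, a, r):
    """Gompertz growth curve N(t) = K * exp(-a * exp(-r*t)).

    saturation (K) is the eventual total, r is the growth-rate parameter.
    Names are illustrative, not the add-in's.
    """
    return saturation * math.exp(-a * math.exp(-r * t))

# Illustrative values: saturation of 250,000 cases, growth rate 0.05 per day
K, a, r = 250_000, 8.0, 0.05

# The growth rate of the level, g(t) = d ln N / dt = a*r*exp(-r*t),
# declines exponentially, so ln g(t) falls linearly in t -- the property
# the observational (state space) formulation models directly.
g = [a * r * math.exp(-r * t) for t in range(200)]

# The curve's inflection point is at t = ln(a)/r, where N = K/e
t_inflection = math.log(a) / r
```

Note that the level forecast is driven almost entirely by how fast this growth rate decays, which is why the observational models below focus on the growth rate rather than the level.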
I have written an add-in (<a href='http://www.eviews.com/Addins/tsepigrowth.aipz'>TSEPIGROWTH</a>) that implements the methods outlined in the paper.<br/><br/> Suppose we wanted to fit these nonlinear curves to the number of infected individuals from the simulation of our earlier SEIR model:<br /><br /> <!-- :::::::::: FIGURES 3a and 3b :::::::::: --> <center> <table> <tr> <td> <!-- :::::::::: FIGURE 3a :::::::::: --> <center> <a href="http://www.eviews.com/blog/tsepigrowth/seir_logistic.png"><img height="auto" src="http://www.eviews.com/blog/tsepigrowth/seir_logistic.png" title="SEIR: Generalized Logistic Fit" width="360" /></a><br /> </center> </td> <td> <!-- :::::::::: FIGURE 3b :::::::::: --> <center> <a href="http://www.eviews.com/blog/tsepigrowth/seir_gompertz.png"><img height="auto" src="http://www.eviews.com/blog/tsepigrowth/seir_gompertz.png" title="SEIR: Gompertz Growth Curve Fit" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 3a: SEIR: Generalized Logistic Fit</small> </center> </td> <td class="nb"> <center> <small>Figure 3b: SEIR: Gompertz Growth Curve Fit</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURES 3a and 3b :::::::::: --> Above, c(4) denotes the growth rate parameter. At this point I would also suggest that EViews users try the <a href="http://www.eviews.com/Addins/GBASS.aipz">GBASS</a> add-in, which implements the generalized Bass model developed for modelling how new products (or new viruses, for that matter!) 
get adopted into a population.<br /><br /> If we wanted to take the other route offered by Harvey and Kattuman (2020) and estimate these parameters via observational methods, then we could simply run the add-in:<br /><br /> <!-- :::::::::: FIGURE 4 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/tsepigrowth/tsepigrowth_dialog.png"><img height="auto" src="http://www.eviews.com/blog/tsepigrowth/tsepigrowth_dialog.png" title="TSEPIGROWTH Add-In: Dialog" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 4: TSEPIGROWTH Add-In Dialog</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 4 :::::::::: --> Output from the state space specification of these models is as follows:<br /><br /> <!-- :::::::::: FIGURES 5a and 5b :::::::::: --> <center> <table> <tr> <td> <!-- :::::::::: FIGURE 5a :::::::::: --> <center> <a href="http://www.eviews.com/blog/tsepigrowth/tsepigrowth_logistic_ss.png"><img height="auto" src="http://www.eviews.com/blog/tsepigrowth/tsepigrowth_logistic_ss.png" title="TSEPIGROWTH: Generalized Logistic SS Model" width="360" /></a><br /> </center> </td> <td> <!-- :::::::::: FIGURE 5b :::::::::: --> <center> <a href="http://www.eviews.com/blog/tsepigrowth/tsepigrowth_gompertz_ss.png"><img height="auto" src="http://www.eviews.com/blog/tsepigrowth/tsepigrowth_gompertz_ss.png" title="TSEPIGROWTH: Gompertz Growth Curve SS Model" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 5a: TSEPIGROWTH: Generalized Logistic SS Model</small> </center> </td> <td class="nb"> <center> <small>Figure 5b: TSEPIGROWTH: Gompertz Growth Curve SS Model</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURES 5a and 5b :::::::::: --> Here, the final value of the state variable <i>CHANGE</i> corresponds to the growth rate parameter and is more or less in line with that of the fitted nonlinear curves.<br/><br/> <h3 
id="sec4">Application to COVID-19 Data from Turkey</h3> The examples above are useful from a pedagogical point of view, but we need to try these models on actual data to gain insight from a practical perspective. Naturally, COVID-19 data are the most recent and most appropriate place to start. Users can visit the <a href='http://blog.eviews.com/2020/03/mapping-covid-19.html'>previous blog post</a> to learn how to fetch COVID-19 data from various sources. Here, I’ll use another data source provided by the WHO.<br /><br /> First, we fit a Gompertz curve to the level and make forecasts until the end of the year. Next, we do the same exercise with the observational counterparts of the Gompertz model, which focus on estimating the growth rate.<br /><br /> The chart below visually compares the fitted values of the growth rate:<br /><br /> <!-- :::::::::: FIGURE 6 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/tsepigrowth/grfit.png"><img height="auto" src="http://www.eviews.com/blog/tsepigrowth/grfit.png" title="Gompertz Fit Curves" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 6: Gompertz Fit Curves</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 6 :::::::::: --> The next plot displays the forecasted values for the level:<br /><br /> <!-- :::::::::: FIGURE 7 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/tsepigrowth/grfcast.png"><img height="auto" src="http://www.eviews.com/blog/tsepigrowth/grfcast.png" title="Gompertz Forecast Curves" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 7: Gompertz Forecast Curves</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 7 :::::::::: --> These forecasts imply different saturation levels, with that of the nonlinear curve being the lowest. 
This is mainly because the inflection point of the fitted nonlinear curve implies levelling off at an earlier date. The first observational model has a deterministic trend, but performs better since it focuses on the growth rate. There is an obvious change in trend at the beginning of June, when Turkey announced the first phase of easing COVID-19 restrictions and marked the start of the normalization process. Observational models allow us to model this change explicitly as a slope intervention: <!-- :::::::::: FIGURE 8 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/tsepigrowth/policyss.png"><img height="auto" src="http://www.eviews.com/blog/tsepigrowth/policyss.png" title="Policy Intervention SS Model" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 8: Policy Intervention SS Model</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 8 :::::::::: --> The coefficient <i>C(3)</i> confirms that the growth rate has risen significantly as of June. The dynamic version of the observational Gompertz model fits a flexible trend to the data, so it adapts to changes in growth rates without any need to model the intervention explicitly. It also allows us to analyze the impact of the policy/intervention from a counterfactual perspective. The plot below compares the out-of-sample forecasts of the dynamic model before and after the normalization period. The shift in the forecasted level of total cases is obvious! 
<!-- :::::::::: FIGURE 9 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/tsepigrowth/policygrfcast.png"><img height="auto" src="http://www.eviews.com/blog/tsepigrowth/policygrfcast.png" title="Policy Intervention Out of Sample Forecast" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 9: Policy Intervention Out of Sample Forecast</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 9 :::::::::: --> <h3 id="sec5">Files</h3> <ul> <li><a href="http://www.eviews.com/blog/tsepigrowth/tsepigrowth_blog.prg">tsepigrowth_blog.prg</a> </ul> <br /><br /> <hr /> <h3 id="sec6">References</h3> <ol class="bib2xhtml"> <li><a name="harvey-2020"></a>Harvey, A. C. and Kattuman, P.: Time Series Models Based on Growth Curves with Applications to Forecasting Coronavirus. <cite>Covid Economics: Vetted and Real-Time Papers</cite>, 24(1), 126&#x2013;157, 2020. </li> </ol></span><br /><br /> <hr /> <h2>Mapping COVID-19: Follow-up</h2> <small>April 1, 2020</small> <span style="font-family: &quot;verdana&quot; sans-serif"> As a follow-up to our <a href="http://blog.eviews.com/2020/03/mapping-covid-19.html">previous blog entry</a> describing how to import Covid-19 data into EViews and produce some maps/graphs of the data, this post will produce a couple more graphs similar to ones we've seen become popular across social media in recent days. <a name='more'></a><br /><br /> <h3>Table of Contents</h3> <ol> <li><a href="#sec1">Deaths Since First Death</a> <li><a href="#sec2">One Week Difference</a> </ol><br /> <h3 id="sec1">Deaths Since First Death</h3> The first is a graph showing the 3-day moving average of the number of deaths per day since the first death was recorded in a country, for countries with a current number of deaths greater than 160:<br /><br /> <!-- :::::::::: FIGURE 1 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/3dma.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/3dma.png" title="3-Day moving average" width="480" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 1: 3-Day moving average</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 1 :::::::::: --> The graph shows that for most countries the number of deaths is still increasing, but at a slowing growth rate (with log-scaling, the slope of each line approximates that growth rate). 
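The transformation behind the chart can be sketched in plain Python. The death counts below are made up for illustration, and the trailing moving average is analogous to the EViews expression @movav(log(deaths),3):

```python
import math

# Made-up cumulative death counts for a single country, from its first death
deaths = [2, 3, 5, 8, 12, 18, 26, 36, 48, 62]

# On a log scale the day-to-day slope approximates the growth rate, since
# log(y[t]) - log(y[t-1]) = log(y[t] / y[t-1])
log_deaths = [math.log(d) for d in deaths]

def moving_average(x, window=3):
    """Trailing moving average over `window` observations."""
    return [sum(x[t - window + 1 : t + 1]) / window
            for t in range(window - 1, len(x))]

smoothed = moving_average(log_deaths)
```

The smoothing simply averages out day-to-day reporting noise before the log-scaled lines are drawn.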
The code to produce this graph, including importing the death data from Johns Hopkins is:<br /><br /> <pre style="overflow:auto"><br /> <font color="green">'import the death data from Johns Hopkins</font><br /> %url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"<br /> <br /> <font color="green">'load up the url as a new page</font><br /> pageload(page=temp) {%url}<br /> <br /> <font color="green">'stack the page into a 2d panel</font><br /> pagestack(page=stack) _? @ *? * <br /> <br /> <font color="green">'do some renaming and make the date series</font><br /> rename country_region country <br /> rename province_state province<br /> rename _ deaths<br /> series date = @dateval(var01, "MM_DD_YYYY")<br /> <br /> <font color="green">'structure the page </font><br /> pagestruct province country @date(date)<br /> <br /> <font color="green">'delete the original page</font><br /> pagedelete temp<br /> <br /> <font color="green">'create the panel page</font><br /> pagecreate(id, page=panel) country @date @srcpage stack<br /> <br /> <font color="green">'copy the deaths series to the panel page</font><br /> copy(c=sum) stack\deaths * @src @date country @dest @date country<br /> pagedelete stack<br /> <br /> <font color="green">'contract the page to only include countries with greater than 160 deaths</font><br /> pagecontract if @maxsby(deaths,country)>160<br /> <br /> <font color="green">'create a series containing the number of days since the first death was recorded in each country. 
This series is equal to 0 if the number of deaths on a date is equal to the minimum number of deaths for that country (nearly always 0, but for China, the data starts after the first recorded death), and then counts up by one for dates after the minimum.</font><br /> series days = @recode(deaths=@minsby(deaths,country), 0, days(-1)+1)<br /> <br /> <font color="green">'contract the page so that days before the first recorded death in each country are removed</font><br /> pagecontract if days>0<br /> <br /> <font color="green">'restructure the page to be based on this day count rather than actual dates</font><br /> pagestruct(freq=u) @date(days) country<br /> <br /> <font color="green">'set sample to be first 45 days</font><br /> smpl 1 45<br /> <br /> <font color="green">'make a graph of the 3 day moving average of deaths</font><br /> freeze(d_graph) @movav(log(deaths),3).line(m, panel=c)<br /> d_graph.addtext(t, just(c)) Deaths Since First Death\n(3 day moving average, log scale)<br /> d_graph.addtext(br) Days<br /> d_graph.addtext(l) log(deaths)<br /> d_graph.legend columns(5)<br /> d_graph.legend position(-0.6,3.72)<br /> show d_graph<br /> </pre> <h3 id="sec2">One Week Difference</h3> The second graph takes an interesting approach, plotting the one-week difference in the total number of confirmed COVID-19 cases (i.e. the number of new cases over the past week) against the total number of confirmed cases for each country, with both shown using log-scales. 
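The transformation itself is just a seven-period difference of the cumulative series. A minimal Python sketch with made-up numbers:

```python
# Made-up cumulative confirmed-case counts for one country
confirmed = [50, 80, 130, 210, 330, 500, 720, 980, 1250, 1500, 1700, 1850]

# New cases over the past week = one-week difference of the cumulative total
weekly_new = [confirmed[t] - confirmed[t - 7] for t in range(7, len(confirmed))]

# Plotted against the cumulative total on log-log axes, countries still in
# exponential growth trace out a common straight line; dropping below that
# line signals that the epidemic is slowing.
```
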
We have only included countries with more than 140 deaths, and have highlighted just three countries – China, South Korea and the US.<br /><br /> <!-- :::::::::: FIGURE 2 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/weekdiff.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/weekdiff.png" title="One week difference" width="480" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 2: One week difference</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 2 :::::::::: --> The code to generate this graph is:<br /><br /> <pre style="overflow:auto"><br /> <font color="green">'names of the three topics/files</font><br /> %topics = "confirmed deaths recovered"<br /><br /> <font color="green">'loop through the topics</font><br /> for %topic {%topics}<br /> <br /> <font color="green">'build the url by taking the base url and then adding the topic in the middle</font><br /> %url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_" + %topic + "_global.csv"<br /> <br /> <font color="green">'load up the url as a new page</font><br /> pageload(page=temp) {%url}<br /> <br /> <font color="green">'stack the page into a 2d panel</font><br /> pagestack(page=stack_{%topic}) _? @ *? 
*<br /> <br /> <font color="green">'do some renaming and make the date series</font><br /> rename country_region country <br /> rename province_state province<br /> rename _ {%topic}<br /> series date = @dateval(var01, "MM_DD_YYYY")<br /> <br /> <font color="green">'structure the page</font><br /> pagestruct province country @date(date)<br /> <br /> <font color="green">'delete the original page</font><br /> pagedelete temp<br /> next<br /> <br /> <font color="green">'create the panel page</font><br /> pagecreate(id, page=panel) country @date @srcpage stack_{%topic}<br /> <br /> <font color="green">'loop through the topics copying each from the 2D panel</font><br /> for %topic {%topics}<br /> copy(c=sum) stack_{%topic}\{%topic} * @src @date country @dest @date country<br /> pagedelete stack_{%topic}<br /> next<br /> <br /> <font color="green">'contract the page to only include countries with more than 140 deaths</font><br /> pagecontract if @maxsby(deaths, country)>140<br /> <br /> <font color="green">'make a group, called DATA, containing confirmed cases and the one week difference in confirmed cases</font><br /> group data confirmed confirmed-confirmed(-7)<br /> <br /> <font color="green">'set the sample to remove periods with fewer than 50 cases</font><br /> smpl if confirmed > 50<br /> <br /> <font color="green">'produce a panel plot of confirmed against 7 day difference in confirmed</font><br /> freeze(c_graph) data.xyline(panel=c)<br /> <br /> <font color="green">' Add titles</font><br /> c_graph.addtext(t) "COVID-19: New vs. 
Total Cases\n(Countries with >140 deaths)"<br /> c_graph.addtext(bc, just(c)) "Total Confirmed Cases\n(log scale)"<br /> c_graph.addtext(l, just(c))"New Confirmed Cases (in the past week)\n(log scale)"<br /> c_graph.setelem(1) legend("")<br /> <br /> <font color="green">' Adjust axis to use logs</font><br /> c_graph.axis(b) log<br /> c_graph.axis(l) log<br /> <br /> <font color="green">' Adjust lines - remove lines after this if you want to show all countries</font><br /> c_graph.legend -display<br /> for !i = 1 to @rows(@uniquevals(country))<br /> c_graph.setelem(!i) linewidth(.75) linecolor(@rgb(192,192,192))<br /> next<br /><br /> c_graph.setelem(8) linecolor(@rgb(128,64,0))<br /> c_graph.setelem(3) linecolor(@rgb(0,64,128))<br /> c_graph.setelem(15) linecolor(@rgb(0,128,0))<br /> <br /> <font color="green">'add some text</font><br /> c_graph.addtext(3.29, 1.92, font(Calibri,10)) "S. Korea"<br /> c_graph.addtext(4.87, 2.35, font(Calibri,10)) "China"<br /> c_graph.addtext(5.31, 0.23, font(Calibri,10)) "United States"<br /> <br /> show c_graph<br /> </pre></span><br /><br /> <hr /> <h2>Mapping COVID-19</h2> <small>March 30, 2020</small> <span style="font-family: &quot;verdana&quot; sans-serif"> With the world currently experiencing the Covid-19 crisis, many of our users are working remotely (aside: for details on how to use EViews at home, visit our <a href="http://www.eviews.com/covid">Covid licensing page</a>), anxious to follow data on how the virus is spreading across parts of the world. There are many sources of information on Covid-19, and we thought we’d demonstrate how to fetch some of these sources directly into EViews and then display some graphics of the data. (Please visit our <a href="http://blog.eviews.com/2020/04/mapping-covid-19-follow-up.html">follow-up post</a> for a few more graph examples). <a name='more'></a><br /><br /> <h3>Table of Contents</h3> <ol> <li><a href="#sec1">Johns Hopkins Data</a> <li><a href="#sec2">European Centre for Disease Prevention and Control Data</a> <li><a href="#sec3">New York Times US County Data</a> <li><a href="#sec4">Sneak Peeks</a> </ol><br /> <h3 id="sec1">Johns Hopkins Data</h3> To begin, we'll retrieve data from the Covid-19 Time Series collection from the <a href="https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series">Johns Hopkins Whiting School of Engineering Center for Systems Science and Engineering</a>. These data are organized into three csv files, one containing confirmed cases, one containing deaths, and one containing recoveries, at both country and state/province levels. Each file is organized such that the first column contains the state/province name (where applicable), the second column the country name, the third and fourth the average latitude and longitude, and the remaining columns daily values.<br /><br /> There are a number of different approaches that could be used to import these data into an EViews workfile. 
We’ll demonstrate an approach that will stack the data into a single panel workfile. We’ll start with importing the confirmed cases data. EViews is able to directly open CSV files over the web using the <b>File->Open->Foreign Data as Workfile</b> menu item:<br /><br /> <!-- :::::::::: FIGURE 1 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/jhopenpath.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/jhopenpath.png" title="JH open path" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 1: JH open path</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 1 :::::::::: --> This results in the following workfile:<br /><br /> <!-- :::::::::: FIGURE 2 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/jhwf.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/jhwf.png" title="JH workfile" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 2: JH workfile</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 2 :::::::::: --> Each day of data has been imported into its own series, with the name of the series being the date. 
There are also series containing the country/region name and the province/state name, as well as latitude and longitude.<br /><br /> To create a panel, we’ll want to stack these date series into a single series, which we can do simply with <b>Proc->Reshape Current Page->Stack in New Page…</b><br /><br /> <!-- :::::::::: FIGURE 3 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/jhstackdialog.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/jhstackdialog.png" title="JH stack data dialog" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 3: JH stack data dialog</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 3 :::::::::: --> Since all of the series we wish to stack have a similar naming structure (they all start with an “_”), we can instruct EViews to stack using “_?” as the identifier, where ? is a wildcard. This results in the following stacked workfile page:<br /><br /> <!-- :::::::::: FIGURE 4 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/jhstackwf.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/jhstackwf.png" title="JH stack data workfile" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 4: JH stack data workfile</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 4 :::::::::: --> This is close to what we want; we simply need to tidy up some of the variable names and instruct EViews to structure the page as a true panel. 
The date information has been imported into the alpha series VAR01, which we can convert into a true date series with:<br /><br /> <pre style="overflow:auto"><br /> series date = @dateval(var01, "MM_DD_YYYY")<br /> </pre> The actual cases data is stored in the series currently named "_", which we can rename to something more meaningful with:<br /><br /> <pre style="overflow:auto"><br /> rename _ cases<br /> </pre> And then finally we can structure the page as a panel by clicking on <b>Proc->Structure/Resize Current Page</b>, selecting Dated Panel as the structure type, and filling in the cross-section and date information:<br /><br /> <!-- :::::::::: FIGURE 5 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/jhstructuredialog.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/jhstructuredialog.png" title="JH workfile restructure" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 5: JH workfile restructure</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 5 :::::::::: --> When asked whether we wish to remove blank values, we select No. We now have a 2-dimensional panel, with two sets of cross-sectional identifiers – one for province/state and the other for country:<br /><br /> <!-- :::::::::: FIGURE 6 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/jh3dpanel.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/jh3dpanel.png" title="JH 2D Panel" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 6: JH 2D Panel</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 6 :::::::::: --> If we want to sum up the state level data to create a traditional panel with just country and time, we can do so by creating a new panel page based upon the indices of this page. 
Click on the <b>New Page</b> tab at the bottom of the workfile and select <b>Specify by Identifier Series</b>. In the resulting dialog we enter the country series as the cross-section identifier we wish to keep:<br /><br /> <!-- :::::::::: FIGURE 7 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/jhpagebyid.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/jhpagebyid.png" title="JH page by ID" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 7: JH page by ID</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 7 :::::::::: --> This results in a standard panel. We can then copy the cases series from our 2D panel page to the new panel page with standard copy and paste, making sure to change the Contraction method to Sum in the Paste Special dialog:<br /><br /> <!-- :::::::::: FIGURE 8 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/jhpastedialog.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/jhpastedialog.png" title="JH paste dialog" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 8: JH paste dialog</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 8 :::::::::: --> <!-- :::::::::: FIGURE 9 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/jhpanelwf.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/jhpanelwf.png" title="JH panel workfile" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 9: JH panel workfile</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 9 :::::::::: --> With the data in a standard panel workfile, all of the standard EViews tools are now available. We can view a graph of the cases by country by opening the cases series, clicking on <b>View->Graph</b>, and then selecting <b>Individual cross sections</b> as the <b>Panel option</b>.<br /><br /> <!-- :::::::::: FIGURE 10 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/jhallcxgraph.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/jhallcxgraph.png" title="JH graph of all cross-sections" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 10: JH graph of all cross-sections</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 10 :::::::::: --> This graph may be a little unwieldy, so we can restrict it to, say, only countries that have thus far experienced more than 10,000 cases by using the smpl command:<br /><br /> <pre style="overflow:auto"><br /> smpl if @maxsby(cases, country_region)>10000<br /> </pre> <!-- :::::::::: FIGURE 11 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/jhmaxsbygraph.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/jhmaxsbygraph.png" title="JH cross-sections with more than 10000 cases" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 11: JH cross-sections with more than 10000 cases</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 11 :::::::::: --> Of course, all of this could have been done in an EViews program, and it could be automated to combine all three data files, ending up with a panel containing cases, deaths and recoveries. 
The following EViews code produces such a panel:<br /><br /> <pre style="overflow:auto"><br /> <font color="green">'close all existing workfiles</font><br /> close @wf<br /> <br /> <font color="green">'names of the three topics/files</font><br /> %topics = "confirmed deaths recovered"<br /> <br /> <font color="green">'loop through the topics</font><br /> for %topic {%topics}<br /> <font color="green">'build the url by taking the base url and then adding the topic in the middle</font><br /> %url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_" + %topic + "_global.csv"<br /> <br /> <font color="green">'load up the url as a new page</font><br /> pageload(page=temp) {%url}<br /> <br /> <font color="green">'stack the page into a 3d panel</font><br /> pagestack(page=stack_{%topic}) _? @ *? *<br /> <br /> <font color="green">'do some renaming and make the date series</font><br /> rename country_region country <br /> rename province_state province<br /> rename _ {%topic}<br /><br /> series date = @dateval(var01, "MM_DD_YYYY")<br /><br /> <font color="green">'structure the page</font><br /> pagestruct province country @date(date)<br /> <br /> <font color="green">'delete the original page</font><br /> pagedelete temp<br /><br /> <font color="green">'create the 2D panel page</font><br /> pagecreate(id, page=panel) country @date @srcpage stack_{%topic}<br /> next<br /> <br /> <font color="green">'loop through the topics copying each from the 3D panel into the 2D panel</font><br /> for %topic {%topics}<br /> copy(c=sum) stack_{%topic}\{%topic} * @src @date country @dest @date country<br /> pagedelete stack_{%topic}<br /> next<br /> </pre> <h3 id="sec2">European Centre for Disease Prevention and Control Data</h3> The second repository we'll use is data provided by the <a 
href="https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide">ECDC's Covid-19 Data site</a>. They provide an extremely easy-to-use data file for each country, along with population data. Importing these data into EViews is trivial – you can open the XLSX file directly using the <b>File->Open->Foreign Data as Workfile</b> dialog and entering the URL to the XLSX in the <b>File name</b> box:<br /><br /> <!-- :::::::::: FIGURE 10 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/ecdcopenpath.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/ecdcopenpath.png" title="ECDC open path" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 10: ECDC open path</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 10 :::::::::: --> The resulting workfile will look like this:<br /><br /> <!-- :::::::::: FIGURE 11 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/ecdcwf.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/ecdcwf.png" title="ECDC workfile" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 11: ECDC workfile</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 11 :::::::::: --> All we need to do is structure it as a panel, which we can do by clicking on <b>Proc->Structure/Resize Current Page</b> and then entering the cross-section and date identifiers (we also choose to keep an unbalanced panel by unchecking the <b>Balance between starts & ends</b> box).<br /><br /> <!-- :::::::::: FIGURE 12 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/ecdcstructuredialog.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/ecdcstructuredialog.png" title="ECDC structure WF dialog"
width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 12: ECDC structure WF dialog</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 12 :::::::::: --> The result is an EViews panel workfile:<br /><br /> <!-- :::::::::: FIGURE 13 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/ecdcseries.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/ecdcseries.png" title="ECDC series" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 13: ECDC series</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 13 :::::::::: --> The data provided by ECDC contain the number of new cases and deaths each day. Most presentations of Covid-19 data, however, have been in terms of the total number of cases and deaths per country. We can create the totals with the <b>@cumsum</b> function, which produces the cumulative sum, resetting to zero at the start of each cross-section.<br /><br /> <pre style="overflow:auto"><br /> series ccases = @cumsum(cases)<br /> series cdeaths = @cumsum(deaths)<br /> </pre> With this panel we can perform standard panel data analysis, or produce graphs (see the Johns Hopkins examples above). However, since the ECDC have included standard <a href="https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes">ISO country codes</a> for the countries, we can also tie the data to a geomap.<br /><br /> We found a simple <a href="http://thematicmapping.org/downloads/world_borders.php">shapefile</a> of the world <a href="http://thematicmapping.org/downloads/world_borders.php">online</a>, and downloaded it to our computer.
In EViews we then click on <b>Object->New Object->GeoMap</b> to create a new geomap, and then drag the <b>.prj</b> file we downloaded onto the geomap.<br /><br /> In the properties box that appears, we tie the countries defined in the shapefile to the identifiers in the workfile. Since the shapefile uses ISO codes, and we have those in the <b>countryterritorycode</b> series, we can use those to map the workfile to the shapefile:<br /><br /> <!-- :::::::::: FIGURE 14 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/geomapprops.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/geomapprops.png" title="Geomap properties" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 14: Geomap properties</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 14 :::::::::: --> This results in the following global geomap:<br /><br /> <!-- :::::::::: FIGURE 15 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/geomapglobal.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/geomapglobal.png" title="Global geomap" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 15: Global geomap</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 15 :::::::::: --> We can use the <b>Label:</b> dropdown to remove the country labels to give a clearer view of the map (note that this feature is a recent addition; you may need to update your copy of EViews to see the <b>None</b> option).<br /><br /> To add some color information to the map we click on <b>Properties</b> and then the <b>Color</b> tab.
We'll add two custom color settings – a gradient fill to show differences in the number of cases, and a single solid color for countries with a large number of cases:<br /><br /> <!-- :::::::::: FIGURES 16a and 16b :::::::::: --> <center> <table> <tr> <td> <!-- :::::::::: FIGURE 16a :::::::::: --> <center> <a href="http://www.eviews.com/blog/covid19/images/ecdcgeomaprange.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/ecdcgeomaprange.png" title="ECDC geomap color range" width="360" /></a><br /> </center> </td> <td> <!-- :::::::::: FIGURE 16b :::::::::: --> <center> <a href="http://www.eviews.com/blog/covid19/images/ecdcgeomapthresh.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/ecdcgeomapthresh.png" title="ECDC geomap color threshold" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 16a: ECDC geomap color range</small> </center> </td> <td class="nb"> <center> <small>Figure 16b: ECDC geomap color threshold</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURES 16a and 16b :::::::::: --> We then enter <b>ccases</b> as the coloring series.
This results in a map:<br /><br /> <!-- :::::::::: FIGURE 17 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/ecdcgeomap.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/ecdcgeomap.png" title="ECDC geomap" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 17: ECDC geomap</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 17 :::::::::: --> Again, this could all be done programmatically with the following program (note the ranges for coloring will need to be changed as the virus becomes more widespread):<br /><br /> <pre style="overflow:auto"><br /> <font color="green">'download data</font><br /> wfopen https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide.xlsx<br /> rename countryterritorycode iso3<br /> pagecontract if iso3<>""<br /> pagestruct(bal=m) iso3 @date(daterep)<br /> <br /> <font color="green">'make cumulative data</font><br /> series ccases = @cumsum(cases)<br /> series cdeaths = @cumsum(deaths)<br /> <br /> <font color="green">'make geomap for cases</font><br /> geomap cases_map<br /> cases_map.load ".\World Map\TM_WORLD_BORDERS_SIMPL-0.3.prj"<br /> cases_map.link iso3 iso3<br /> cases_map.options -legend<br /> cases_map.setlabel none<br /> cases_map.setfillcolor(t=custom) mapser(ccases) naclr(@RGB(255,255,255)) range(lim(0,12000,cboth), rangeclr(@grad(@RGB(255,255,255),@RGB(0,0,255))), outclr(@trans,@trans), name("Range")) thresh(12000, below(@trans), above(@RGB(0,0,255)), name("Threshold"))<br /> <br /> <font color="green">'make geomap for deaths</font><br /> geomap deaths_map<br /> deaths_map.load ".\World Map\TM_WORLD_BORDERS_SIMPL-0.3.prj"<br /> deaths_map.link iso3 iso3<br /> deaths_map.options -legend<br /> deaths_map.setlabel none<br /> deaths_map.setfillcolor(t=custom) mapser(cdeaths) naclr(@RGB(255,255,255)) range(lim(1,500,cboth),
rangeclr(@grad(@RGB(255,128,128),@RGB(128,64,64))), outclr(@trans,@trans), name("Range")) thresh(500,cleft,below(@trans),above(@RGB(128,0,0)),name("Threshold")) <br /> </pre> <h3 id="sec3">New York Times US County Data</h3> The final data repository we will look at is the <a href="https://github.com/nytimes/covid-19-data/blob/master/us-counties.csv">New York Times</a> data for the United States at county level. These data are also trivial to import into EViews; again, you can just enter the URL for the CSV file to open it. Rather than walking through the UI steps, we'll simply post the two lines of code required to import and structure as a panel:<br /><br /> <pre style="overflow:auto"><br /> <font color="green">'retrieve data from NY Times github</font><br /> wfopen(page=covid) https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv<br /> <br /> <font color="green">'structure as a panel based on date and FIPS ID</font><br /> pagestruct(dropna) fips @date(date)<br /> </pre> Note that the New York Times have conveniently provided the <a href="https://en.wikipedia.org/wiki/FIPS_county_code">FIPS code</a> for each county, which means we can also produce some geomaps.
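As with the Johns Hopkins panel, we can restrict the sample to the harder-hit cross-sections before graphing – for example, to counties that have so far recorded more than 1,000 cases (a sketch reusing the <b>@maxsby</b> approach shown earlier; the threshold is an arbitrary choice):<br /><br /> <pre style="overflow:auto"><br /> <font color="green">'keep only counties whose case count has exceeded 1,000</font><br /> smpl if @maxsby(cases, fips)>1000<br /> </pre>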
We've downloaded a US county map from the <a href="https://dataverse.tdl.org/dataset.xhtml?persistentId=doi:10.18738/T8/CPTP8C">Texas Data Repository</a>, and then linked the <b>FIPS</b> series in the workfile with the <b>FIPS_BEA</b> attribute of the map:<br /><br /> <!-- :::::::::: FIGURE 17 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/images/geomapfipsprops.png"><img height="auto" src="http://www.eviews.com/blog/covid19/images/geomapfipsprops.png" title="Geomap FIPS properties" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 17: Geomap FIPS properties</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 17 :::::::::: --> The full code to produce such a map is:<br /><br /> <pre style="overflow:auto"><br /> <font color="green">'retrieve data from NY Times github</font><br /> wfopen(page=covid) https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv<br /> <br /> <font color="green">'structure as a panel based on date and FIPS ID</font><br /> pagestruct(dropna) fips @date(date)<br /> <br /> <font color="green">'set displaynames for use in geomaps</font><br /> cases.displayname Confirmed Cases<br /> deaths.displayname Deaths<br /> <br /> <font color="green">'make geomap</font><br /> geomap cases_map<br /> cases_map.load ".\Us County Map\CountiesBEA.prj"<br /> cases_map.link fips_bea fips<br /> cases_map.options -legend<br /> cases_map.setlabel none<br /> cases_map.setfillcolor(t=custom) mapser(cases) naclr(@RGB(255,255,255)) range(lim(1,200,cboth), rangeclr(@grad(@RGB(204,204,255),@RGB(0,0,255))), outclr(@trans,@trans), name("Range")) thresh(200, below(@trans), above(@RGB(0,0,255)), name("Threshold")) <br /> </pre> <h3 id="sec4">Sneak Peeks</h3> One of the features our engineering team have been working on for the next major release of EViews is the ability to produce animated graphs and geomaps (the keen-eyed amongst you may
have noticed the <b>Animate</b> button on a few of our screenshots). Whilst this feature is still some way from release, the Covid-19 data make for an interesting test case, and we thought we'd share some of the results.<br /><br /> <!-- :::::::::: ANIMATION 1 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/covid19/animations/cases_map.gif"><img height="auto" src="http://www.eviews.com/blog/covid19/animations/cases_map.gif" title="US counties cases evolution (wait for it...)" width="680" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Animation 1: US counties cases evolution</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: ANIMATION 1 :::::::::: --> <!-- :::::::::: ANIMATION 2 :::::::::: --> <center> <table> <tr> <td> <center> <video width="680" controls> <source src="http://www.eviews.com/blog/covid19/animations/graph01.mp4" type="video/mp4" title="Confirmed cases"> </video> <br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Animation 2: Confirmed cases</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: ANIMATION 2 :::::::::: --></span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com9tag:blogger.com,1999:blog-6883247404678549489.post-86065470278518103992020-02-25T07:58:00.001-08:002020-03-04T09:27:00.165-08:00Beveridge-Nelson Filter<style> table { border: 0px solid black; border-collapse: separate; border-spacing: 10px; } td { border: 1px solid black; } .nb { border: 0px solid black; } .step { counter-reset: section; list-style-type: none; } .step li::before { counter-increment: section; content: "Step "counter(section) ": "; } </style> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ tex2jax: { inlineMath: [ ['$','$'], ["\$","\$"] ], displayMath: [ ['$$','$$'], ["\$","\$"] ], }, TeX: { equationNumbers: {
autoNumber: "AMS" }, extensions: ["AMSmath.js"], Macros: { lb: "{\\left(}", rb: "{\\right)}", bu: ['{\\underline{#1}}', 1], ba: ['{\\overline{#1}}', 1], norm: ['{\\lVert#1\\rVert}', 1] } } }); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"> </script> <span style="font-family: &quot;verdana&quot; sans-serif"> <i>A guest post by Benjamin Wong (Monash University) and Davaajargal Luvsannyam (The Bank of Mongolia)</i><br /><br /> Analysis of macroeconomic time series often involves decomposing a series into trend and cycle components. In this blog post, we describe the Kamber, Morley, and Wong (2018) Beveridge-Nelson (BN) filter and the associated EViews add-in. <a name='more'></a><br /><br /> <h3>Table of Contents</h3> <ol> <li><a href="#sec1">Introduction</a> <li><a href="#sec2">The BN Decomposition</a> <li><a href="#sec3">The BN Filter</a> <li><a href="#sec4">Why Use the BN Filter</a> <li><a href="#sec5">BN Filter Implementation</a> <li><a href="#sec6">Conclusion</a> <li><a href="#sec7">Files</a> <li><a href="#sec8">References</a> </ol><br /> <h3 id="sec1">Introduction</h3> In this blog entry, we will discuss the Beveridge-Nelson (BN) filter - the Kamber, Morley, and Wong (2018) modification of the well-known Beveridge and Nelson (1981) decomposition. In particular, we will discuss the application of both procedures to estimating the <i>output gap</i>, which the US Bureau of Economic Analysis (BEA) and the Congressional Budget Office (CBO) define as the proportional deviation of actual real <i>gross domestic product</i> (GDP) from real potential GDP.<br /><br /> The analysis to follow will use quarterly data from the post-World War II period 1947Q1 to 2019Q3 and will be downloaded from the FRED database. In this regard, we begin by creating a new quarterly workfile as follows: <ol> <li>From the main EViews window, click on <b>File/New/Workfile...</b>.
<li>Under <b>Frequency</b> select <b>Quarterly</b>. <li>Set the <b>Start date</b> to <i>1947Q1</i> and set the <b>End date</b> to <i>2019Q3</i>. <li>Hit <b>OK</b>. </ol> Next, we fetch the GDP data as follows: <ol> <li>From the main EViews window, click on <b>File/Open/Database...</b>. <li>From the <b>Database/File Type</b> dropdown, select <b>FRED Database</b>. <li>Hit <b>OK</b>. <li>From the FRED database window, click on the <b>Browse</b> button. <li>Next, click on <b>All Series Search</b> and in the <b>Search for</b> box, type <i>GDPC1</i>. (This is actual real GDP, seasonally adjusted.) <li>Drag the series over to the workfile to make it available for analysis. <li>Again, in the <b>Search for</b> box, type <i>GDPPOT</i>. (This is real potential GDP, not seasonally adjusted, as estimated by the CBO.) <li>Drag the series over to the workfile to make it available for analysis. <li>Close the FRED windows as they are no longer needed. </ol> <!-- :::::::::: FIGURES 1a and 1b :::::::::: --> <center> <table> <tr> <td> <!-- :::::::::: FIGURE 1a :::::::::: --> <center> <a href="http://www.eviews.com/blog/bnfilter/fredbrowse.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/fredbrowse.png" title="FRED Browse" width="180" /></a><br /> </center> </td> <td> <!-- :::::::::: FIGURE 1b :::::::::: --> <center> <a href="http://www.eviews.com/blog/bnfilter/fredsearch.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/fredsearch.png" title="FRED Search" width="180" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 1a: FRED Browse </small> </center> </td> <td class="nb"> <center> <small>Figure 1b: FRED Search</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURES 1a and 1b :::::::::: --> Next, rename the series <b>GDPC1</b> to <b>GDP</b> by issuing the following command: <pre><br /> rename gdpc1 gdp<br /> </pre> We now show how to obtain the implied estimate of the output gap from the CBO to provide
the user with some perspective. In particular, the CBO implied estimate of the output gap is defined using the formula: $$CBOGAP = 100\left(\frac{GDP - GDPPOT}{GDPPOT}\right)$$ For reference, we will create this series in EViews and call it <b>CBOGAP</b>. This is done by issuing the following command: <pre><br /> series cbogap = 100*(gdp-gdppot)/gdppot<br /> </pre> We also plot <b>CBOGAP</b> below: <!-- :::::::::: FIGURE 2 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/bnfilter/gap.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/gap.png" title=" CBO implied estimate of the output gap" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 2: CBO implied estimate of the output gap</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 2 :::::::::: --> <h3 id="sec2">BN Decomposition</h3> Recall here that for any time series $y_{t}$, the BN decomposition determines a trend process $\tau_{t}$ and a cycle process $c_{t}$, such that $y_{t} = \tau_{t} + c_{t}$. In this regard, the trend component $\tau_{t}$ is the long-horizon conditional forecast of $y_{t}$ net of its deterministic drift $\mu$. In other words: $$\tau_{t} = \lim_{h\rightarrow \infty} E_{t}\left(y_{t+h} - h\mu\right) \quad \text{where} \quad \mu = E(\Delta y_{t})$$ On the other hand, the cyclical component is the deviation of the underlying process from its long-horizon forecast. Intuitively, when $y_{t}$ represents the GDP of some economy, the cycle process $c_{t} = y_{t} - \tau_{t}$ is interpreted as the <i>output gap</i>.<br /><br /> In practice, in order to capture the autocovariance structure of $\Delta y_{t}$, the BN decomposition starts by first fitting an autoregressive moving-average (ARMA) model to $\Delta y_{t}$ and then proceeds to derive $\tau_{t}$ and $c_{t}$.
For instance, when the model of choice is AR(1), the BN decomposition derives from the following steps:<br /><br /> <ol class="step"> <li>Fit an AR(1) model to $\Delta y_{t}$: $$\Delta y_{t} = \widehat{\alpha} + \widehat{\phi}\Delta y_{t-1} + \widehat{\epsilon}_{t}$$ <li> Estimate the deterministic drift as the unconditional mean process: $$\widehat{\mu} = \frac{\widehat{\alpha}}{1 - \widehat{\phi}}$$ <li> Estimate the BN trend process: $$\widehat{\tau}_{t} = \left(y_{t} + \left(\frac{\widehat{\phi}}{1 - \widehat{\phi}}\right) \Delta y_{t}\right) - \left(\frac{\widehat{\phi}}{1 - \widehat{\phi}}\right) \widehat{\mu}$$ <li> Estimate the BN cycle component: $$\widehat{c}_{t} = y_{t} - \widehat{\tau}_{t}$$ </ol><br /> As an illustrative example, consider the BN decomposition of US quarterly real GDP. To conform with the Kamber, Morley, and Wong (2018) paper, we transform raw US real GDP by taking 100 times its logarithm. In this regard, we generate a new EViews series object <b>LOGGDP</b> by issuing the following command: <pre><br /> series loggdp = 100 * log(gdp)<br /> </pre> Finally, following the four steps outlined above, we derive the BN decomposition in EViews as follows: <pre><br /> series dy = d(loggdp)<br /> equation ar1.ls dy c dy(-1) 'Step 1<br /> scalar mu = c(1)/(1-c(2)) 'Step 2<br /> series bntrend = loggdp + (dy - mu)*c(2)/(1 - c(2)) 'Step 3<br /> series bncycle = loggdp - bntrend 'Step 4<br /> </pre> The BN trend and cycle series are displayed in Figures 3a and 3b below.<br /><br /> <!-- :::::::::: FIGURES 3a and 3b :::::::::: --> <center> <table> <tr> <td> <!-- :::::::::: FIGURE 3a :::::::::: --> <center> <a href="http://www.eviews.com/blog/bnfilter/bntrend.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/bntrend.png" title="BN Trend" width="360" /></a><br /> </center> </td> <td> <!-- :::::::::: FIGURE 3b :::::::::: --> <center> <a href="http://www.eviews.com/blog/bnfilter/bncycle.png"><img height="auto"
src="http://www.eviews.com/blog/bnfilter/bncycle.png" title="BN Cycle" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 3a: BN Trend</small> </center> </td> <td class="nb"> <center> <small>Figure 3b: BN Cycle</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURES 3a and 3b :::::::::: --> To see how the BN decomposition estimate of the output gap compares to the CBO implied estimate of the output gap, we plot both series on the same graph.<br /><br /> <!-- :::::::::: FIGURE 4 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/bnfilter/bncvsgap.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/bncvsgap.png" title="BN Cycle vs CBO implied output gap estimate" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 4: BN Cycle vs CBO implied output gap estimate</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 4 :::::::::: --> Evidently, the BN cycle series lacks persistence (very noisy), lacks amplitude (low variance), and in general, does not exhibit the characteristics found in the CBO implied estimate of the output gap, <b>CBOGAP</b>.<br /><br /> <h3 id="sec3">The BN Filter</h3> First, to explain why the BN estimate of the output gap lacks the persistence of its true counterpart, recall the formula for the BN cycle component for an AR(1) model: $$c_{t} = y_{t} - \tau_{t} = -\frac{\phi}{1-\phi}(\Delta y_{t} - \mu)$$ Clearly, when $\phi$ is small, $\Delta y_{t}$ is not very persistent. Since $c_{t}$ is only as persistent as $\Delta y_{t}$, the cycle component itself lacks the persistence one expects of the true output gap series.<br /><br /> Next, to explain why $c_{t}$ lacks the expected amplitude, define the signal-to-noise ratio $\delta$ for any time series as the ratio of the variance of trend shocks relative to the overall forecast error variance.
In other words: $$\delta \equiv \frac{\sigma^{2}_{\Delta \tau}}{\sigma^{2}_{\epsilon}} = \psi(1)^{2}$$ which follows since $\Delta\tau_{t} = \psi(1)\epsilon_{t}$ and $\psi(1) = \lim_{h\rightarrow \infty} \frac{\partial y_{t+h}}{\partial \epsilon_{t}}$. Intuitively, $\psi(1)$ is the <i>long-run multiplier</i> that captures the permanent effect of the forecast error on the long-horizon conditional expectation of $y_{t}$. Quite generally, as demonstrated in Kamber, Morley, and Wong (2018), for any AR(p) model: \begin{align} \Delta y_{t} = c + \sum_{k=1}^{p}\phi_{k}\Delta y_{t-k} + \epsilon_{t} \label{eq1} \end{align} the signal-to-noise ratio is given by the relation \begin{align} \delta = \frac{1}{(1-\phi(1))^{2}} \quad \text{where} \quad \phi(1) = \phi_{1} + \ldots + \phi_{p}\label{eq2} \end{align} In particular, when the forecasting model is AR(1), as was the case in the BN decomposition above, the signal-to-noise ratio is simply $\delta = \frac{1}{(1-\phi)^{2}}$ and in the case of the US GDP growth process, it is $\delta = \frac{1}{(1-0.36)^{2}} = 2.44$. In other words, the BN trend shocks exhibit higher volatility than quarter-to-quarter forecast errors and the signal-to-noise ratio is therefore relatively high. In fact, in the case of a freely estimated AR$(p)$ model of output growth, $\phi(1) < 1$, which implies that $\delta > 1$. That is, the trend will be more volatile than the cycle, which is at odds with the expectation that cycle shocks (the output gap amplitude) explain the majority of the systematic forecast variance.<br /><br /> To correct for the aforementioned shortcomings of the BN decomposition, Kamber, Morley, and Wong (2018) exploit the relationship between the signal-to-noise ratio and the AR coefficients in equation \eqref{eq2}.
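As a quick check, the AR(1) signal-to-noise ratio reported above can be computed directly from the earlier regression (a sketch assuming the equation object <b>ar1</b> estimated in the BN decomposition steps; its second coefficient is $\widehat{\phi}$):<br /><br /> <pre style="overflow:auto"><br /> <font color="green">'delta = 1/(1-phi)^2, where phi is the AR(1) coefficient</font><br /> scalar delta = 1/(1 - ar1.@coefs(2))^2<br /> </pre>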
In particular, they note that equation \eqref{eq2} implies that: \begin{align} \phi(1) = 1 - \frac{1}{\sqrt{\delta}} \end{align} In this regard, the idea underlying the BN filter is to fix the signal-to-noise ratio at a specific value, say $\delta = \bar{\delta}$. Subsequently, the BN decomposition is derived from an AR model, the AR coefficients of which are forced to sum to $\bar{\phi}(1) \equiv 1 - \frac{1}{\sqrt{\bar{\delta}}}$. In other words, the BN decomposition is derived while imposing a particular signal-to-noise ratio.<br /><br /> It is important to note here that estimation of the BN decomposition under a particular signal-to-noise ratio restriction is in fact straightforward and does not require complicated non-linear routines. To see this, observe that equation \eqref{eq1} can be rewritten as: \begin{align} \Delta y_{t} = c + \rho \Delta y_{t-1} + \sum_{k=1}^{p-1}\phi^{\star}_{k}\Delta^{2} y_{t-k} + \epsilon_{t} \label{eq3} \end{align} where $\rho = \phi_{1} + \ldots + \phi_{p}$ and $\phi^{\star}_{k} = -\left(\phi_{k+1} + \ldots + \phi_{p}\right)$. Then, imposing the restriction $\rho = \bar{\rho} \equiv \bar{\phi}(1)$ reduces the regression in \eqref{eq3} to: \begin{align} \Delta y_{t} - \bar{\rho} \Delta y_{t-1} = c + \sum_{k=1}^{p-1}\phi^{\star}_{k}\Delta^{2} y_{t-k} + \epsilon_{t} \label{eq4} \end{align} In other words, $\bar{\rho}\Delta y_{t-1}$ is brought to the left-hand side and the regressand in the regression \eqref{eq4} becomes $\Delta \bar{y}_{t} \equiv \Delta y_{t} - \bar{\rho} \Delta y_{t-1}$.<br /><br /> <h3 id="sec4">Why Use the BN Filter?</h3> Before we demonstrate the BN Filter add-in, we quickly outline two reasons why the BN filter might be a reasonable approach, particularly when estimating the output gap. <ol> <li>When analyzing GDP growth, standard ARMA model selection often favours low-order AR variants, which, as discussed earlier, produce high signal-to-noise ratios.
<li>Kamber, Morley and Wong (2018) argue that the BN filter exhibits better out-of-sample performance and generally requires fewer estimation revisions than alternative low signal-to-noise ratio procedures such as deterministic quadratic detrending, the Hodrick-Prescott (HP) filter, and the bandpass (BP) filter, which often require a large number of estimation revisions as new data come in and are typically unreliable in out-of-sample forecasts (see Orphanides and van Norden (2003)).<br /><br /> </ol> To drive this latter point home, we demonstrate the impact of ex-post estimation of the output gap using the HP filter. In particular, we will first estimate the output gap (the cycle component) of the <b>LOGGDP</b> series for the period 1947Q1 to 2008Q3 and call it <b>HPCYCLE</b>, and then again for the period 1947Q1 to 2019Q3 and call it <b>HPCYCLE_EXPOST</b>.<br /><br /> To estimate the HP filter cycle component for the period 1947Q1 to 2008Q3, we first set the sample accordingly by issuing the command: <pre><br /> smpl @first 2008Q3<br /> </pre> Next, we estimate the HP filter cycle series as follows: <ol> <li>From the workfile, double click on the series <b>LOGGDP</b> to open the series. <li>In the series window, click on <b>Proc/Hodrick-Prescott Filter...</b> <li>In the <b>Cycle series</b> text box, type <i>hpcycle</i>. <li>Hit <b>OK</b>. </ol> <!-- :::::::::: FIGURE 5 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/bnfilter/hpfilter.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/hpfilter.png" title="HP Filter" width="180" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 5: HP Filter</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 5 :::::::::: --> The steps are then repeated for the sample period 1947Q1 to 2019Q3, this time naming the cycle series <i>hpcycle_expost</i>.
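The same HP filter exercise can also be run by command (a sketch using the <b>hpf</b> series proc; here the cycle is obtained by subtracting the HP trend from the series, and lambda = 1600 is the standard smoothing parameter for quarterly data):<br /><br /> <pre style="overflow:auto"><br /> <font color="green">'HP cycle over the shorter sample</font><br /> smpl @first 2008q3<br /> loggdp.hpf(lambda=1600) hptrend<br /> series hpcycle = loggdp - hptrend<br /> <br /> <font color="green">'HP cycle over the full sample</font><br /> smpl @first 2019q3<br /> loggdp.hpf(lambda=1600) hptrend_expost<br /> series hpcycle_expost = loggdp - hptrend_expost<br /> smpl @all<br /> </pre>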
A plot of both cycle series on the same graph is presented below.<br /><br /> <!-- :::::::::: FIGURE 6 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/bnfilter/hpcycleexpost.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/hpcycleexpost.png" title="HP Cycle vs HP Cycle Ex Post" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 6: HP Cycle vs HP Cycle Ex Post</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 6 :::::::::: --> Evidently, the ex-post HP filter estimate of the output gap diverges from its shorter-sample counterpart starting from 2006Q1. It is precisely this drawback that we will see is not nearly as pronounced in BN filter estimates.<br /><br /> <h3 id="sec5">BN Filter Implementation</h3> To implement the BN Filter, we need to download and install the add-in from the EViews website. The latter can be found at <a href="https://www.eviews.com/Addins/BNFilter.aipz">https://www.eviews.com/Addins/BNFilter.aipz</a>. We can also do this from inside EViews itself: <ol> <li>From the main EViews window, click on <b>Add-ins/Download Add-ins...</b> <li>Click on the BNFilter add-in. <li>Click on <b>Install</b>. </ol> <!-- :::::::::: FIGURE 5 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/bnfilter/addin.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/addin.png" title="Install Add-in" width="180" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 5: Install Add-in</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 5 :::::::::: --> Finally, we demonstrate how to apply the BN Filter add-in using an AR(12) model. To do so, proceed as follows: <ol> <li>From the workfile window, double click on <b>LOGGDP</b> to open the spreadsheet view of the series. 
<li>To access the BN filter dialog, click on <b>Proc/Add-ins/BN Filter</b> <li>Stick with the defaults and hit <b>OK</b>. </ol><br /> <!-- :::::::::: FIGURE 6 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/bnfilter/bnfilter.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/bnfilter.png" title="BN Filter Dialog" width="180" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 6: BN Filter Dialog</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 6 :::::::::: --> The signal-to-noise ratio, while not specified above, is chosen using the Kamber, Morley, and Wong (2018) automatic selection procedure, which balances the trade-off between fit and amplitude. Typically, the signal-to-noise ratio for the US using such a procedure is about 0.25, which implies a quarter of the shocks to US GDP are permanent. Below, we show the BN Filter cycle series both alone and in comparison to the CBO implied estimate of the output gap <b>CBOGAP</b>.<br /><br /> <!-- :::::::::: FIGURES 7a and 7b :::::::::: --> <center> <table> <tr> <td> <!-- :::::::::: FIGURE 7a :::::::::: --> <center> <a href="http://www.eviews.com/blog/bnfilter/bnfcycle.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/bnfcycle.png" title="BN Filter Cycle" width="360" /></a><br /> </center> </td> <td> <!-- :::::::::: FIGURE 7b :::::::::: --> <center> <a href="http://www.eviews.com/blog/bnfilter/bnfcvsgap.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/bnfcvsgap.png" title="BN Filter Cycle vs. 
CBO implied output gap estimate" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 7a: BN Filter Cycle</small> </center> </td> <td class="nb"> <center> <small>Figure 7b: BN Filter Cycle vs CBO implied output gap estimate</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURES 7a and 7b :::::::::: --> We also plot a comparison of the BN Filter cycle series with the HP filtered cycle.<br /><br /> <!-- :::::::::: FIGURE 8 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/bnfilter/bnfcvshpc.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/bnfcvshpc.png" title="BN Filter Cycle vs HP Filter Cycle" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 8: BN Filter Cycle vs HP Filter Cycle</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 8 :::::::::: --> As we can see, the BN filter estimate of the US output gap using an AR(12) model resembles what we would get for an output gap that has a low signal-to-noise ratio. The amplitude is reasonably large, we see business cycles, and the troughs line up with the recessions dated by the NBER. The amplitude of the output gap estimated using the BN Filter is comparable to that of the cycle obtained by the HP filter, as well as the implied estimate of the CBO, which is unlike what we see in Figure 4.<br /><br /> The BN filter add-in also allows users to incorporate knowledge of structural breaks. In particular, we will use 2006Q1 as a structural break, which is consistent with the date found by a Bai and Perron (2003) test, used by Kamber, Morley and Wong (2018), and is consistent with independent work by Eo and Morley (2019). The following steps demonstrate the outcome: <ol> <li>From the workfile window, double click on <b>LOGGDP</b> to open the spreadsheet view of the series. 
<li>To access the BN filter dialog, click on <b>Proc/Add-ins/BN Filter</b> <li>Select the <b>Structural Break</b> box. <li>In the <b>Date of structural break</b> text box, enter <i>2006Q1</i>. <li>Hit <b>OK</b>. </ol><br /> <!-- :::::::::: FIGURE 9 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/bnfilter/bnfcyclesb.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/bnfcyclesb.png" title="BN Filter Cycle (Structural Break)" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 9: BN Filter Cycle (Structural Break)</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 9 :::::::::: --> Now we see a more positive output gap post-2006, as the structural break accounts for the fact that the average GDP growth rate has fallen.<br /><br /> Suppose, however, that we were ignorant of the actual date of the break. This might be the case in practice, as it could take a decade or more before one could empirically identify a structural break date. In this case, a possible option is to use a rolling window for the average growth rate. In this example, we use a backward window of 40 quarters to compute the average growth rate. The idea is that if there were breaks, they would be reflected in this window. To do so, we proceed as follows: <ol> <li>From the workfile window, double click on <b>LOGGDP</b> to open the spreadsheet view of the series. <li>To access the BN filter dialog, click on <b>Proc/Add-ins/BN Filter</b> <li>Select the <b>Dynamic mean adjustment</b> box. <li>Hit <b>OK</b>. 
</ol><br /> <!-- :::::::::: FIGURES 10a and 10b :::::::::: --> <center> <table> <tr> <td> <!-- :::::::::: FIGURE 10a :::::::::: --> <center> <a href="http://www.eviews.com/blog/bnfilter/bnfcycledma.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/bnfcycledma.png" title="BN Filter Cycle (Dynamic Mean Adjustment)" width="360" /></a><br /> </center> </td> <td> <!-- :::::::::: FIGURE 10b :::::::::: --> <center> <a href="http://www.eviews.com/blog/bnfilter/bnfcsbvsedma.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/bnfcsbvsedma.png" title="BN Filter Cycle (Known vs Unknown Structural Break)" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 10a: BN Filter Cycle (Dynamic Mean Adjustment)</small> </center> </td> <td class="nb"> <center> <small>Figure 10b: BN Filter Cycle (Known vs Unknown Structural Break)</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURES 10a and 10b :::::::::: --> Evidently, the estimated output gap looks similar to the one estimated with an explicit structural break in 2006Q1. In general, this suggests that using a backward window to adjust for the mean growth rate might be a useful real-time strategy for dealing with breaks.<br /><br /> Users are not constrained to the automatic option, which balances the trade-off between fit and amplitude. The BN filter add-in also allows users to specify a desired signal-to-noise ratio. For instance, the following example compares setting the signal-to-noise ratio $\delta$ to 0.05 (which implies that 5% of the variance is permanent) against the default of 0.25, which we obtained earlier by leaving $\delta$ unspecified and letting the automatic procedure balance the trade-off between fit and amplitude. To do so, we proceed as follows: <ol> <li>From the workfile window, double click on <b>LOGGDP</b> to open the spreadsheet view of the series. 
<li>To access the BN filter dialog, click on <b>Proc/Add-ins/BN Filter</b> <li>Select the option to fix the signal-to-noise ratio and enter <i>0.05</i> as the desired value. <li>Hit <b>OK</b>. </ol><br /> The plot below summarizes the exercise.<br /><br /> <!-- :::::::::: FIGURE 11 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/bnfilter/bnfc25vs5.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/bnfc25vs5.png" title="BN Filter Cycle (delta = 0.25 vs delta = 0.05)" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 11: BN Filter Cycle (delta = 0.25 vs delta = 0.05)</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 11 :::::::::: --> Unsurprisingly, specifying $\delta = 0.05$ results in an output gap with a larger amplitude than the default: the new specification attributes a smaller proportion of the forecast-error shocks to the trend, and hence a larger proportion to the cycle, leading to a larger-amplitude cycle.<br /><br /> Finally, we come back to the issue of revision. As we mentioned earlier, the BN filter should produce output gaps that are less revised as long as the AR forecasting model is stable, especially when compared to the heavily revised HP Filter. Here, we show the output gap estimated using the BN filter with data up to 2008Q3, and one ex-post up to 2019Q3. 
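Under the hood, once the signal-to-noise ratio is fixed, the entire estimation collapses to the OLS regression in equation \eqref{eq4}. A minimal Python sketch on simulated data (the AR order, the data, and $\bar{\delta}$ are purely illustrative, and this is not the add-in's actual code):

```python
import numpy as np

def restricted_ar_fit(y, delta_bar, p=12):
    """OLS for equation (4): regress dy_t - rho_bar*dy_{t-1} on a constant and
    p-1 lags of the second difference, so that the AR coefficients of the
    original model sum to rho_bar = 1 - 1/sqrt(delta_bar) by construction."""
    rho_bar = 1.0 - 1.0 / np.sqrt(delta_bar)
    dy = np.diff(y)      # first differences (growth rates)
    d2y = np.diff(dy)    # second differences
    lhs, rows = [], []
    for t in range(p, len(dy)):
        lhs.append(dy[t] - rho_bar * dy[t - 1])
        rows.append([1.0] + [d2y[t - 2 - k] for k in range(p - 1)])
    coef, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(lhs), rcond=None)
    return rho_bar, coef  # coef = [c, phi*_1, ..., phi*_{p-1}]

rng = np.random.default_rng(1)
y = np.cumsum(0.005 + rng.normal(0.0, 0.01, 300))  # simulated log-output
rho_bar, coef = restricted_ar_fit(y, delta_bar=0.25)
```

The BN trend and cycle then follow from the fitted coefficients in the usual way; the point here is only that fixing $\bar{\delta}$ enters as a one-line transformation of the regressand, which is part of why re-estimation as new data arrives tends to be stable.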
Clearly, the output gap is hardly revised, which addresses a key critique of Orphanides and Van Norden (2002).<br /><br /> <!-- :::::::::: FIGURE 12 :::::::::: --> <center> <table> <tr> <td> <center> <a href="http://www.eviews.com/blog/bnfilter/bnfcexpost.png"><img height="auto" src="http://www.eviews.com/blog/bnfilter/bnfcexpost.png" title="BN Filter Cycle (Ex-Post)" width="360" /></a><br /> </center> </td> </tr> <tr> <td class="nb"> <center> <small>Figure 12: BN Filter Cycle (Ex-Post)</small> </center> </td> </tr> </table> <br /> </center> <!-- :::::::::: FIGURE 12 :::::::::: --> <h3 id="sec6">Conclusion</h3> In this blog post we have outlined the BN filter add-in associated with the work of Kamber, Morley and Wong (2018). In general, we hope that the ease of using the add-in, together with some of the useful properties of the BN Filter, will encourage practitioners to explore using the procedure in their work.<br /><br /> <h3 id="sec7">Files</h3> <ul> <li><a href="http://www.eviews.com/blog/bnfilter/bnfilter_blog.prg">bnfilter_blog.prg</a> </ul> <br /><br /> <hr /> <h3 id="sec8">References</h3> <ol class="bib2xhtml"> <li><a name="bai-2003"></a>Bai J. and Perron P.: Computation and analysis of multiple structural change models <cite>Journal of Applied Econometrics</cite>, 18(1) 1&#x2013;22, 2003. </li> <li><a name="beveridge-1981"></a>Beveridge S. and Nelson C. R.: A new approach to decomposition of economic time series into permanent and transitory components with particular attention to measurement of the business cycle <cite>Journal of Monetary Economics</cite>, 7(2) 151&#x2013;174, 1981. </li> <li><a name="eo-2019"></a>Eo Y. and Morley J.: Why has the US economy stagnated since the Great Recession? <cite>University of Sydney Working Papers 2017-14</cite>, 2019. 
</li> <li><a name="kamber-2018"></a>Kamber G., Morley J., and Wong B.: Intuitive and reliable estimates of the output gap from a Beveridge-Nelson filter <cite>The Review of Economics and Statistics</cite>, 100(3) 550&#x2013;566, 2018. </li> <li><a name="orphanides-2002"></a>Orphanides A and Van Norden S.: The unreliability of output-gap estimates in real time <cite>The Review of Economics and Statistics</cite>, 84(4) 569&#x2013;583, 2002. </li> <li><a name="watson-1986"></a>Watson M.: Univariate detrending methods with stochastic trends <cite>Journal of Monetary Economics</cite>, 18(1) 49&#x2013;75, 1986. </li> </ol></span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com20tag:blogger.com,1999:blog-6883247404678549489.post-68080614447683089002019-12-04T09:39:00.000-08:002019-12-04T09:39:38.953-08:00Sign and Zero Restricted VAR Add-In<style> table, th, td { border: 1px solid black; border-collapse: collapse; } th { padding: 5px; text-align: middle; } td { padding: 5px; text-align: left; } </style> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ tex2jax: { inlineMath: [ ['$','$'], ["\$","\$"] ], displayMath: [ ['$$','$$'], ["\$","\$"] ], }, TeX: { equationNumbers: { autoNumber: "AMS" }, extensions: ["AMSmath.js"], Macros: { lb: "{\\left(}", rb: "{\\right)}", bu: ['{\\underline{#1}}', 1], ba: ['{\\overline{#1}}', 1], norm: ['{\\lVert#1\\rVert}', 1] } } }); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"></script> <span style="font-family: &quot;verdana&quot; sans-serif"><i>Authors and guest post by Davaajargal Luvsannyam and Ulziikhutag Munkhtsetseg</i><br /><br /> In our previous <a href="http://blog.eviews.com/2019/10/sign-restricted-var-add-in.html">blog entry</a>, we discussed the sign restricted VAR (SRVAR) add-in for EViews. 
Here, we will discuss imposing further zero restrictions on the impact period of the impulse response function (IRF) using the ARW and SRVAR add-ins in tandem.<a name='more'></a><br /><br /> <h3>Table of Contents</h3><ol> <li><a href="#sec1">Introduction</a> <li><a href="#sec2">Orthogonal Reduced-Form Parameterization</a> <li><a href="#sec3">ARW Algorithms</a> <li><a href="#sec4">ARW EViews Add-in</a> <li><a href="#sec5">Conclusion</a> <li><a href="#sec6">References</a></ol><br /> <h3 id="sec1">Introduction</h3> Note that it is certainly possible to impose both sign and exclusion restrictions. For example, Mountford and Uhlig (2009) are motivated by the idea that fiscal policy shocks are identified as orthogonal to both monetary policy and business cycle shocks, and use a penalty function approach (PFA) to impose zero restrictions. (For details on the PFA, please see our <a href="http://blog.eviews.com/2019/10/sign-restricted-var-add-in.html">SRVAR blog entry</a>.) They also consider anticipated government revenue shocks, in which government revenue is restricted to rise one year after the impulse. Furthermore, Beaudry, Nam, and Wang (2011) estimate a structural VAR model including total factor productivity, stock prices, real consumption, the real federal funds rate, and hours worked. They use the PFA to show that a positive optimism shock causes an increase in both consumption and hours worked. Recently, Arias, Rubio-Ramirez, and Waggoner (2018), henceforth ARW, developed algorithms to independently draw from a family of conjugate posterior distributions over the structural parameterization when sign and zero restrictions are used to identify structural VARs (SVARs). They showed the dangers of using the PFA when implementing sign and zero restrictions together to identify SVARs.<br /><br /> <h3 id="sec2">Orthogonal Reduced-Form Parameterization</h3> ARW focus on two SVAR parameterizations. 
In addition to the classical structural parameterization, they show that SVARs can also be written as a product of reduced-form parameters and a set of orthogonal matrices. This is called the <i>orthogonal reduced-form parameterization</i>, henceforth ORF. The algorithms ARW propose draw from a conjugate posterior distribution over the ORF and then transform said draws into a structural parameterization. In particular, they use the normal-inverse-Wishart distribution as the prior conjugate distribution, and develop a change of variable theory that characterizes the induced family of densities over the structural parameterization. This theory shows that a uniform-normal-inverse-Wishart density over the ORF parameterization induces a normal-generalized-normal density over the structural parameterization.<br /><br /> To motivate their contribution, ARW first use the change of variable theory to show that, conditional on the sign restrictions, existing algorithms for SVARs identified only by sign restrictions operate on independent draws from the normal-generalized-normal distribution over the structural parameterization. These algorithms independently draw from the uniform-normal-inverse-Wishart distribution over the ORF parameterization and only accept draws that satisfy the sign restrictions.<br /><br /> Next, ARW generalize these algorithms to also consider zero restrictions. The key to this generalization is that, conditional on the reduced-form parameters, the class of zero restrictions on the structural parameters maps to linear restrictions on the orthogonal matrices. The resulting generalization independently draws from the normal-inverse-Wishart distribution over the reduced-form parameters and from the set of orthogonal matrices such that the zero restrictions hold. 
In this regard, conditional on the zero restrictions, they show that this generalization does not induce a distribution over the structural parameterization from the family of normal-generalized-normal distributions. Furthermore, they derive the induced distribution and write an importance sampler that, conditional on the sign and zero restrictions, independently draws from normal-generalized-normal distributions over the structural parameterization.<br /><br /> To formalize these ideas, consider the SVAR with the general form: \begin{align} Y_t^{\prime} A_{0} = \sum_{i=1}^{p} Y_{t-i}^{\prime}A_{i} + c + \epsilon_t^{\prime}, \quad t=1, \ldots, T \label{eq1} \end{align} where $Y_t$ is an $n\times 1$ vector of endogenous variables, $A_i$ are parameter matrices of size $n\times n$ with $A_{0}$ invertible, $c$ is a $1\times n$ vector of parameters, $\epsilon_t$ is an $n\times 1$ vector of exogenous structural shocks, $p$ is the lag length, and $T$ is the sample size.<br /><br /> We can also summarize equation \eqref{eq1} as follows: \begin{align} Y_{t}^{\prime}A_{0} = X_{t}^{\prime}A_{+} + \epsilon_{t}^{\prime} \label{eq2} \end{align} where $A_{+}^{\prime} = \left[A_{1}^{\prime}, \ldots, A_{p}^{\prime}, c^{\prime}\right]$ and $X_{t}^{\prime} = \left[Y_{t-1}^{\prime}, \ldots, Y_{t-p}^{\prime}, 1\right]$.<br /><br /> The reduced form can now be written as: \begin{align} Y_{t}^{\prime} = X_{t}^{\prime}B + u_{t}^{\prime} \label{eq3} \end{align} where $B = A_{+}A_{0}^{-1}, u_{t}^{\prime} = \epsilon_{t}^{\prime}A_{0}^{-1}$, and $E(u_{t}u_{t}^{\prime}) = \Sigma = \left(A_{0}A_{0}^{\prime}\right)^{-1}$. 
Naturally, $B$ and $\Sigma$ are the reduced form parameters.<br /><br /> We can further write equation \eqref{eq3} as the orthogonal reduced-form parameterization \begin{align} Y_{t}^{\prime} = X_{t}^{\prime}B + \epsilon_{t}^{\prime}Q^{\prime}h(\Sigma) \label{eq4} \end{align} where the $n\times n$ matrix $h(\Sigma)$ is the Cholesky decomposition of the covariance matrix $\Sigma$.<br /><br /> Given equations \eqref{eq2} and \eqref{eq4}, in addition to the Cholesky decomposition $h$, we can define a mapping between $\left(A_{0}, A_{+}\right)$ and $(B, \Sigma, Q)$ by: \begin{align} f_{h}\left(A_{0}, A_{+}\right) = \left(A_{+}A_{0}^{-1}, \left(A_{0}A_{0}^{\prime}\right)^{-1}, h\left(\left(A_{0}A_{0}^{\prime}\right)^{-1}\right)A_{0}\right) \label{eq5} \end{align} where the first element of the triad on the right corresponds to $B$, the second to $\Sigma$, and the third to $Q$.<br /><br /> Note further that the function $f_{h}$ is invertible with inverse defined by: \begin{align} f_{h}^{-1} (B,\Sigma, Q) = \left(h(\Sigma)^{-1}Q, Bh(\Sigma)^{-1}Q\right) \label{eq6} \end{align} where the first term on the right corresponds to $A_{0}$ and the second to $A_{+}$.<br /><br /> Thus, the ORF parameterization makes clear how the structural parameters depend on the reduced form parameters and orthogonal matrices.<br /><br /> <h3 id="sec3">ARW Algorithms</h3> Although ARW propose three different algorithms, the most important is in fact the third. The latter draws from a distribution over the ORF parameterization conditional on the sign and zero restrictions and then transforms the draws into the structural parameterization. 
Since Algorithm 3 also depends on Algorithm 2, we present the latter here and recommend that readers refer to the supplementary materials of ARW (2018) if they require further details.<br /><br /> <h4>Algorithm 2</h4> Let $Z_j$ define the zero restriction matrix on the $j^{\text{th}}$ structural shock, and let $z_{j}$ denote the number of zero restrictions associated with the $j^{\text{th}}$ structural shock. Then: <ol> <li>Draw $(B, \Sigma)$ independently from a normal-inverse-Wishart distribution. <li>For $j \in \{1, \ldots, n\}$ draw $X_{j} \in \mathbf{R}^{n+1-j-z_{j}}$ independently from a standard normal distribution and set $W_{j} = X_{j} / ||X_{j}||$. <li>Define $Q = [q_{1}, \ldots, q_{n}]$ recursively as $q_{j} = K_{j}W_{j}$ for any matrix $K_{j}$ whose columns form an orthonormal basis for the null space of the $(j-1+z_{j})\times n$ matrix \begin{align} M_{j} = \begin{bmatrix} q_{1}^{\prime} \\ \vdots \\ q_{j-1}^{\prime} \\ Z_{j}F\left(f_{h}^{-1}(B, \Sigma, I_{n})\right) \end{bmatrix} \end{align} <li>Set $(A_{0},A_{+}) = f_{h}^{-1}(B,\Sigma,Q)$.<br /><br /></ol> <h4>Algorithm 3</h4> Let $\mathcal{Z}$ denote the set of all structural parameters that satisfy the zero restrictions, and define $v_{(g^{\circ}f_{h})|\mathcal{Z}}$ as the volume element. Then: <ol> <li>Use Algorithm 2 to independently draw $(A_{0}, A_{+})$. <li>If $(A_{0}, A_{+})$ satisfies the sign restrictions, set its importance weight to $$\frac{|\det(A_{0})|^{-(2n+m+1)}}{v_{(g^{\circ}f_{h})|\mathcal{Z}}(A_{0}, A_{+})}$$ where $m = np + 1$ is the number of rows of $A_{+}$; otherwise, set its importance weight to zero. <li>Return to Step 1 until the required number of draws has been obtained. <li>Re-sample with replacement using the importance weights.<br /><br /></ol> <h3 id="sec4">ARW EViews Add-in</h3> Now we turn to the implementation of the ARW add-in. First, we need to download and install the add-in from the EViews website. The latter can be found at <a href="https://www.eviews.com/Addins/arw.aipz">https://www.eviews.com/Addins/arw.aipz</a>. 
We can also do this from inside EViews itself. In particular, after opening EViews, click on <b>Add-ins</b> from the main menu, and click on <b>Download Add-ins...</b>. From here, locate the <i>ARW</i> add-in and click on <b>Install</b>.<br /><br /> <!-- :::::::::: FIGURE 1 :::::::::: --> <center> <a href="http://www.eviews.com/blog/arw/addin_download.png"><img height="auto" src="http://www.eviews.com/blog/arw/addin_download.png" title="Add-ins Download" width="360" /></a><br /> <small>Figure 1: Add-in installation</small><br /><br /> </center><!-- :::::::::: FIGURE 1 :::::::::: --> After installing, we open the data file named <i>data.WF1</i>, which can be found in the installation folder, typically located in <b>[Windows User Folder]/Documents/EViews Addins/ARW</b>.<br /><br /> <!-- :::::::::: FIGURE 2 :::::::::: --> <center> <a href="http://www.eviews.com/blog/arw/workfile.png"><img height="auto" src="http://www.eviews.com/blog/arw/workfile.png" title="ARW (2018) Data" width="360" /></a><br /> <small>Figure 2: ARW (2018) Data</small><br /><br /> </center><!-- :::::::::: FIGURE 2 :::::::::: --> We now replicate Figure 1 and Table 3 from ARW. We can do this in EViews as follows.<br /><br /> <ol> <li>Click on the <b>Add-ins</b> menu item in the main EViews menu, and click on <b>Sign restricted VAR</b>. <li>Under <b>Endogenous variables</b> enter <i>tfp stock cons ffr hour</i>. <li>Check the <b>Include constant</b> option. <li>Under <b>Number of lags</b>, enter <i>4</i>. <li>In the <b>Sign restriction vector</b> textbox enter <i>+2</i>. <li>Under <b>Sign restriction method</b> check <i>Penalty</i>. <li>In the <b>Number of horizons</b> textbox enter <i>40</i>. <li>In the <b>Zero restriction</b> textbox enter <i>tfp</i>. <li>Check the <b>Variance decomposition</b> box. 
<li>Hit <b>OK</b>.<br /><br /> </ol> <!-- :::::::::: FIGURE 3 :::::::::: --> <center> <a href="http://www.eviews.com/blog/arw/pfa.png"><img height="auto" src="http://www.eviews.com/blog/arw/pfa.png" title="SRVAR Add-in (PFA)" width="360" /></a><br /> <small>Figure 3: SRVAR Add-in (PFA)</small><br /><br /> </center><!-- :::::::::: FIGURE 3 :::::::::: --> The steps above produce the following output (Panel A of Figure 1 of ARW):<br /><br /> <!-- :::::::::: FIGURE 4 :::::::::: --><center> <a href="http://www.eviews.com/blog/arw/panela.png"><img height="auto" src="http://www.eviews.com/blog/arw/panela.png" title="PFA Output" width="360" /></a><br /> <small>Figure 4: PFA Output</small><br /><br /> </center><!-- :::::::::: FIGURE 4 :::::::::: --> Next, we invoke the ARW add-in and proceed with ARW's Algorithm 3.<br /><br /> <ol> <li>Click on the <b>Add-ins</b> menu item in the main EViews menu, and click on <b>Sign and zero restricted VAR</b>. <li>Under <b>Endogenous variables</b> enter <i>tfp stock cons ffr hour</i>. <li>Check the <b>Include constant</b> option. <li>Under <b>Number of lags</b>, enter <i>4</i>. <li>In the <b>Sign restriction vector</b> textbox enter <i>+stock</i>. <li>In the <b>Zero restrictions</b> textbox enter <i>tfp</i>. <li>Under <b>Number of steps</b> enter <i>40</i>. <li>Check the <b>Variance decomposition</b> box. 
<li>Hit <b>OK</b>.<br /><br /> </ol> <!-- :::::::::: FIGURE 5 :::::::::: --> <center> <a href="http://www.eviews.com/blog/arw/isampler.png"><img height="auto" src="http://www.eviews.com/blog/arw/isampler.png" title="ARW Add-in (Importance Sampler)" width="360" /></a><br /> <small>Figure 5: ARW Add-in (Importance Sampler)</small><br /><br /> </center><!-- :::::::::: FIGURE 5 :::::::::: --> The steps above produce the following output (Panel B of Figure 1 of ARW):<br /><br /> <!-- :::::::::: FIGURE 6 :::::::::: --> <center> <a href="http://www.eviews.com/blog/arw/panelb.png"><img height="auto" src="http://www.eviews.com/blog/arw/panelb.png" title="Importance Sampler Output" width="360" /></a><br /> <small>Figure 6: Importance Sampler Output</small><br /><br /> </center><!-- :::::::::: FIGURE 6 :::::::::: --> Figures 4 and 6 above illustrate the IRFs obtained using the PFA and importance sampler methods, respectively. In the case of the former, we can see the IRFs with probability bands for adjusted TFP, stock prices, consumption, real interest rate, and hours worked under the PFA. Examining the confidence bands around the IRFs allows us to conclude that optimism shocks boost consumption and hours worked, as the corresponding IRFs do not contain zero for at least 20 quarters.<br /><br /> Alternatively, the IRFs of the same variables obtained using the importance sampler yield a different result. For consumption and hours worked, the confidence bands are wider and contain zero. Furthermore, the corresponding point-wise median IRFs are closer to zero compared to those obtained using the PFA. This shows that the PFA exaggerates the effects of optimism shocks on stock prices, consumption, and hours worked by generating much narrower confidence bands and larger point-wise median IRFs. 
In this regard, as pointed out by Uhlig (2005), we can see that the PFA effectively imposes additional identification restrictions when implementing sign and zero restrictions.<br /><br /> To further summarize the results, we present the table below, which gives the specifics of the output figures above.<br /><br /> <center> <table style="width:100%"> <tr> <th></th> <th colspan="3">Penalty Function Approach</th> <th colspan="3">Importance Sampler</th> </tr> <tr> <th></th> <th>16%</th> <th>Median</th> <th>84%</th> <th>16%</th> <th>Median</th> <th>84%</th> </tr> <tr> <td>Adjusted TFP</td> <td>0.07</td> <td><b>0.17</b></td> <td>0.29</td> <td>0.03</td> <td><b>0.11</b></td> <td>0.23</td> </tr> <tr> <td>Stock Prices</td> <td>0.54</td> <td><b>0.72</b></td> <td>0.84</td> <td>0.05</td> <td><b>0.29</b></td> <td>0.57</td> </tr> <tr> <td>Consumption</td> <td>0.13</td> <td><b>0.27</b></td> <td>0.43</td> <td>0.03</td> <td><b>0.17</b></td> <td>0.50</td> </tr> <tr> <td>Real Interest Rate</td> <td>0.07</td> <td><b>0.14</b></td> <td>0.23</td> <td>0.08</td> <td><b>0.20</b></td> <td>0.39</td> </tr> <tr> <td>Hours Worked</td> <td>0.20</td> <td><b>0.31</b></td> <td>0.45</td> <td>0.04</td> <td><b>0.18</b></td> <td>0.56</td> </tr> </table> <small>Table I: Forecast Error Variance Decomposition (FEVD)</small><br /><br /></center> Table I shows the contribution of shocks to the Forecast Error Variance Decomposition (FEVD) using the PFA and the importance sampler for the chosen horizon of 40 periods and 68 percent equal-tailed probability intervals. Under the PFA, the share of the FEVD of consumption and hours worked attributable to optimism shocks is 27 and 31 percent, respectively. However, the contribution of optimism shocks to the FEVD of stock prices is 72 percent under the PFA in contrast to 29 percent using the importance sampler. 
It should be noted that for most variables, when using the importance sampler, optimism shocks contribute less to the FEVD, and the probability intervals for the FEVD are broader than those obtained under the PFA.<br /><br /> <h3 id="sec5">Conclusion</h3> In this blog entry we presented the ARW add-in for EViews. The add-in is based on the work of ARW (2018) and generates impulse response curves based on the importance sampler, which accommodates both sign and zero restrictions in the VAR model.<br /><br /> <hr /><h3 id="sec6">References</h3> <ol class="bib2xhtml"> <li><a name="arias-2018"></a>Arias J., Rubio-Ramirez J., and Waggoner D.: Inference Based on SVARs Identified with Sign and Zero Restrictions: Theory and Applications <cite>Econometrica</cite>, 86:685&#x2013;720, 2018. </li> <li><a name="beaudry-2011"></a>Beaudry P., Nam D., and Wang J.: Do mood swings drive business cycles and is it rational? <cite>NBER Working Paper 17651</cite>, 2011. </li> <li><a name="mountford-2009"></a>Mountford A. and Uhlig H.: What are the effects of fiscal policy shocks? <cite>Journal of Applied Econometrics</cite>, 24:960&#x2013;992, 2009. </li> <li><a name="uhlig-2005"></a>Uhlig H.: What are the effects of monetary policy on output? Results from an agnostic identification procedure. <cite>Journal of Monetary Economics</cite>, 52(2):381&#x2013;419, 2005. 
</li> </ol></span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com2tag:blogger.com,1999:blog-6883247404678549489.post-44432315972472118622019-11-06T10:23:00.000-08:002019-11-06T13:02:01.360-08:00Dealing with the log of zero in regression models<script type="text/x-mathjax-config"> MathJax.Hub.Config({ tex2jax: { inlineMath: [ ['$','$'], ["\$","\$"] ], displayMath: [ ['$$','$$'], ["\$","\$"] ], }, TeX: { equationNumbers: { autoNumber: "AMS" }, extensions: ["AMSmath.js"], Macros: { lb: "{\\left(}", rb: "{\\right)}", bu: ['{\\underline{#1}}', 1], ba: ['{\\overline{#1}}', 1], norm: ['{\\lVert#1\\rVert}', 1] } } }); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"></script> <span style="font-family: &quot;verdana&quot; sans-serif"><i>Author and guest post by Eren Ocakverdi</i><br /><br /> The title of this blog piece is a verbatim excerpt from the Bellego and Pape (2019) paper suggested by Professor David E. Giles in his <a href="https://davegiles.blogspot.com/2019/10/october-reading.html">October reading list</a>. (Editor's note: Professor Giles has recently announced the end of his blog - it is a fantastic resource and will be missed!). The topic is immediately familiar to practitioners who occasionally encounter the difficulty in applied work. 
In this regard, it is reassuring that the frustration is being addressed and that there is indeed an ongoing quest for the <i>silver bullet</i>.<a name='more'></a><br /><br /> <h3>Table of Contents</h3><ol> <li><a href="#sec1">Introduction</a> <li><a href="#sec2">A Novel Approach</a> <li><a href="#sec3">Files</a> <li><a href="#sec4">References</a></ol><br /> <h3 id="sec1">Introduction</h3> Consider the following data generating process where the dependent variable may contain zeros: $$\log(y_i) = \alpha + x_i^\prime \beta + \epsilon_i \quad \text{with} \quad E(\epsilon_i)=0$$ The most common remedy to the <i>logarithm of zero value</i> problem among practitioners is to add a common (observation independent) positive constant to the problematic observations. In other words, to work with the model: $$\log(y_i + \Delta) = \alpha + x_i^\prime \beta + \omega_i$$ where $\Delta$ is the corrective constant.<br /><br /> In the aforementioned paper, the authors use Monte Carlo simulations to demonstrate that the bias incurred by this correction is not necessarily negligible for small values of $\Delta$, and in fact, may be substantial.<br /><br /> <!-- :::::::::: FIGURE 1 :::::::::: --><center> <a href="http://www.eviews.com/blog/log_of_zero/bias.png"><img height="auto" src="http://www.eviews.com/blog/log_of_zero/bias.png" title="Add-ins Download" width="360" /></a><br /> <small>Figure 1: Estimation bias as a function of $\Delta$ </small><br /><br /></center><!-- :::::::::: FIGURE 1 :::::::::: --> In order to handle the zeros in model variables, the paper offers a new (complementary) solution that: <ol> <li>Does not generate computational bias by arbitrary normalization. </li> <li>Does not generate correlation between the error term and regressors. 
</li> <li>Does not require the deletion of observations.</li> <li>Does not require the estimation of a supplementary parameter.</li> <li>Does not require the addition of a discretionary constant.</li><br /><br /></ol> <h3 id="sec2">A Novel Approach</h3> Bellego and Pape (2019) suggest that instead of adding a common positive constant $\Delta$, one ought to add some optimal, observation-dependent positive value $\Delta_{i}$. The novel strategy results in the following model, which is estimated via GMM: $$\log(y_i + \Delta_{i}) = \alpha + x_i^\prime \beta + \eta_{i}$$ where $\Delta_i = \exp(x_i^\prime \beta)$ and $\eta_i = \log(1 + \exp(\alpha + \epsilon_i))$.<br /><br /> Since the details are available in the original paper, here I’d like to replicate the simulation exercise in which the authors illustrate their method and compare it with other approaches. (The tables below can be replicated in EViews by running the program file <i>loglinear.prg</i>.)<br /><br /> <!-- :::::::::: FIGURE 2 :::::::::: --><center> <a href="http://www.eviews.com/blog/log_of_zero/table1.png"><img height="auto" src="http://www.eviews.com/blog/log_of_zero/table1.png" title="Add-ins Download" width="360" /></a><br /> <small>Figure 2: Output of OLS estimation (with $\Delta = 1$)</small><br /><br /></center><!-- :::::::::: FIGURE 2 :::::::::: --> <!-- :::::::::: FIGURE 3 :::::::::: --><center> <a href="http://www.eviews.com/blog/log_of_zero/table2.png"><img height="auto" src="http://www.eviews.com/blog/log_of_zero/table2.png" title="Add-ins Download" width="360" /></a><br /> <small>Figure 3: Output of Poisson Pseudo Maximum Likelihood (PPML) estimation</small><br /><br /></center><!-- :::::::::: FIGURE 3 :::::::::: --> <!-- :::::::::: FIGURE 4 :::::::::: --><center> <a href="http://www.eviews.com/blog/log_of_zero/table3.png"><img height="auto" src="http://www.eviews.com/blog/log_of_zero/table3.png" title="Add-ins Download" width="360" /></a><br /> <small>Figure 4: Output of proposed 
solution (GMM estimation)</small><br /><br /></center><!-- :::::::::: FIGURE 4 :::::::::: --> Simulation results show that both the PPML and the GMM solutions provide correct estimates (i.e. $\alpha = 0$, $\beta_{1} = \beta_{2} = 1$), whereas OLS results are biased due to adding a common constant to all data points. Although $\alpha$ is not identified in the proposed solution, the authors suggest OLS estimation to obtain the coefficient:<br /><br /> <!-- :::::::::: FIGURE 5 :::::::::: --><center> <a href="http://www.eviews.com/blog/log_of_zero/table4.png"><img height="auto" src="http://www.eviews.com/blog/log_of_zero/table4.png" title="Add-ins Download" width="360" /></a><br /> <small>Figure 5: OLS estimation of alpha parameter: $\log(\exp(\eta_i)-1)=\alpha+\epsilon_i$</small><br /><br /></center><!-- :::::::::: FIGURE 5 :::::::::: --> When zeros are observed in both the dependent and independent variables, the authors suggest a functional coefficient model of the form: $$\log(y_i) = \alpha + \mathbb{1}_{x_i > 0}\times\log(x_i)\beta_{x_i>0}+\mathbb{1}_{x_i=0}\times\beta_{x_i=0}+\epsilon_i$$ Again, a simulation exercise is carried out to compare the estimated coefficients with different methods. 
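The mechanics of this flexible formulation can be sketched in a few lines of Python (a hypothetical simulation, independent of the EViews programs used for the tables; the sample size, noise level, and the value of $\beta_{x_i=0}$ are illustrative choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
alpha, beta_pos, beta_zero = 0.0, 1.5, 0.7   # beta_zero is an illustrative value

# Regressor with zeros: roughly half the draws are exact zeros, the rest lognormal
x = rng.lognormal(size=n) * rng.integers(0, 2, size=n)
pos = x > 0
logx = np.where(pos, np.log(np.where(pos, x, 1.0)), 0.0)   # 1{x>0} * log(x)

# DGP: log(y) = alpha + 1{x>0}*log(x)*beta_pos + 1{x=0}*beta_zero + eps
logy = alpha + beta_pos * logx + beta_zero * (~pos) + 0.1 * rng.standard_normal(n)

# OLS on [constant, 1{x>0}*log(x), 1{x=0}] recovers the coefficients directly
X = np.column_stack([np.ones(n), logx, (~pos).astype(float)])
coef, *_ = np.linalg.lstsq(X, logy, rcond=None)
print(np.round(coef, 2))
```

The indicator regressors keep the zero observations in the sample without ever evaluating $\log(0)$, which is the point of the formulation.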
(The tables below can be reproduced in EViews by running the program <i>loglog.prg</i>.)<br /><br /> <!-- :::::::::: FIGURE 6 :::::::::: --><center> <a href="http://www.eviews.com/blog/log_of_zero/table5.png"><img height="auto" src="http://www.eviews.com/blog/log_of_zero/table5.png" title="Add-ins Download" width="360" /></a><br /> <small>Figure 6: OLS estimation</small><br /><br /></center><!-- :::::::::: FIGURE 6 :::::::::: --> <!-- :::::::::: FIGURE 7 :::::::::: --><center> <a href="http://www.eviews.com/blog/log_of_zero/table6.png"><img height="auto" src="http://www.eviews.com/blog/log_of_zero/table6.png" title="Add-ins Download" width="360" /></a><br /> <small>Figure 7: PPML estimation</small><br /><br /></center><!-- :::::::::: FIGURE 7 :::::::::: --> <!-- :::::::::: FIGURE 8 :::::::::: --><center> <a href="http://www.eviews.com/blog/log_of_zero/table7.png"><img height="auto" src="http://www.eviews.com/blog/log_of_zero/table7.png" title="Add-ins Download" width="360" /></a><br /> <small>Figure 8: GMM estimation</small><br /><br /></center><!-- :::::::::: FIGURE 8 :::::::::: --> Simulation results show that the suggested (flexible) formulation of the $\beta$ coefficients works well for all estimation methods ($\alpha=0$ and $\beta = 1.5$).<br /><br /> <hr /><h3 id="sec3">Files</h3> <ol> <li><a href="http://www.eviews.com/blog/log_of_zero/deltasimul.prg">deltasimul.prg</a> <li><a href="http://www.eviews.com/blog/log_of_zero/loglinear.prg">loglinear.prg</a> <li><a href="http://www.eviews.com/blog/log_of_zero/loglog.prg">loglog.prg</a></ol><br /> <hr /><h3 id="sec4">References</h3> <ol class="bib2xhtml"> <!-- Authors: Bellego and Paper (2019) --><li><a name="bellego_pape-2019"></a>Bellego, C. and L-D. Pape. Dealing with the log of zero in regression models. 
<cite>CREST: Working Paper</cite>, No. 2019-13, 2019.</li></ol></span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com0tag:blogger.com,1999:blog-6883247404678549489.post-48467357268436160422019-10-14T13:50:00.001-07:002019-12-03T12:39:35.078-08:00Sign Restricted VAR Add-In<script type="text/x-mathjax-config"> MathJax.Hub.Config({ tex2jax: { inlineMath: [ ['$','$'], ["\$","\$"] ], displayMath: [ ['$$','$$'], ["\$","\$"] ], }, TeX: { equationNumbers: { autoNumber: "AMS" }, extensions: ["AMSmath.js"], Macros: { lb: "{\\left(}", rb: "{\\right)}", bu: ['{\\underline{#1}}', 1], ba: ['{\\overline{#1}}', 1], norm: ['{\\lVert#1\\rVert}', 1] } } }); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"></script> <span style="font-family: &quot;verdana&quot; sans-serif"><i>Authors and guest post by Davaajargal Luvsannyam and Ulziikhutag Munkhtsetseg</i><br /><br /> Sign restricted VARs (SRVARs) have become popular and are now an indispensable tool for macroeconomic analysis. They have been used in macroeconomic policy analysis to investigate the sources of business cycle fluctuations and to provide a benchmark against which modern dynamic macroeconomic theories are evaluated. Traditional structural VARs are identified with exclusion restrictions, which are sometimes difficult to justify by economic theory. 
In contrast, SRVARs can easily identify structural shocks since in many cases, economic theory only offers guidance on the sign of structural impulse responses on impact.<a name='more'></a><br /><br /> <h3>Table of Contents</h3><ol> <li><a href="#sec1">Introduction</a> <li><a href="#sec2">Bayesian Inference of SRVARs</a> <li><a href="#sec3">Recovering Structural Shocks from an SRVAR</a> <li><a href="#sec4">SRVAR EViews Add-in</a> <li><a href="#sec5">Conclusion</a> <li><a href="#sec6">References</a></ol><br /> <h3 id="sec1">Introduction</h3> Following the seminal work of Uhlig (2005), the uniform-normal-inverse-Wishart posterior over the orthogonal reduced-form parameterization has been dominant for SRVARs. Recently, Arias, Rubio-Ramirez and Waggoner (2018), henceforth ARW, developed algorithms to independently draw from a family of conjugate posterior distributions over the structural parameterization when sign and zero restrictions are used to identify SRVARs. In particular, they show the dangers of using penalty function approaches (PFA) when implementing sign and zero restrictions to identify structural VARs (SVARs). In this blog, we describe the SRVAR add-in based on Uhlig (2005).<br /><br /> The main difference between a classic VAR and a sign restricted VAR is interpretation. For traditional structural VARs (SVARs), there is a unique point estimate of the structural impulse response function. Because sign restrictions represent inequality restrictions, sign restricted VARs are only set identified. In other words, the data are potentially consistent with a wide range of structural models that are all admissible in that they satisfy the identifying restrictions.<br /><br /> There have been both frequentist and Bayesian approaches to summarizing estimates of the admissible set of sign-identified structural VAR models. However, the most common approach for sign restricted VARs is based on Bayesian methods of inference. 
For example, Uhlig (2005) used a Bayesian approach which is computationally simple and provides a clean way of drawing error bands for impulse responses.<br /><br /> <h3 id="sec2">Bayesian Inference of SRVARs</h3> A typical VAR model is summarized by \begin{align} Y_t = B_1 Y_{t-1} + B_2 Y_{t-2} + \cdots + B_l Y_{t-l} + u_t, \quad t=1, \ldots, T \label{eq1} \end{align} where $Y_t$ is an $m\times 1$ vector of data, $B_i$ are coefficient matrices of size $m\times m$, and $u_t$ is the one-step ahead prediction error with variance-covariance matrix $\mathbf{\Sigma}$. An intercept and a time trend are also sometimes added to \eqref{eq1}.<br /><br /> Next, stack the system in \eqref{eq1} as follows: \begin{align} \mathbf{Y} = \mathbf{XB} + \mathbf{u} \label{eq2} \end{align} where $\mathbf{Y} = [Y_{1}, \ldots, Y_{T}]^{\prime}$, $\mathbf{X} = [X_{1}, \ldots, X_{T}]^{\prime}$ and $X_{t} = [Y_{t-1}^{\prime}, \ldots, Y_{t-l}^{\prime}]$, $\mathbf{u} = [u_{1}, \ldots, u_{T}]^{\prime}$, and $\mathbf{B} = [B_{1}, \ldots, B_{l}]^{\prime}$. It is also assumed that the $u_{t}$'s are independent and normally distributed with covariance matrix $\mathbf{\Sigma}$.<br /><br /> Model \eqref{eq2} is typically estimated using maximum likelihood (ML) estimation. 
In particular, the ML estimates of $\left(\mathbf{B}, \mathbf{\Sigma}\right)$ are given by: \begin{align} \widehat{\mathbf{B}} &= \left(\mathbf{X}^{\prime}\mathbf{X}\right)^{-1}\mathbf{X}^{\prime}\mathbf{Y} \label{eq3} \\ \widehat{\mathbf{\Sigma}} &= \frac{1}{T}\left(\mathbf{Y} - \mathbf{X}\widehat{\mathbf{B}}\right)^{\prime}\left(\mathbf{Y} - \mathbf{X}\widehat{\mathbf{B}}\right) \label{eq4} \end{align} Next, note that a proper Wishart distribution of $\left(\mathbf{B}, \mathbf{\Sigma}\right)$ centered around $\left(\bar{\mathbf{B}}, \mathbf{S}\right)$ is characterized by the mean coefficient matrix $\bar{\mathbf{B}}$, a positive definite mean covariance matrix $\mathbf{S}$, along with an additional positive definite matrix $\mathbf{N}$ of size $ml \times ml$, and a degrees-of-freedom parameter $v \geq 0$. In this regard, Uhlig (2005) considers priors and posteriors for $\left(\mathbf{B}, \mathbf{\Sigma}\right)$ belonging to the Normal-Wishart family: $\mathbf{\Sigma}^{-1}$ follows the Wishart distribution $W\left(\mathbf{S}^{-1} / v, v\right)$, with $E\left(\mathbf{\Sigma}^{-1}\right) = \mathbf{S}^{-1}$, whereas the columnwise vectorized form of the coefficient matrix, $vec\left(\mathbf{B}\right)$, conditional on $\mathbf{\Sigma}$, is assumed to follow the Normal distribution $\mathcal{N}\left(vec\left(\bar{\mathbf{B}}\right), \mathbf{\Sigma} \otimes \mathbf{N}^{-1}\right)$.<br /><br /> Furthermore, Proposition A.1 in Uhlig (1994) shows that if the prior is characterized by the set of parameters $\left(\bar{\mathbf{B}}_{0}, \mathbf{S}_{0}, \mathbf{N}_{0}, v_{0}\right)$, the posterior is then parameterized by the set $\left(\bar{\mathbf{B}}_{T}, \mathbf{S}_{T}, \mathbf{N}_{T}, v_{T}\right)$ where: \begin{align} v_{T} &= T + v_{0} \label{eq5} \\ \mathbf{N}_{T} &= \mathbf{N}_{0} + \mathbf{X}^{\prime}\mathbf{X} \label{eq6} \\ \bar{\mathbf{B}}_{T} &= \mathbf{N}_{T}^{-1} \left(\mathbf{N}_{0}\bar{\mathbf{B}}_{0} + \mathbf{X}^{\prime}\mathbf{X}\widehat{\mathbf{B}}\right) \label{eq7} \\ \mathbf{S}_{T} &= \frac{v_{0}}{v_{T}}\mathbf{S}_{0} + 
\frac{T}{v_{T}}\widehat{\mathbf{\Sigma}} + \frac{1}{v_{T}}\left(\widehat{\mathbf{B}} - \bar{\mathbf{B}}_{0}\right)^{\prime}\mathbf{N}_{0}\mathbf{N}_{T}^{-1}\left(\widehat{\mathbf{B}} - \bar{\mathbf{B}}_{0}\right) \label{eq8} \end{align} For instance, in the case of a flat prior with $\bar{\mathbf{B}}_{0}$ and $\mathbf{S}_{0}$ arbitrary and $\mathbf{N}_{0} = v_{0} = 0$, Uhlig (2005) shows that $\bar{\mathbf{B}}_{T} = \widehat{\mathbf{B}}, \mathbf{S}_{T} = \widehat{\mathbf{\Sigma}}, \mathbf{N}_{T} = \mathbf{X}^{\prime}\mathbf{X},$ and $v_{T} = T$.<br /><br /> <h3 id="sec3">Recovering Structural Shocks from an SRVAR</h3> Here we consider two approaches to recovering the structural shocks from an SRVAR. The first is based on what's known as the <b>rejection method</b>, which consists of the following algorithmic steps: <ol> <li>Run an unrestricted VAR in order to get $\widehat{\mathbf{B}}$ and $\widehat{\mathbf{\Sigma}}$. </li> <li>Randomly draw $\bar{\mathbf{B}}_{T}$ and $\mathbf{S}_{T}$ from the posterior distributions. </li> <li>Extract the orthogonal innovations from the model using a Cholesky decomposition.</li> <li>Calculate the resulting impulse responses from Step 3.</li> <li>Randomly draw an orthogonal impulse vector $\mathbf{\alpha}$.</li> <li>Multiply the responses from Step 4 by $\mathbf{\alpha}$ and check if they match the imposed signs.</li> <li>If yes, keep the response. If not, drop the draw.</li></ol> Note here that a draw $\mathbf{\alpha}$ from an $m$-dimensional unit sphere is easily obtained by drawing $\widetilde{\mathbf{\alpha}}$ from an $m$-dimensional standard normal distribution and then normalizing its length to unity. In other words, $\mathbf{\alpha} = \widetilde{\mathbf{\alpha}} / ||\widetilde{\mathbf{\alpha}}||$.<br /><br /> The second approach, proposed in Uhlig (2005), is called the <b>penalty function method</b>. 
This method minimizes a penalty function given by: \begin{align} b(x) = \begin{cases} x &\quad \text{if } x \leq 0\\ 100 x &\quad \text{if } x > 0 \end{cases} \end{align} which penalizes positive responses in linear proportion and rewards negative responses in linear proportion, albeit at a slope 100 times smaller than that on the positive side.<br /><br /> The steps involved in this algorithm can be summarized as follows: <ol> <li>Run an unrestricted VAR in order to get $\widehat{\mathbf{B}}$ and $\widehat{\mathbf{\Sigma}}$. </li> <li>Randomly draw $\bar{\mathbf{B}}_{T}$ and $\mathbf{S}_{T}$ from the posterior distributions. </li> <li>Extract the orthogonal innovations from the model using a Cholesky decomposition.</li> <li>Calculate the resulting impulse responses from Step 3.</li> <li>Minimize the penalty function with respect to an orthogonal impulse vector $\mathbf{\alpha}$.</li> <li>Multiply the responses from Step 4 by $\mathbf{\alpha}$.</li> </ol> Now, let $r_{(j, \mathbf{\alpha})}(k)$ denote the response of variable $j$ at step $k$ to the impulse vector $\mathbf{\alpha}$. Then the underlying minimization problem can be written as follows: \begin{align} \min_{\mathbf{\alpha}} \mathbf{\Psi}(\mathbf{\alpha}) = \sum_{j \in J}\sum_{k \in K}b\left(l_{j}\frac{r_{(j, \mathbf{\alpha})}(k)}{\sigma_{j}}\right) \end{align} To treat the signs equally, let $l_j=-1$ if the restriction sign is positive and $l_j=1$ if it is negative. The variables are scaled by the standard errors, $\sigma_{j}$, of their first differences. We parameterize the impulse vector $\mathbf{\alpha}$ on the unit sphere in $n$-space by randomly drawing an $(n-1)$-dimensional vector from a standard Normal distribution and mapping the draw onto the $n$-dimensional unit sphere using a stereographic projection.<br /><br /> <h3 id="sec4">SRVAR EViews Add-in</h3> Now we turn to the implementation of the SRVAR add-in. 
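Before turning to the add-in itself, the two building blocks shared by these algorithms, the unit-sphere draw of the impulse vector and the penalty function $b(\cdot)$, can be sketched in Python (a schematic illustration, not the add-in's EViews code):

```python
import numpy as np

rng = np.random.default_rng(42)

def draw_alpha(m):
    """Draw an impulse vector uniformly on the m-dimensional unit sphere:
    draw from an m-dimensional standard normal and normalize its length."""
    a = rng.standard_normal(m)
    return a / np.linalg.norm(a)

def b(x):
    """Uhlig's penalty: negative responses are rewarded one-for-one,
    positive responses are penalized with a slope 100 times steeper."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, 100.0 * x, x)

alpha = draw_alpha(6)
print(np.linalg.norm(alpha))    # unit length up to floating point error
print(b([-1.0, 2.0]))           # rewards -1 as -1, penalizes 2 as 200
```

In the rejection method the draw is simply kept or discarded after the sign checks, whereas the penalty function method minimizes $\mathbf{\Psi}(\mathbf{\alpha})$, built from $b(\cdot)$, over the sphere.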
First, we need to download and install the add-in, which can be found on the EViews website at <a href="https://www.eviews.com/Addins/srvar.aipz">https://www.eviews.com/Addins/srvar.aipz</a>. We can also do this from inside EViews itself. In particular, after opening EViews, click on <b>Add-ins</b> from the main menu, and click on <b>Download Add-ins...</b>. From here, locate the <i>srvar</i> add-in and click on <b>Install</b>.<br /><br /> <!-- :::::::::: FIGURE 1 :::::::::: --> <center> <a href="http://www.eviews.com/blog/srvar/addin_download.png"><img height="auto" src="http://www.eviews.com/blog/srvar/addin_download.png" title="Add-ins Download" width="360" /></a><br /> <small>Figure 1: Add-ins Download</small><br /><br /> </center><!-- :::::::::: FIGURE 1 :::::::::: --> After installing, we import the data file named <i>uhligdata1.xls</i>, which can be found in the installation folder, typically located in <b>[Windows User Folder]/Documents/EViews Addins/srvar</b>.<br /><br /> <!-- :::::::::: FIGURE 2 :::::::::: --> <center> <a href="http://www.eviews.com/blog/srvar/workfile.png"><img height="auto" src="http://www.eviews.com/blog/srvar/workfile.png" title="Uhlig (2005) Data" width="360" /></a><br /> <small>Figure 2: Uhlig (2005) Data</small><br /><br /> </center><!-- :::::::::: FIGURE 2 :::::::::: --> Next, we take 100 times the logarithm of the series <b>gdpc1</b> (real GDP), <b>gdpdef</b> (GDP price deflator), <b>cprindex</b> (commodity price index), <b>totresns</b> (total reserves), and <b>bognonbr</b> (non-borrowed reserves). To do this, we can issue the following EViews commands:<br /><br /> <PRE><br />series gdpc1 = @log(gdpc1)*100.0<br />series gdpdef = @log(gdpdef)*100.0<br />series cprindex = @log(cprindex)*100.0<br />series totresns = @log(totresns)*100.0<br />series bognonbr = @log(bognonbr)*100.0<br /></PRE> We now replicate Figures 5, 6, and 14 from Uhlig (2005). 
In particular, using the aforementioned variables, Uhlig (2005) first estimates a VAR with 12 lags, without a constant or trend. We can of course do this in EViews as follows:<br /><br /> <ol> <li>Click on <b>Quick/Estimate VAR...</b> to open the VAR estimation window.</li> <li>In the VAR estimation window, under <b>Endogenous variables</b>, enter <i>gdpc1 gdpdef cprindex fedfunds bognonbr totresns</i>.</li> <li>Under <b>Lag Intervals for Endogenous</b>, enter <i>1 12</i>.</li> <li>Under the <b>Exogenous variables</b>, remove the <i>c</i> to remove the constant.</li> <li>Hit OK.</li> </ol> <!-- :::::::::: FIGURE 3 :::::::::: --> <center> <a href="http://www.eviews.com/blog/srvar/basic_var.png"><img height="auto" src="http://www.eviews.com/blog/srvar/basic_var.png" title="VAR Estimation Window" width="360" /></a><br /> <small>Figure 3: VAR Estimation Window</small><br /><br /> </center><!-- :::::::::: FIGURE 3 :::::::::: --> <!-- :::::::::: FIGURE 4 :::::::::: --><center> <a href="http://www.eviews.com/blog/srvar/basic_var_results.png"><img height="auto" src="http://www.eviews.com/blog/srvar/basic_var_results.png" title="VAR Estimation Results" width="360" /></a><br /> <small>Figure 4: VAR Estimation Results</small><br /><br /> </center><!-- :::::::::: FIGURE 4 :::::::::: --> Next, we obtain the 60-period-ahead impulse response function using asymptotic standard error bands and <b>fedfunds</b> as the impulse. 
We can do this as follows:<br /><br /> <ol> <li>From the VAR estimation window, click on <b>View/Impulse Response...</b> to open the impulse response estimation window.</li> <li>Under <b>Display Format</b>, click <b>Multiple Graphs</b>.</li> <li>Under <b>Response Standard Errors</b>, click on <b>Analytic (asymptotic)</b>.</li> <li>Under <b>Impulses</b>, enter <i>fedfunds</i>.</li> <li>Under <b>Responses</b>, enter <i>gdpc1 gdpdef cprindex bognonbr totresns</i>.</li> <li>Under <b>Periods</b>, enter <i>60</i>.</li> <li>Hit OK.</li> </ol> <!-- :::::::::: FIGURE 5 :::::::::: --> <center> <a href="http://www.eviews.com/blog/srvar/basic_irf.png"><img height="auto" src="http://www.eviews.com/blog/srvar/basic_irf.png" title="IRF Estimation Window" width="360" /></a><br /> <small>Figure 5: IRF Estimation Window</small><br /><br /> </center><!-- :::::::::: FIGURE 5 :::::::::: --> Finally, Figure 5 of Uhlig (2005) is replicated below: <!-- :::::::::: FIGURE 6 :::::::::: --> <center> <a href="http://www.eviews.com/blog/srvar/basic_irf_graphs.png"><img height="auto" src="http://www.eviews.com/blog/srvar/basic_irf_graphs.png" title="IRF Graphs" width="360" /></a><br /> <small>Figure 6: IRF Graphs</small><br /><br /> </center><!-- :::::::::: FIGURE 6 :::::::::: --> The price puzzle pointed out by Sims (1992) is clearly visible in the graphs above. In particular, the GDP deflator increases after a contractionary monetary policy shock. By contrast, the sign restricted identification approach (shown in Figure 7 below) avoids the price puzzle by construction.<br /><br /> To demonstrate how sign restricted VARs avoid the price puzzle, we now make use of the SRVAR add-in. In this regard, we first create the sign restriction vector. In particular, Uhlig (2005) suggests that the impulse responses be positive on the 4th variable <b>fedfunds</b>, and negative on the 2nd variable <b>gdpdef</b>, the 3rd variable <b>cprindex</b>, and the 5th variable <b>bognonbr</b>. 
Thus, we create the sign restriction vector by issuing the following command: <PRE><br />vector rest = @fill(+4, -2, -3, -5)<br /></PRE> Next, we invoke the SRVAR add-in and proceed with the rejection method as the SRVAR impulse response algorithm. We do this by clicking on the <b>Add-ins</b> menu in the main EViews menu, and clicking on <b>Sign restricted VAR</b>. This opens the SRVAR add-in window. There, we enter the following details:<br /><br /> <ol> <li>Under <b>Endogenous variables</b> enter <i>gdpc1 gdpdef cprindex fedfunds bognonbr totresns</i>.</li> <li>Click on <b>Include constant</b>, to remove the checkmark.</li> <li>Under <b>Number of lags</b>, enter <i>12</i>.</li> <li>In the <b>Sign restriction vector</b> textbox enter <i>+4, -2, -3, -5</i>.</li> <li>In the <b>Number of horizons</b> enter <i>60</i>.</li> <li>For the <b>Maximum number of restrictions</b> enter <i>6</i>.</li> <li>Hit OK.</li> </ol> The steps above produce a graph of sign restricted VAR impulse responses which corresponds to Figure 6 in Uhlig (2005). <!-- :::::::::: FIGURE 7 :::::::::: --><center> <a href="http://www.eviews.com/blog/srvar/srvar_irf_graphs.png"><img height="auto" src="http://www.eviews.com/blog/srvar/srvar_irf_graphs.png" title="SRVAR Impulse Responses (Rejection Method)" width="360" /></a><br /> <small>Figure 7: SRVAR Impulse Responses (Rejection Method)</small><br /><br /></center><!-- :::::::::: FIGURE 7 :::::::::: --> From the SRVAR impulse response graph, it is readily seen that there is no price puzzle by construction. However, the impulse response of real GDP is within a ±0.2% interval around zero. 
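The accept/reject step behind this graph can be illustrated with a small Python check of the sign convention used by the restriction vector (a signed, 1-based variable index: $+4$ requires a non-negative response of the 4th variable, $-2$ a non-positive response of the 2nd, and so on). This is a hypothetical sketch of the logic, not the add-in's internal code, and the number of horizons over which signs are imposed is a user choice:

```python
import numpy as np

def satisfies_signs(irf, rest):
    """irf: (horizons x variables) array of candidate impulse responses.
    rest: signed 1-based variable indices, e.g. [4, -2, -3, -5].
    Zero responses are treated as admissible."""
    for r in rest:
        col = irf[:, abs(r) - 1]
        if r > 0 and np.any(col < 0):
            return False   # a required-positive response went negative
        if r < 0 and np.any(col > 0):
            return False   # a required-negative response went positive
    return True

# Toy responses for 6 variables over 3 horizons
irf = np.zeros((3, 6))
irf[:, 3] = 0.5     # 4th variable (fedfunds) responds positively
irf[:, 1] = -0.2    # 2nd variable (gdpdef) responds negatively
irf[:, 2] = -0.1    # 3rd variable (cprindex) responds negatively
irf[:, 4] = -0.3    # 5th variable (bognonbr) responds negatively
print(satisfies_signs(irf, [4, -2, -3, -5]))   # True: draw is kept
irf[0, 1] = 0.1     # flip one gdpdef response positive
print(satisfies_signs(irf, [4, -2, -3, -5]))   # False: draw is rejected
```

In the rejection method this check is applied to each candidate impulse vector, and only the draws that pass are accumulated into the plotted response bands.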
Alternatively, if using the SRVAR penalty function algorithm, the analogous figure is presented below: <!-- :::::::::: FIGURE 8 :::::::::: --><center> <a href="http://www.eviews.com/blog/srvar/srvar_irf_graphs_penalty.png"><img height="auto" src="http://www.eviews.com/blog/srvar/srvar_irf_graphs_penalty.png" title="SRVAR Impulse Responses (Penalty Function Method)" width="360" /></a><br /> <small>Figure 8: SRVAR Impulse Responses (Penalty Function Method)</small><br /><br /></center><!-- :::::::::: FIGURE 8 :::::::::: --> <h3 id="sec5">Conclusion</h3> In this blog entry we presented the sign restricted VAR add-in for EViews. The add-in is based on the work of Uhlig (2005) and generates impulse response curves based on Bayesian inference that accommodates sign restrictions in the VAR model. In the next blog, we will describe the implementation of the ARW add-in, which will show how to impose zero restrictions on the impact period of the impulse response function.<br /><br /> <hr /><h3 id="sec6">References</h3> <ol class="bib2xhtml"> <!-- Authors: Uhlig H. --><li><a name="uhlig-1994"></a>Uhlig H.: What macroeconomists should know about unit roots: a Bayesian perspective. <cite>Econometric Theory</cite>, 10:645&#x2013;671, 1994.</li> <!-- Authors: Uhlig H. --><li><a name="uhlig-2005"></a>Uhlig H.: What are the effects of monetary policy on output? Results from an agnostic identification procedure. 
<cite>Journal of Monetary Economics</cite>, 52(2):381&#x2013;419, 2005.</li> </ol></span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com15tag:blogger.com,1999:blog-6883247404678549489.post-50858909833227118062019-07-17T13:20:00.001-07:002019-07-17T13:20:09.515-07:00Pyeviews update: now compatible with Python 3<span style="font-family: &quot;verdana&quot; , sans-serif;">If you’re a user of both EViews and Python, then you may already be aware of pyeviews (if not, take a look at our original blog post <a href="http://blog.eviews.com/2016/03/pyeviews-python-eviews.html" target="_blank">here</a> or our whitepaper <a href="http://www.eviews.com/download/whitepapers/pyeviews.pdf" target="_blank">here</a>).&nbsp;</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">Pyeviews has been updated and is now compatible with Python 3. We’ve also added support for numpy structured arrays and several additional time series frequencies.&nbsp;</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">You can get these updates through pip:</span><br /><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">pip install pyeviews</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">Through the conda-forge channel in Anaconda:</span><br /><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">conda install pyeviews -c conda-forge</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">Or by typing:</span><br /><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">python setup.py 
install</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">in your installation directory.</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><br />IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com0tag:blogger.com,1999:blog-6883247404678549489.post-22104609406423388342019-06-26T13:04:00.000-07:002019-06-27T09:54:36.913-07:00Bayesian VAR Prior Comparison<script type="text/x-mathjax-config"> MathJax.Hub.Config({ tex2jax: { inlineMath: [ ['$','$'], ["\$","\$"] ], displayMath: [ ['$$','$$'], ["\$","\$"] ], }, TeX: { equationNumbers: { autoNumber: "AMS" }, extensions: ["AMSmath.js"], Macros: { lb: "{\\left(}", rb: "{\\right)}", bu: ['{\\underline{#1}}', 1], ba: ['{\\overline{#1}}', 1], ubar: ['{\\mkern 0.5mu\\underline{\\mkern-0.5mu#1\\mkern-0.5mu}\\mkern 0.5mu}', 1], undrln: ['{\\rlap{{\\hspace{-1pt}}\\underline{\\hphantom{H}}}{#1^{#4}\\vphantom{\\beta}}_{\\hspace{#3}\\vphantom{\\underline{}}_{#2}}}', 4], norm: ['{\\lVert#1\\rVert}', 1] } } }); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"></script> <span style="font-family: &quot;verdana&quot; sans-serif"> EViews 11 introduces a completely new Bayesian VAR engine that replaces the one from previous versions of EViews. The new engine offers two major new priors, the Independent Normal-Wishart and the Giannone, Lenza and Primiceri, which complement the previously implemented Minnesota/Litterman, Normal-Flat, Normal-Wishart and Sims-Zha priors. 
The new priors were enhanced with new options for forming the underlying covariance matrices that make up essential components of the prior.<a name='more'></a><br /><br /> The covariance matrices that form the prior specification are generally formed by specifying a matrix alongside a number of hyper-parameters which define any non-zero elements of the matrix. The hyper-parameters themselves are either selected by the researcher, or taken from an initial error covariance estimate. Sensitivity of the posterior distribution to the choice of hyper-parameter is a well-researched topic, with practitioners often selecting many different hyper-parameter values to check that their analysis does not change based solely on (an often arbitrary) choice of parameter. However, this sensitivity analysis is restricted to the parameters selected by the researcher, with often only passing thought given to those estimated by an initial covariance estimate.<br /><br /> Since EViews 11 offers a number of choices for estimating the initial covariance, we thought it would be interesting to perform a comparison of forecast accuracy both across prior types, and across choices of initial covariance estimate.<br /><br /> <h3>Table of Contents</h3><ol> <li><a href="#sec1">Prior Technical Details</a> <li><a href="#sec2">Estimating a Bayesian VAR in EViews</a> <li><a href="#sec3">Data and Models</a> <li><a href="#sec4">Results</a> <li><a href="#sec5">Conclusions</a></ol><br /> <h3 id="sec1">Prior Technical Details</h3> We will not provide in-depth details of each prior type here, leaving such details to the <a href="http://www.eviews.com/help/helpintro.html#page/content%2FbVAR-Bayesian_VAR_Models.html%23">EViews documentation</a> and its <a href="http://www.eviews.com/help/content/bVAR-References.html#">references</a>. However, we will provide a summary with enough details to demonstrate how an initial covariance matrix influences each prior type. 
We will also, for the sake of notational convenience, ignore exogenous variables and the constant from our discussion.<br /><br /> First we write the VAR as: $$y_t = \sum_{j=1}^p\Pi_jy_{t-j}+\epsilon_t$$ where <ul> <li><h4></h4>$y_t = (y_{1t},y_{2t}, \ldots, y_{Mt})'$ is an $M$-vector of endogenous variables <li><h4></h4>$\Pi_j$ are $M\times M$ matrices of lag coefficients <li><h4></h4>$\epsilon_t$ is an $M$-vector of errors where we assume $\epsilon_t\sim N(0,\Sigma)$<br /><br /></ul> If we define $x_t=(y_{t-1}^{\prime}, \ldots, y_{t-p}^{\prime})$ and stack variables to form, for example, $Y = (y_1, \ldots, y_T)'$, and let $y=vec(Y')$, the multivariate normal assumption on $\epsilon_t$ gives us: $$(y\mid \beta)\sim N((X\otimes I_M)\beta, I_T\otimes \Sigma)$$ Bayesian estimation of VAR models then centers around the derivation of posterior distributions of $\beta$ and $\Sigma$ based upon the above multivariate distribution, and prior distributional assumptions on $\beta$ and $\Sigma$.<br /><br /> To demonstrate how each prior relies on an initial estimate of $\Sigma$, for the priors other than Litterman, we only need to consider the component of each prior relating to the distribution of $\beta$, and in particular its covariance. <ol> <li><h4><i>Litterman/Minnesota Prior</i></h4> $$\beta \sim N\left(\undrln{\beta}{Mn}{2.25pt}{}, \undrln{V}{Mn}{2.25pt}{}\right)$$ $\undrln{V}{Mn}{2.25pt}{}$ is assumed to be a diagonal matrix. 
The diagonal elements corresponding to endogenous variables $i,j$ at lag $l$ are specified by: $$\undrln{V}{Mn, i,j}{-4.5pt}{l} = \begin{cases} \left(\frac{\lambda_1}{l^{\lambda_3}}\right)^2 &\text{for } i = j\\ \left(\frac{\lambda_1 \lambda_2 \sigma_i}{l^{\lambda_3} \sigma_j}\right)^2 &\text{for } i \neq j \end{cases}$$ where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are hyper-parameters chosen by the researcher, and $\sigma_i$ and $\sigma_j$ are the square roots of the corresponding $(i,i)^{\text{th}}$ and $(j,j)^{\text{th}}$ elements of an initial estimate of $\Sigma$.<br /><br /> The Litterman/Minnesota prior also assumes that $\Sigma$ is fixed, forming no prior on $\Sigma$, just using the initial estimate as given.<br /><br /> <li><h4><i>Normal-Flat and Normal-Wishart</i></h4> $$\beta\mid\Sigma\sim N\left(\undrln{\beta}{N}{2.25pt}{}, \undrln{H}{N}{0pt}{}\otimes\Sigma\right)$$ where $\undrln{H}{N}{0pt}{} = c_3I_M$ and $c_3$ is a chosen hyper-parameter. As such, the Normal-Flat and Normal-Wishart priors do not rely on an initial estimate of the error covariance at all.<br /><br /> <li><h4><i>Independent Normal-Wishart</i></h4> $$\beta\sim N\left(\undrln{\beta}{INW}{2.25pt}{}, \undrln{H}{INW}{0pt}{}\otimes\Sigma\right)$$ where, again, $\undrln{H}{INW}{0pt}{} = c_3I_M$ and $c_3$ is a chosen hyper-parameter. Thus, like the Normal-Flat and Normal-Wishart priors, the prior matrices do not depend upon an initial $\Sigma$ estimate. However, the Independent Normal-Wishart requires an MCMC chain to derive the posterior distributions, and the MCMC chain does require an initial estimate for $\Sigma$ to start the chain (although, hopefully, the impact of this starting estimate should be minimal).<br /><br /> <li><h4><i>Sims-Zha</i></h4> $$\beta\mid\beta_0\sim N\left(\undrln{\beta}{SZ}{2.25pt}{}, \undrln{H}{SZ}{0pt}{}\otimes\Sigma\right)$$ $\undrln{H}{SZ}{0pt}{}$ is assumed to be a diagonal matrix.
The diagonal elements corresponding to endogenous variables $i,j$ at lag $l$ are specified by: $$\undrln{H}{SZ, i, j}{-4.5pt}{l} = \left(\frac{\lambda_0\lambda_1}{\sigma_j l^{\lambda_3}}\right)^2 \text{for } i = j$$ where $\lambda_0$, $\lambda_1$ and $\lambda_3$ are hyper-parameters chosen by the researcher, and $\sigma_j$ is the square root of the corresponding $(j,j)^{\text{th}}$ element of an initial estimate of $\Sigma$.<br /><br /> <li><h4><i>Giannone, Lenza and Primiceri</i></h4> $$\beta\mid\beta_0\sim N(\undrln{\beta}{GLP}{2.25pt}{}, \undrln{H}{GLP}{0pt}{}\otimes\Sigma)$$ $\undrln{H}{GLP}{0pt}{}$ is assumed to be a diagonal matrix. The diagonal elements corresponding to endogenous variables $i,j$ at lag $l$ are specified by: $$\undrln{H}{GLP,i,j}{-4.5pt}{l} = \left(\frac{\lambda_1}{\phi_j l^{\lambda_3}}\right)^2 \text{for } i = j$$ where $\lambda_1$, $\lambda_3$ and $\phi_j$ are hyper-parameters of the prior.<br /><br /> GLP's method revolves around using optimization techniques to select the optimal hyper-parameter values. However, it is possible to optimize only a subset of the hyper-parameters and set the others manually. $\phi_j$ is often set, rather than optimized, as $\phi_j = \sigma_j$, the square root of the corresponding $(j,j)^{\text{th}}$ element of an initial estimate of $\Sigma$. Even when $\phi_j$ is optimized rather than set, an initial estimate is used as the starting point of the optimizer.<br /><br /></ol> Of these priors, only the normal-flat and normal-Wishart priors do not rely on an initial estimate of $\Sigma$ at all. For the remaining priors, the method used for that initial estimate might have a large impact on the final results.<br /><br /> Different implementations of Bayesian VAR estimation use different methods to calculate the initial $\Sigma$. Some of these methods are:<br /><br /> <ul> <li><h4></h4>A classical VAR model. <li><h4></h4>A classical VAR model with the off-diagonal elements replaced with zero.
<li><h4></h4>A univariate AR(p) model for each endogenous variable (forcing $\Sigma$ to be diagonal). <li><h4></h4>A univariate AR(1) model for each endogenous variable (forcing $\Sigma$ to be diagonal).<br /><br /></ul> With each of these methods, there is also the decision as to whether to degree-of-freedom adjust the final estimate (and if so, by what factor), and whether to include any exogenous variables from the Bayesian VAR in the calculation of the classical VAR or univariate AR models.<br /><br /> Bayesian VAR priors can be complemented with the addition of dummy-observation priors to increase the predictive power of the model. There are two specific priors - the sum-of-coefficients prior, which adds additional observations to the start of the data to account for any unit root issues, and the dummy-initial-observation prior, which adds additional observations to account for cointegration.<br /><br /> With the addition of extra observations to the data used in the Bayesian prior, there is also a choice to be made as to whether those additional observations are also included in any initial covariance estimation.<br /><br /> <h3 id="sec2">Estimating a Bayesian VAR in EViews</h3> Estimating VARs in EViews is straightforward: you simply select the variables you want in your VAR, right click, select <i>Open As VAR</i> and then fill in the details of the VAR, including the estimation sample and the number of lags.
For Bayesian VARs the only additional steps that need to be taken are changing the VAR type to Bayesian, and then filling in the details of the prior you want to use and any hyper-parameter specification.<br /><br /> For full details on how to estimate a Bayesian VAR in EViews, refer to the <a href="http://www.eviews.com/help/content/bVAR-Estimating_a_Bayesian_VAR_in_EViews.html#">documentation</a>, and <a href="http://www.eviews.com/help/content/bVAR-Examples.html#">examples</a>.<br /><br /> However, we’ve also provided a simple video demonstration of both importing the data used in this blog post, and estimating and forecasting the normal-Wishart prior.<br /><br /> <center><iframe width="640" height="540" src="http://www.eviews.com/blog/bvar/video/video_player.html?embedIFrameId=embeddedSmartPlayerInstance" webkitallowfullscreen=""></iframe><br /><br /></center> <h3 id="sec3">Data and Models</h3> To evaluate the forecasting performance of the priors under different initial covariance estimation methods, we'll perform an experiment closely following that performed in Giannone, Lenza and Primiceri (GLP). Notably, we use the Stock and Watson (2008) data set, which includes data on 149 quarterly US macroeconomic variables between 1959Q1 and 2008Q4. <br /><br /> Following GLP we produce forecasts from the BVARs recursively for two forecast lengths (1 quarter and 1 year), starting with data from 1959 to 1974, then increasing the estimation sample by one quarter at a time, to give 128 different estimations.<br /><br /> We perform two sets of experiments, each representing a different sized VAR:<br /><br /> <ul> <li><h4></h4>SMALL containing just three variables - GDP, the GDP deflator and the federal funds rate.
<li><h4></h4>MEDIUM containing seven variables - adding consumption, investment, hours and wages.<br /><br /></ul> Each of these VARs is estimated at five lags using a classical VAR and 39 different combinations of prior and initial covariance options:<br /><br /> <!-- :::::::::: TABLE 0 :::::::::: --><center> <a href="http://www.eviews.com/blog/bvar/table0.png"><img height="auto" src="http://www.eviews.com/blog/bvar/table0.png" title="Models Overview" width="600" /></a><br /><br /></center><!-- :::::::::: TABLE 0 :::::::::: --> After each BVAR estimation, Bayesian sampling of the forecast period is performed - drawing from the full posterior distributions for the Litterman, Normal-flat, Normal-Wishart and Sims-Zha priors, and running MCMC draws for the Independent normal-Wishart and GLP priors. The mean of the draws is used as a point estimate, and the root mean square error (RMSE) is calculated. Each forecast draw uses 100,000 iterations. With 39*128=4,992 forecasts and two sizes of VARs, that is a total of nearly 1 billion draws!<br /><br /> <h3 id="sec4">Results</h3> The following tables show the average root mean square error of each of the four sets of forecasts.
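<br /><br /> The recursive evaluation scheme described above is easy to sketch in outline. The snippet below is an illustrative Python mock-up, not the EViews code used for the study: <code>fit_and_forecast</code> is a hypothetical placeholder standing in for BVAR estimation plus forecast sampling, and the naive last-value model is only there to make the sketch runnable.

```python
import math

def recursive_rmse(series, first_window, horizon, fit_and_forecast):
    """Expanding-window evaluation: estimate on series[:t], forecast
    `horizon` steps ahead, advance t by one observation, and return
    the RMSE of the resulting point forecasts."""
    errors = []
    for t in range(first_window, len(series) - horizon + 1):
        point_forecast = fit_and_forecast(series[:t], horizon)
        errors.append(series[t + horizon - 1] - point_forecast)
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Toy stand-in for a BVAR point forecast: repeat the last observation.
naive = lambda history, h: history[-1]
data = [float(x % 7) for x in range(40)]   # a simple periodic series
print(recursive_rmse(data, 20, 1, naive))  # → 2.5
```

In the actual experiment, the placeholder corresponds to re-estimating the Bayesian VAR on each expanding sample and averaging 100,000 forecast draws to obtain the point forecast.<br /><br />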
Click on a table to enlarge the image.<br /><br /> <!-- :::::::::: TABLE 1 :::::::::: --><center> <a href="http://www.eviews.com/blog/bvar/table1.png"><img height="auto" src="http://www.eviews.com/blog/bvar/table1.png" title="Three variable VAR one quarter GDP forecast RMSE" width="720" /></a><br /><br /></center><!-- :::::::::: TABLE 1 :::::::::: --> <!-- :::::::::: TABLE 2 :::::::::: --><center> <a href="http://www.eviews.com/blog/bvar/table2.png"><img height="auto" src="http://www.eviews.com/blog/bvar/table2.png" title="Three variable VAR one year GDP forecast RMSE" width="720" /></a><br /><br /></center><!-- :::::::::: TABLE 2 :::::::::: --> <!-- :::::::::: TABLE 3 :::::::::: --><center> <a href="http://www.eviews.com/blog/bvar/table3.png"><img height="auto" src="http://www.eviews.com/blog/bvar/table3.png" title="Five variable VAR one quarter GDP forecast RMSE" width="720" /></a><br /><br /></center><!-- :::::::::: TABLE 3 :::::::::: --> <!-- :::::::::: TABLE 4 :::::::::: --><center> <a href="http://www.eviews.com/blog/bvar/table4.png"><img height="auto" src="http://www.eviews.com/blog/bvar/table4.png" title="Five variable VAR one year GDP forecast RMSE" width="720" /></a><br /><br /></center><!-- :::::::::: TABLE 4 :::::::::: --> <h3 id="sec5">Conclusions</h3>For the three variable one-quarter ahead experiment, it is clear that the GLP prior is more effective than the other prior types, although the Litterman prior is relatively close in accuracy. 
In terms of which covariance method performs best, there is no clear winner, with the differences between covariance choice only having a large impact on the Litterman and GLP priors.<br /><br /> The choice of whether to include dummy observation priors, and if so whether to include them in the covariance calculation, appears to severely impact only the GLP prior.<br /><br /> The overall winner, at least in terms of RMSE, was the GLP prior with a diagonal VAR used for initial covariance choice without dummy observations.<br /><br /> A similar story is told for the three variable one-year ahead experiment; however, this time the Litterman prior is the clear winner. Again there is not much difference between covariance choices and dummy observation choices. Notably, although Litterman does best on average across the options, the single most accurate combination used the Normal-flat prior.<br /><br /> Expanding to the five variable VARs, the one-quarter ahead experiment is not as clear-cut as the three variable equivalent. Across covariance options it is a toss-up between Litterman and GLP. The choice of covariance has a bigger impact, with the Univariate AR(5) option looking best.<br /><br /> For the first time, optimizing $\phi$ in the GLP prior has a positive impact, with the version including dummy observations being the overall most accurate option combination.<br /><br /> The final experiment is similar: no clear-cut winner in terms of prior choice, although Litterman might just edge GLP. Choice of covariance again has an impact, with again a univariate AR(5) looking best.<br /><br /> Across all the experiments it is difficult to give an overall winner.
The original Litterman and GLP priors are ahead of the others, but knowing which covariance choice to select or whether to include dummy observations is more ambiguous.<br /><br /> One absolutely clear result is, however, that no matter which combination of prior and options is selected, the Bayesian VAR will vastly outperform a classical VAR.<br /><br /> Finally, it is worth mentioning that these results are, with the obvious exception of the GLP prior, for a fixed set of hyper-parameters, and the conclusions may differ if attention is given to simultaneously finding the best set of hyper-parameters and covariance choice. </span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com1tag:blogger.com,1999:blog-6883247404678549489.post-24918477283176795732019-05-13T09:34:00.000-07:002019-05-14T11:00:17.335-07:00Functional Coefficient Estimation: Part I (Nonparametric Estimation)<script type="text/x-mathjax-config"> MathJax.Hub.Config({ tex2jax: { inlineMath: [ ['$','$'], ["\$","\$"] ], displayMath: [ ['$$','$$'], ["\$","\$"] ], }, TeX: { equationNumbers: { autoNumber: "AMS" }, extensions: ["AMSmath.js"], Macros: { lb: "{\\left(}", rb: "{\\right)}", bu: ['{\\underline{#1}}', 1], ba: ['{\\overline{#1}}', 1], norm: ['{\\lVert#1\\rVert}', 1] } } }); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"></script> <span style="font-family: &quot;verdana&quot; sans-serif"> Recently, EViews 11 introduced several new nonparametric techniques. One of those features is the ability to estimate functional coefficient models. To help familiarize users with this important technique, we're launching a multi-part blog series on nonparametric estimation, with a particular focus on the theoretical and practical aspects of functional coefficient estimation.
Before delving into the subject matter, however, in this Part I of the series, we give a brief and gentle introduction to some of the most important principles underlying nonparametric estimation, and illustrate them using EViews programs.<a name='more'></a><br /><br /> <h3>Table of Contents</h3><ol> <li><a href="#sec1">Nonparametric Estimation</a> <li><a href="#sec2">Global Methods</a> <ol type="i"> <li><a href="#sec2.1">Optimal Sieve Length</a> <li><a href="#sec2.2">Critiques</a> </ol> <li><a href="#sec3">Local Methods</a> <ol type="i"> <li><a href="#sec3.1">Localized Kernel Regression</a> <li><a href="#sec3.2">Bandwidth Selection</a> </ol> <li><a href="#sec4">Conclusion</a> <li><a href="#sec5">Files</a> <li><a href="#sec6">References</a></ol><br /> <h3 id="sec1">Nonparametric Estimation</h3> Traditional least squares regression is parametric in nature. It confines relationships between the dependent variable $Y_{t}$ and independent variables (regressors) $X_{1,t}, X_{2,t}, \ldots$ to be, in expectation, linear in the parameters.
For instance, if the true data generating process (DGP) for $Y_{t}$ derives from $p$ regressors, the least squares regression model postulates that: $$Y_{t} = m(x_{1}, \ldots, x_{p}) \equiv E(Y_t | X_{1,t} = x_{1}, \ldots, X_{p,t} = x_{p}) = \beta_0 + \sum_{k=1}^{p}{\beta_k x_{k}}$$ Since this relationship holds only in expectation, a statistically equivalent form of this statement is: \begin{align} Y_t &= m\left(X_{1,t}, \ldots, X_{p,t}\right) + \epsilon_{t} \nonumber \\ &=\beta_0 + \sum_{k=1}^{p}{\beta_k X_{k,t}} + \epsilon_t \label{eq.1.1} \end{align} where the error term $\epsilon_{t}$ has mean zero, and parameter estimates are solutions to the minimization problem: $$\arg\!\min_{\hspace{-1em}\beta_{0}, \ldots, \beta_{p}} E\left(Y_{t} - \beta_0 - \sum_{k=1}^{p}{\beta_k X_{k,t}}\right)^{2}$$ While this framework is appealing, intuitive, and typically sufficient for most applications, inference is rendered unreliable when the true but unknown DGP is in fact non-linear.<br /><br /> On the other hand, nonparametric modelling prefers to remain agnostic about functional forms. Relationships are, in expectation, simply functionals $m(\cdot)$, and if the true DGP for $Y_{t}$ is a function of $p$ regressors, then: $$Y_t = m\left(X_{1,t}, \ldots, X_{p,t}\right) + \epsilon_{t}$$ Here, estimators of $m(\cdot)$ can generally be cast as minimization problems of the form: \begin{align} \arg\!\min_{\hspace{-1em} m\in \mathcal{M}} E\left(Y_{t} - m\left(X_{1,t}, \ldots, X_{p,t}\right)\right)^{2} \label{eq.1.2} \end{align} where $\mathcal{M}$ is now a function space. In this regard, a nonparametric estimator can be thought of as a solution to a search problem over functions as opposed to parameters.<br /><br /> The problem in \eqref{eq.1.2}, however, is infeasible. It turns out the function space is effectively uncountable.
In fact, even setting this aside, solutions would be unidentified, since different functions in $\mathcal{M}$ can produce the same fitted values. Accordingly, general practice is to reduce $\mathcal{M}$ to a lower dimensional countable space and optimize over it. This typically implies a reduction of the problem to a parametric framework so that the problem in \eqref{eq.1.2} is cast into: \begin{align} \arg\!\min_{\hspace{-1em} h\in \mathcal{H}} E\left(Y_{t} - h\left(X_{1,t}, \ldots, X_{p,t}; \mathbf{\Theta} \right)\right)^{2} \label{eq.1.3} \end{align} where $h(\cdot; \mathbf{\Theta}) \in \mathcal{H}$ is a function with associated parameters $\mathbf{\Theta} \in \mathbf{R}^{q}$ and $\mathcal{H}$ is a function space which is <i>dense</i> in $\mathcal{M}$; formally, $h^{\star} \in \mathcal{H} \rightarrow m^{\star} \in \mathcal{M}$ where $\rightarrow$ denotes asymptotic convergence. Recall that this means that any feasible estimate $h^{\star}$ must become arbitrarily close to the infeasible estimate $m^{\star}$ as the space $\mathcal{H}$ grows to asymptotic equivalence with $\mathcal{M}$. In this regard, nonparametric estimators are typically classified into either <i>global</i> or <i>local</i> kinds.<br /><br /> <h3 id="sec2">Global Methods</h3> Global estimators, generally synonymous with the class of <i>sieve</i> estimators introduced by Grenander (1981), approximate arbitrary functions by simpler functions which are uniformly dense in the target space $\mathcal{M}$. A particularly important class of such estimators are <i>linear sieves</i> which are constructed as linear combinations of popular basis functions. The latter include <i>Bernstein polynomials</i>, <i>Chebychev polynomials</i>, <i>Hermite polynomials</i>, <i>Fourier series</i>, <i>polynomial splines</i>, <i>B-splines</i>, and <i>wavelets</i>.
Formally, when the function $m(\cdot)$ is univariate, linear sieves assume the following general structure: \begin{align} \mathcal{H}_{J} = \left\{h \in \mathcal{M}: h(x; \mathbf{\Theta}) = \sum_{j=1}^{J}\theta_{j}f_{j}(x)\right\} \label{eq.1.4} \end{align} where $\theta_{j} \in \mathbf{\Theta}$, $f_{j}(\cdot)$ is one of the aforementioned basis functions, and $J \rightarrow \infty$.<br /><br /> For instance, if the sieve exploits the <i>Stone-Weierstrass Approximation Theorem</i>, which states that any continuous function over a compact interval can be uniformly approximated on that interval by a polynomial to any desired accuracy, then $f_{j}(x) = x^{j-1}$. In particular, if the unknown function of interest is $m(x)$, then choosing to approximate the latter with a polynomial of degree $J = J^{\star} < \infty$ (some integer), reduces the problem in \eqref{eq.1.3} to: $$\arg\!\min_{\hspace{-1em} \theta_{0}, \ldots, \theta_{J^{\star}}} E\left(Y_{t} - \theta_{0} - \sum_{j=1}^{J^{\star}}\theta_{j}X_{t}^{j} \right)^{2}$$ where $Y_{t}$ are the values we observe from the theoretical function $m(x)$, and $X_{t}$ is the regressor we're using to estimate it. Usual least squares now yields $\widehat{\theta}_{j}$ for $j=0,\ldots, J^{\star}$.
Furthermore, $m(x)$ can be approximated as $$m(x) \approx \widehat{\theta}_{0} + \sum_{j=1}^{J^{\star}}\widehat{\theta}_{j}x^{j}$$ where $x$ is evaluated either on some grid over $[a,b]$, which can be arbitrarily fine, or on the original regressor values so that $x \equiv X_{t}$.<br /><br /> To demonstrate the procedure, define the true but unknown function $m(x)$ as: \begin{align} m(x) = \sin(x)\cos(\frac{1}{x}) + \log\left(x + \sqrt{x^2+1}\right) \quad x \in [-6,6]\label{eq.1.5} \end{align} Furthermore, generate observable data from $m(x)$ as $Y_{t} = m(x) + 0.5\epsilon_{t}$ and generate the regressor data as $X_{t} = x - 0.5 + \eta_{t}$ where $\epsilon_{t}$ and $\eta_{t}$ are mutually independent standard normal and standard uniform random variables, respectively. Estimation is now summarized for polynomial degrees 1, 5, and 15, respectively.<br /><br /> <!-- :::::::::: FIGURE 1 :::::::::: --><center> <a href="http://www.eviews.com/blog/funcoef/polysieveplot.jpeg"><img height="auto" src="http://www.eviews.com/blog/funcoef/polysieveplot.jpeg" title="Polynomial Sieve Estimation" width="360" /></a><br /> <small>Figure 1: Polynomial Sieve Estimation</small><br /><br /></center><!-- :::::::::: FIGURE 1 :::::::::: --> Alternatively, if the sieve exploits Hermite polynomials, one can construct the <i>Gaussian sieve</i> which reduces the problem in \eqref{eq.1.3} to: $$\arg\!\min_{\hspace{-1em} \theta_{0}, \ldots, \theta_{J^{\star}}} E\left(Y_{t} - \theta_{0} - \sum_{j=1}^{J^{\star}}\theta_{j}\phi(X_{t})H_{j}(X_{t}) \right)^{2}$$ where $\phi(\cdot)$ is the standard normal density and $H_{j}(\cdot)$ are Hermite polynomials of degree $j$.
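<br /><br /> For readers who want to experiment outside EViews, the polynomial-sieve exercise is easy to replicate. The snippet below is an illustrative Python mock-up, not the EViews program used to produce the figures: it simulates noisy observations of $m(x)$ (using the grid values themselves as the regressor for simplicity) and fits the sieve by ordinary least squares on powers of the regressor via the normal equations.

```python
import math, random

def poly_fit(xs, ys, degree):
    """Least-squares fit of a polynomial sieve of the given degree,
    solving the normal equations (X'X) theta = X'y by Gaussian
    elimination with partial pivoting."""
    k = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    for col in range(k):                      # forward elimination
        piv = max(range(col, k), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, k):
            f = xtx[r][col] / xtx[col][col]
            xtx[r] = [a - f * b for a, b in zip(xtx[r], xtx[col])]
            xty[r] -= f * xty[col]
    theta = [0.0] * k                         # back substitution
    for i in reversed(range(k)):
        s = sum(xtx[i][j] * theta[j] for j in range(i + 1, k))
        theta[i] = (xty[i] - s) / xtx[i][i]
    return theta                              # theta[j] multiplies x**j

def m(x):  # the "unknown" DGP of equation (1.5); undefined at x = 0
    return math.sin(x) * math.cos(1 / x) + math.log(x + math.sqrt(x * x + 1))

random.seed(12345)
x_grid = [v / 100 for v in range(-600, 601) if v != 0]
y_obs = [m(v) + 0.5 * random.gauss(0, 1) for v in x_grid]
theta = poly_fit(x_grid, y_obs, 5)
m_hat = lambda x: sum(t * x ** j for j, t in enumerate(theta))
```

Evaluating <code>m_hat</code> over the grid gives the degree-5 fitted curve; raising the degree reproduces the under/over-fitting pattern in Figure 1.<br /><br />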
The figure below demonstrates the procedure using sieve lengths 1, 3, and 10, respectively.<br /><br /> <!-- :::::::::: FIGURE 2 :::::::::: --><center> <a href="http://www.eviews.com/blog/funcoef/gausssieveplot.jpeg"><img height="auto" src="http://www.eviews.com/blog/funcoef/gausssieveplot.jpeg" title="Gaussian Sieve Estimation" width="360" /></a><br /> <small>Figure 2: Gaussian Sieve Estimation</small><br /><br /></center><!-- :::::::::: FIGURE 2 :::::::::: --> Clearly, both sieve estimators are very similar. So how does one select an <i>optimal</i> sieve? There really isn't a prescription for such optimization. Each sieve has its advantages and disadvantages, but the general rule of thumb is to choose a sieve that most closely resembles the function of interest $m(\cdot)$. For instance, if the function is polynomial, then using a polynomial sieve is probably best. Alternatively, if the function is expected to be smooth and concentrated around its mean, a Gaussian sieve will work well. On the other hand, the question of optimal sieve length lends itself to more concrete advice.<br /><br /> <h4 id="sec2.1">Optimal Sieve Length</h4> Given the examples explored above, it is evident that sieve length plays a major role in fitting accuracy. For instance, estimation with a low sieve length resulted in severe underfitting, while a higher sieve length resulted in better fit. The question of course is whether an optimal length can be determined.<br /><br /> Li et al. (1987) studied three well-known procedures, all of which are based on the mean squared forecast error of the estimated function over a search grid $\mathcal{J} \equiv \left\{J_{min},\ldots, J_{max}\right\}$, and all of which are asymptotically equivalent.
In particular, let $J^{\star}$ denote the optimal sieve length and consider:<br /><br /> <ol> <li>$C_{p}$ method due to Mallows (1973): $$J^{\star} = \arg\!\min_{J \in \mathcal{J}} \frac{1}{T}\sum_{t=1}^{T}\left(Y_{t} - \widehat{m}(X_{t})\right)^{2} + 2\widehat{\sigma}^{2}\frac{J}{T}$$ where $\widehat{\sigma}^{2} = \frac{1}{T}\sum_{t=1}^{T}\left(Y_{t} - \widehat{m}(X_{t})\right)^{2}$ <li>Generalized cross-validation method due to Craven and Wahba (1979): $$J^{\star} = \arg\!\min_{J \in \mathcal{J}} \frac{1}{(1 - (J/T))^{2}T}\sum_{t=1}^{T}\left(Y_{t} - \widehat{m}(X_{t})\right)^{2}$$ <li>Leave-one-out cross validation method due to Stone (1974): $$J^{\star} = \arg\!\min_{J \in \mathcal{J}} \frac{1}{T}\sum_{t=1}^{T}\left(Y_{t} - \widehat{m}_{\setminus t}(X_{t})\right)^{2}$$ where the subscript notation $\setminus t$ indicates estimation after dropping observation $t$. </ol> Here we discuss the algorithm for the last of the three procedures. In particular, with the search grid $\mathcal{J}$ defined as before, iterate the following steps over $J \in \mathcal{J}$: <ol> <li>For each observation $t^{\star} \in \left\{1, \ldots, T \right\}$: <ol type="i"> <li>Solve the optimization problem in \eqref{eq.1.4} using data from the pair $(Y_{t}, X_{t})_{t \neq t^{\star}}$, and derive the estimated model as follows: $$\widehat{m}_{J,\setminus t^{\star}}(x) \equiv \widehat{\theta}_{_{J,\setminus t^{\star}}0} + \sum_{j=1}^{J}\widehat{\theta}_{_{J,\setminus t^{\star}}j}f_{j}(x)$$ where the subscript $J,\setminus t^{\star}$ indicates that parameters are estimated using sieve length $J$, after dropping observation $t^{\star}$.
<li>Derive the forecast error for the dropped observation as follows: $$e_{_{J}t^{\star}} \equiv Y_{t^{\star}} - \widehat{m}_{J,\setminus t^{\star}}(X_{t^{\star}})$$ </ol> <li>Derive the cross-validation mean squared error for sieve length $J$ as follows: $$MSE_{J} = \frac{1}{T}\sum_{t=1}^{T} e_{_{J}t}^{2}$$ <li>Determine the optimal sieve length $J^{\star}$ as the length that minimizes $MSE_{J}$ across $\mathcal{J}$. In other words $$J^{\star} = \arg\!\min_{J\in\mathcal{J}} MSE_{J}$$ </ol> In words, the algorithm moves across the sieve search grid $\mathcal{J}$ and computes an out-of-sample forecast error for each observation. The optimal sieve length is that which minimizes the average mean squared error across the search grid. We demonstrate the selection criteria and accompanying estimation when using a grid search from 1 to 15.<br /><br /> <!-- :::::::::: FIGURE 3 :::::::::: --><center> <a href="http://www.eviews.com/blog/funcoef/optest.jpeg"><img height="auto" src="http://www.eviews.com/blog/funcoef/optest.jpeg" title="Sieve Regression with Optimized Sieve Length Selection" width="720" /></a><br /> <small>Figure 3: Sieve Regression with Optimized Sieve Length Selection</small><br /><br /></center><!-- :::::::::: FIGURE 3 :::::::::: --> Evidently, both the polynomial and Gaussian sieve models ought to use a sieve length of 15.<br /><br /> <h4 id="sec2.2">Critiques</h4> While global nonparametric estimators are easy to work with, they exhibit several well-recognized drawbacks. First, they leave little room for fine-tuning estimation. For instance, in the case of polynomial sieves, the polynomial degree is not continuous. In other words, if estimation underfits when sieve length is $J$, but overfits when sieve length is $J+1$, then there is no polynomial degree $J < J^{\star} < J+1$.<br /><br /> Second, global estimators are often subject to numerical infeasibility when regressor values are not sufficiently small.
This is because increased sieve lengths can cause entries of the regressor covariance matrix to become extremely large. In turn, this can render the covariance matrix nearly singular, making its inversion, and by extension estimation, infeasible. In other words, at some point, increasing the polynomial degree further does not lead to estimate improvements.<br /><br /> Lastly, it is worth pointing out that global estimators fit curves by smoothing (averaging) over the entire domain. As such, they can have difficulties handling observations with strong influences such as outliers and regime switches. This is due to the fact that outlying observations will be averaged with the rest of the data, resulting in a curve that significantly under- or over-fits these observations. To illustrate this point, consider a modification of equation \eqref{eq.1.5} with level shifts in the regions $(-1,1]$ and $(1,6]$: \begin{align} m(x) = \begin{cases} \sin(x)\cos(\frac{1}{x}) + \log\left(x + \sqrt{x^2+1}\right) & \text{if } x\in [-6,-1]\\ \sin(x)\cos(\frac{1}{x}) + \log\left(x + \sqrt{x^2+1}\right) + 4 & \text{if } x \in (-1,1]\\ \sin(x)\cos(\frac{1}{x}) + \log\left(x + \sqrt{x^2+1}\right) - 2 & \text{if } x \in (1,6] \end{cases}\label{eq.1.6} \end{align} We generate $Y_{t}$ and $X_{t}$ as before, and estimate this model using both polynomial and Gaussian sieves based on cross-validated sieve length selection.<br /><br /> <!-- :::::::::: FIGURE 4 :::::::::: --><center> <a href="http://www.eviews.com/blog/funcoef/optestoutliers.jpeg"><img height="auto" src="http://www.eviews.com/blog/funcoef/optestoutliers.jpeg" title="Sieve Regression with Optimized Sieve Length Selection and Outliers" width="720" /></a><br /> <small>Figure 4: Sieve Regression with Optimized Sieve Length Selection and Outliers</small><br /><br /></center><!-- :::::::::: FIGURE 4 :::::::::: --> Clearly, both procedures have a difficult time handling jumps in the domain region $-1 < x \leq 1$.
Nevertheless, it is evident that the Gaussian sieve does significantly better than polynomial regression. This is further corroborated by the leave-one-out cross-validation MSE values, which indicate that the Gaussian sieve minimum MSE is roughly one-sixth of the polynomial sieve minimum MSE.<br /><br /> It turns out that a number of these shortcomings can be mitigated by averaging locally instead of globally. In this regard, we turn to the idea of <i>local estimation</i> next.<br /><br /> <h3 id="sec3">Local Methods</h3> The general idea behind local nonparametric estimators is <i>local averaging</i>. The procedure partitions the functional variable $x$ into <i>bins</i> of a particular size, and estimates $m(x)$ as a linear interpolation of the average values of the dependent variable at the middle of each bin. We demonstrate the procedure when $m(x)$ is the function in \eqref{eq.1.5}.<br /><br /> In particular, define $Y_{t}$ as before, but let $X_{t} = x$. In other words, we consider deterministic regressors. We will relax the latter assumption later, but this is momentarily more instructive as it leads to contiguous partitions of the explanatory variable $X_{t}$. Finally, define the bins as quantiles of $x$ and consider the procedure with bin partitions equal to 2, 5, 15, and 30, respectively.<br /><br /> <!-- :::::::::: FIGURE 5 :::::::::: --><center> <a href="http://www.eviews.com/blog/funcoef/quantplot.jpeg"><img height="auto" src="http://www.eviews.com/blog/funcoef/quantplot.jpeg" title="Local Averaging with Quantiles" width="720" /></a><br /> <small>Figure 5: Local Averaging with Quantiles</small><br /><br /></center><!-- :::::::::: FIGURE 5 :::::::::: --> Clearly, when the number of bins is 2, the estimate is a straight line and severely underfits the objective function. Nevertheless, as the number of bins increases, so does the accuracy of the estimate.
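<br /><br /> The quantile-bin procedure just described is simple enough to sketch directly. The following is an illustrative Python mock-up (not the EViews program behind the figures): it splits the sorted sample into equal-count bins and returns the points that are then linearly interpolated to form the estimate.

```python
import statistics

def local_average(xs, ys, n_bins):
    """Partition the sample into `n_bins` equal-count (quantile) bins
    of the x-values and return (bin center, mean of y in bin) pairs."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    size = len(xs) // n_bins
    points = []
    for b in range(n_bins):
        lo = b * size
        hi = len(xs) if b == n_bins - 1 else lo + size
        idx = order[lo:hi]                       # observations in this bin
        x_mid = statistics.mean(xs[i] for i in idx)
        y_bar = statistics.mean(ys[i] for i in idx)
        points.append((x_mid, y_bar))
    return points

# With 2 bins a step-like sample collapses to a two-point line;
# increasing n_bins lets the estimate track local structure.
pts = local_average(list(range(10)), [0.0] * 5 + [10.0] * 5, 2)
```

Increasing <code>n_bins</code> reduces bias but raises variance, exactly the tradeoff visible in the figure.<br /><br />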
Indeed, local estimation here is shown to be significantly more accurate than global estimation used earlier on the same function $m(x)$. This is of course a consequence of local averaging which performs piecemeal smoothing on only those observations restricted to each bin. Naturally, high leverage observations and outliers are better accommodated as they are averaged only with those observations in the immediate vicinity which also fall in the same bin. In fact, we can demonstrate this using the function $m(x)$ in \eqref{eq.1.6}.<br /><br /> <!-- :::::::::: FIGURE 6 :::::::::: --><center> <a href="http://www.eviews.com/blog/funcoef/quantplotoutliers.jpeg"><img height="auto" src="http://www.eviews.com/blog/funcoef/quantplotoutliers.jpeg" title="Local Averaging with Quantiles and Outliers" width="720" /></a><br /> <small>Figure 6: Local Averaging with Quantiles and Outliers</small><br /><br /></center><!-- :::::::::: FIGURE 6 :::::::::: --> Evidently, increasing the number of bins leads to increasingly better adaptation to the presence of outlying observations.<br /><br /> It's worth pointing out here that unlike sieve estimation, which can suffer from infeasibility with increased sieve length, in local estimation there is in principle no limit to how finely we wish to define the bin width. Nevertheless, as is evident from the visuals, while increasing the number of bins will reduce bias, it will also introduce variance. In other words, smoothness is sacrificed in exchange for accuracy. This is of course the <i>bias-variance tradeoff</i> and is precisely the mechanism by which fine-tuning the estimator is possible.<br /><br /> <h4 id="sec3.1">Localized Kernel Regression</h4> The idea of local averaging can be extended to accommodate various bin types and sizes. The most popular approaches leverage information about the points at which estimates of $m(x)$ are desired.
For instance, if estimates of $m(x)$ are desired at a set of points $\left(x_{1}, \ldots, x_{J} \right)$, then the estimate $\widehat{m}(x_{j})$ can be the average of $Y_{t}$ for each point $X_{t}$ in some <i>neighborhood</i> of $x_{j}$ for $j=1,\ldots, J$. In other words, bins are defined as neighborhoods centered around the points $x_{j}$, with the size of the neighborhood determined by some distance metric. Then, to gain control over the bias-variance tradeoff, neighborhood size can be combined with a penalization scheme. In particular, penalization introduces a weight function which disadvantages those $X_{t}$ that are too far from $x_{j}$ in any direction. In other words, those $X_{t}$ close to $x_{j}$ (in the neighborhood) are assigned larger weights, whereas those $X_{t}$ far from $x_{j}$ (outside the neighborhood) are weighed down.<br /><br /> Formally, when the function $m(\cdot)$ is univariate, local kernel estimators solve optimization problems of the form: \begin{align} \arg\!\min_{\hspace{-1em} \beta_{0}} E\left(Y_{t} - \beta_{0}\right)^{2}K_{h}\left(X_{t} - x_{j}\right) \quad \forall j \in \left\{1, \ldots, J\right\}\label{eq.1.7} \end{align} Here we use the traditional notation $K_{h}(X_{t} - x_{j}) \equiv K\left(\frac{|X_{t} - x_{j}|}{h}\right)$ where $K(\cdot)$ is a distributional weight function, otherwise known as a <i>kernel</i>, $|\cdot|$ denotes a distance metric (typically Euclidean), $h$ denotes the size of the local neighborhood (bin), otherwise known as a <i>bandwidth</i>, and $\beta_{0} \equiv \beta_{0}(x_{j})$ due to its dependence on the evaluation point $x_{j}$.<br /><br /> To gain further insight, it is easiest to think of $K(\cdot)$ as a probability density function with support on $[-1,1]$.
For instance, consider the famous <i>Epanechnikov</i> kernel: $$K(u) = \frac{3}{4}\left(1 - u^{2}\right) \quad \text{for} \quad |u| \leq 1$$ or the <i>cosine</i> kernel specified by: $$K(u) = \frac{\pi}{4}\cos\left(\frac{\pi}{2}u\right) \quad \text{for} \quad |u| \leq 1$$ <table> <tbody> <tr> <td> <!-- :::::::::: FIGURE 7A :::::::::: --> <center> <a href="http://www.eviews.com/blog/funcoef/epankern.jpeg"><img height="auto" src="http://www.eviews.com/blog/funcoef/epankern.jpeg" title="Epanechnikov Kernel" width="360" /></a><br /> </center> <!-- :::::::::: FIGURE 7A :::::::::: --> </td> <td> <!-- :::::::::: FIGURE 7B :::::::::: --> <center> <a href="http://www.eviews.com/blog/funcoef/coskern.jpeg"><img height="auto" src="http://www.eviews.com/blog/funcoef/coskern.jpeg" title="Cosine Kernel" width="360" /></a><br /> </center> <!-- :::::::::: FIGURE 7B :::::::::: --> </td> </tr> <tr> <td> <center> <small>Figure 7A: Epanechnikov Kernel</small><br /><br /> </center> </td> <td> <center> <small>Figure 7B: Cosine Kernel</small><br /><br /> </center> </td> </tr> </tbody></table> Now, if $|X_{t} - x| > h$, it is clear that $K(\cdot) = 0$. In other words, if the distance between $X_{t}$ and $x$ is larger than the bandwidth (neighborhood size), then $X_{t}$ lies outside the neighborhood and its importance will be weighed down to zero. Alternatively, if $|X_{t} - x| = 0$, then $X_{t} = x$ and $X_{t}$ will be assigned the highest weight, which in the case of the Epanechnikov and cosine kernels is $0.75$ and $\pi/4 \approx 0.785$, respectively.<br /><br /> To demonstrate the mechanics, consider a kernel estimator based on $k$-nearest neighbouring points, or the weighted $k-NN$ estimator. In particular, this estimator defines the neighbourhood as all points $X_{t}$ whose distance to an evaluation point $x_{j}$ is no greater than the distance of the $k^{\text{th}}$ nearest point to that same evaluation point $x_{j}$.
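In code, this neighbourhood definition amounts to setting the bandwidth equal to the distance from the evaluation point to its $k^{\text{th}}$ nearest observation. A minimal Python sketch with made-up data (the variable names are ours and are not part of the accompanying EViews programs):

```python
import numpy as np

# Hypothetical sample of regressor values and a single evaluation point
X = np.array([0.3, 1.1, 2.0, 2.4, 3.7, 5.2])
x_j = 2.2
k = 3  # number of nearest neighbours defining the neighbourhood

d = np.abs(X - x_j)      # distance of each observation to x_j
h = np.sort(d)[k - 1]    # bandwidth: distance to the k-th nearest point
inside = d <= h          # the k observations that fall inside the neighbourhood

print(round(h, 6), inside.sum())  # 1.1 3
```

With $h$ in hand, the kernel weights $w_{t} = K(d_{t}/h)$ follow immediately.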
When used in the optimization problem \eqref{eq.1.7}, the resulting estimator is also sometimes referred to as <i>LOWESS</i> - LOcally WEighted Scatterplot Smoothing.<br /><br /> The algorithm used in the demonstration is relatively simple. First, define $k^{\star}$ as the number of neighbouring points to be considered and define a grid $\mathcal{X} \equiv \{x_{1}, \ldots, x_{J}\}$ of points at which an estimate of $m(\cdot)$ is desired. Next, define a kernel function $K(\cdot)$. Finally, for each $j \in \{1, \ldots, J\}$, execute the following: <ol> <li>For each $t \in \{1,\ldots, T\}$, compute $d_{t} = |X_{t} - x_{j}|$ -- the Euclidean distance between $X_{t}$ and $x_{j}$. <li>Order the $d_{t}$ in ascending order to form the ordered set $\{d_{(1)} \leq d_{(2)} \leq \ldots \leq d_{(T)}\}$. <li>Set the bandwidth as $h = d_{(k^{\star})}$. <li>For each $t \in \{1,\ldots, T\}$, compute a weight $w_{t} \equiv K_{h}(X_{t} - x_{j})$. <li>Solve the optimization problem: $$\arg\!\min_{\hspace{-1em} \beta_{0}} \sum_{t=1}^{T}\left(Y_{t} - \beta_{0}\right)^{2}w_{t}$$ to derive the parameter estimate: $$\widehat{m}(x_{j}) \equiv \widehat{\beta}_{0}(x_{j}) = \frac{\sum_{t=1}^{T}w_{t}Y_{t}}{\sum_{t=1}^{T}w_{t}}$$ </ol> An estimate of $m(x)$ along the domain $\mathcal{X}$ is now the linear interpolation of the points $\{\widehat{\beta}_{0}(x_{1}), \ldots, \widehat{\beta}_{0}(x_{J})\}$.<br /><br /> For instance, suppose $m(x)$ is the curve defined in \eqref{eq.1.6}, the evaluation grid $\mathcal{X}$ consists of points in the interval $[-6,6]$, and $K(\cdot)$ is the Epanechnikov kernel. Furthermore, suppose $Y_{t} = m(x) + 0.5\epsilon_{t}$ and $X_{t} = x - 0.5 + \eta_{t}$. Notice that we're back to treating the regressor as a stochastic variable.
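Putting the five steps together, the estimator admits a compact sketch. The following is a hedged, NumPy-only illustration (not the EViews program used for the figures in this post; the function names are ours):

```python
import numpy as np

def epanechnikov(u):
    # K(u) = 0.75 * (1 - u^2) for |u| <= 1, and 0 otherwise
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def knn_kernel_estimate(x_grid, X, Y, k):
    """Local constant (weighted k-NN / LOWESS-style) estimate of m(.) on x_grid."""
    m_hat = np.empty(len(x_grid))
    for j, xj in enumerate(x_grid):
        d = np.abs(X - xj)                    # Step 1: distances to the evaluation point
        h = np.sort(d)[k - 1]                 # Steps 2-3: bandwidth = k-th smallest distance
        w = epanechnikov(d / h)               # Step 4: kernel weights
        m_hat[j] = np.sum(w * Y) / np.sum(w)  # Step 5: weighted average solves the problem
    return m_hat

# Toy check: with noiseless data on m(x) = x, the local average at x = 2 recovers 2
X = np.arange(5, dtype=float)   # 0, 1, 2, 3, 4
Y = X.copy()
print(knn_kernel_estimate(np.array([2.0]), X, Y, k=3))  # [2.]
```

An estimate over the full grid is then the linear interpolation of the fitted points, exactly as described above.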
Then, the $k-NN$ estimator of $m(\cdot)$ with 15, 40, 100, and 200 nearest neighbour points, respectively, is illustrated below.<br /><br /> <!-- :::::::::: FIGURE 8 :::::::::: --><center> <a href="http://www.eviews.com/blog/funcoef/knnreg.jpeg"><img height="auto" src="http://www.eviews.com/blog/funcoef/knnreg.jpeg" title="k-NN Regression" width="720" /></a><br /> <small>Figure 8: k-NN Regression</small><br /><br /></center><!-- :::::::::: FIGURE 8 :::::::::: --> Clearly, the estimator can be very adaptive to the nuances of outlying points but can suffer from both underfitting and overfitting. In this regard, observe that the number of neighbouring points is directly proportional to neighbourhood (bandwidth) size. In other words, as the number of neighbouring points increases, the bandwidth increases. This is evidenced by a very volatile estimator when the number of neighbouring points is 15, and a significantly smoother estimator when the number of neighbouring points is 200. Therefore, there must be some optimal middle ground between undersmoothing and oversmoothing. In general, notice that apart from the lower zero bound, the bandwidth is not bounded above. Thus, there is an extensive range of bandwidth possibilities. So how does one define what constitutes an optimal bandwidth?<br /><br /> <h4 id="sec3.2">Bandwidth Selection</h4> While we will cover optimal bandwidth selection in greater detail in Part II of this series, it is not difficult to draw similarities between the role of bandwidth size in local estimation and sieve length in global methods. In fact, similar methods for optimal bandwidth selection exist in the context of local kernel regression and, analogous to sieve methods, are also typically grid searches.
In this regard, in order to avoid complicated theoretical discourse, consider momentarily the optimization problem in \eqref{eq.1.7}.<br /><br /> It is not difficult to demonstrate that the estimator $\widehat{\beta}_{0}(x)$ satisfies: \begin{align*} \widehat{\beta}_{0}(x) &= \frac{T^{-1}\sum_{t=1}^{T}K_{h}\left(X_{t} - x\right)Y_{t}}{T^{-1}\sum_{t=1}^{T}K_{h}\left(X_{t} - x\right)}\\ &=\frac{1}{T}\sum_{t=1}^{T}\left(\frac{K_{h}\left(X_{t} - x\right)}{T^{-1}\sum_{i=1}^{T}K_{h}\left(X_{i} - x\right)}\right)Y_{t} \end{align*} Accordingly, if $h\rightarrow 0$, then $\frac{K_{h}\left(X_{t} - x\right)}{T^{-1}\sum_{i=1}^{T}K_{h}\left(X_{i} - x\right)} \rightarrow T$ and is only defined on $x = X_{t}$. In other words, as the bandwidth approaches zero, $\widehat{\beta}_{0}(x) \equiv \widehat{\beta}_{0}(X_{t}) \rightarrow Y_{t}$, and the estimator is effectively an interpolation of the data. Naturally, this estimator has very small bias since it picks up every data point in $Y_{t}$, but also has very large variance for the same reason.<br /><br /> Alternatively, should $h \rightarrow \infty$, then $\frac{K_{h}\left(X_{t} - x\right)}{T^{-1}\sum_{i=1}^{T}K_{h}\left(X_{i} - x\right)} \rightarrow 1$ for all values of $x$, and $\widehat{\beta}_{0}(x) \rightarrow T^{-1}\sum_{t=1}^{T}Y_{t}$. That is, $\widehat{\beta}_{0}(x)$ is a constant function equal to the mean of $Y_{t}$, and therefore has zero variance, but suffers from very large modelling bias since it fails to capture any variation around the mean.<br /><br /> Between these two extremes is an entire spectrum of models $\left\{\mathcal{M}_{h} : h \in \left(0, \infty\right) \right\}$ ranging from the most complex $\mathcal{M}_{0}$, to the least complex $\mathcal{M}_{\infty}$. In other words, the bandwidth parameter $h$ governs model complexity. Thus, the optimal bandwidth selection problem selects an $h^{\star}$ to generate a model $\mathcal{M}_{h^{\star}}$ best suited for the data under consideration.
In other words, it reduces to the classical bias-variance tradeoff.<br /><br /> To demonstrate certain principles, we close this section by returning to the leave-one-out cross-validation procedure discussed earlier. As a matter of fact, the algorithm also applies to local kernel regression, and we demonstrate it in the context of $k-NN$ regression, also discussed earlier.<br /><br /> In particular, define a search grid $\mathcal{K} \equiv \{k_{min}, \ldots, k_{max}\}$ of the number of neighbouring points, select a kernel function $K(\cdot)$, and iterate the following steps over $k \in \mathcal{K}$: <ol> <li>For each observation $t^{\star} \in \left\{1, \ldots, T \right\}$: <ol type="i"> <li>For each $t \neq t^{\star} \in \{1,\ldots, T\}$, compute $d_{t \neq t^{\star}} = |X_{t} - X_{t^{\star}}|$. <li>Order the $d_{t \neq t^{\star}}$ in ascending order to form the ordered set $\{d_{t \neq t^{\star} (1)} \leq d_{t \neq t^{\star} (2)} \leq \ldots \leq d_{t \neq t^{\star} (T-1)}\}$. <li>Set the bandwidth as $h_{\setminus t^{\star}} = d_{t \neq t^{\star} (k)}$. <li>For each $t \neq t^{\star} \in \{1,\ldots, T\}$, compute a weight $w_{_{\setminus t^{\star}}t} \equiv K_{h_{\setminus t^{\star}}}(X_{t} - X_{t^{\star}})$. <li>Solve the optimization problem: $$\arg\!\min_{\hspace{-1em} \beta_{0}} \sum_{t\neq t^{\star}}^{T}\left(Y_{t} - \beta_{0}\right)^{2}w_{_{\setminus t^{\star}}t}$$ to derive the parameter estimate: $$\widehat{m}_{k,\setminus t^{\star}}(X_{t^{\star}}) \equiv \widehat{\beta}_{_{k,\setminus t^{\star}}0}(X_{t^{\star}}) = \frac{\sum_{t\neq t^{\star}}^{T}w_{_{\setminus t^{\star}}t}Y_{t}}{\sum_{t\neq t^{\star}}^{T}w_{_{\setminus t^{\star}}t}}$$ where we use the subscript $k,\setminus t^{\star}$ to denote explicit dependence on the number of neighbouring points $k$ and the dropped observation $t^{\star}$.
<li>Derive the forecast error for the dropped observation as follows: $$e_{_{k}t^{\star}} \equiv Y_{t^{\star}} - \widehat{m}_{k,\setminus t^{\star}}(X_{t^{\star}})$$ </ol> <li>Derive the cross-validation mean squared error when using $k$ nearest neighbouring points: $$MSE_{k} = \frac{1}{T}\sum_{t=1}^{T} e_{_{k}t}^{2}$$ <li>Determine the optimal number of neighbouring points $k^{\star}$ as the minimizer of $MSE_{k}$ across $\mathcal{K}$. In other words: $$k^{\star} = \arg\!\min_{k\in\mathcal{K}} MSE_{k}$$ </ol> We close this section and blog entry with an illustration of the procedure. In particular, we again consider the function in \eqref{eq.1.6}, and use the cosine kernel to search for the optimal number of neighbouring points over the search grid $\mathcal{K} \equiv \{40, \ldots, 80\}$.<br /><br /> <!-- :::::::::: FIGURE 9 :::::::::: --><center> <a href="http://www.eviews.com/blog/funcoef/knnregopt.jpeg"><img height="auto" src="http://www.eviews.com/blog/funcoef/knnregopt.jpeg" title="k-NN Regression with Optimal k" width="360" /></a><br /> <small>Figure 9: k-NN Regression with Optimized k</small><br /><br /></center><!-- :::::::::: FIGURE 9 :::::::::: --> <h3 id="sec4">Conclusion</h3> Given the recent introduction of functional coefficient estimation in EViews 11, our aim in this multi-part blog series is to complement this feature release with a theoretical and practical overview. As a first step in this regard, we've dedicated this Part I of the series to gently introducing readers to the principles of nonparametric estimation, and illustrated them using EViews programs. In particular, we've covered principles of sieve and kernel estimation, as well as optimal sieve length and bandwidth selection.
In Part II, we'll extend the principles discussed here and cover the theory underlying functional coefficient estimation in greater detail.<br /><br /> <h3 id="sec5">Files</h3>The workfile and program files can be downloaded here.<br /><br /> <ul> <li> <a href="http://www.eviews.com/blog/funcoef/sievereg.prg">sievereg.prg</a> <li> <a href="http://www.eviews.com/blog/funcoef/locavg.prg">locavg.prg</a> <li> <a href="http://www.eviews.com/blog/funcoef/knnreg.prg">knnreg.prg</a></ul><br /><br /> <hr /><h3 id="sec6">References</h3> <ol class="bib2xhtml"> <!-- Authors: Craven Peter and Wahba Grace --><li><a name="craven-1979"></a>Peter Craven and Grace Wahba. Estimating the correct degree of smoothing by the method of generalized cross-validation. <cite>Numerische Mathematik</cite>, 31:377&#x2013;403, 1979.</li> <!-- Authors: Grenander Ulf --><li><a name="grenander-1981"></a>Ulf Grenander. Abstract inference. Technical report, 1981.</li> <!-- Authors: Li Ker Chau --><li><a name="li-1987"></a>Ker-Chau Li. Asymptotic optimality for C<sub>p</sub>, C<sub>L</sub>, cross-validation and generalized cross-validation: Discrete index set. <cite>The Annals of Statistics</cite>, 15(3):958&#x2013;975, 1987.</li> <!-- Authors: Mallows Colin L --><li><a name="mallows-1973"></a>Colin&nbsp;L Mallows. Some comments on C<sub>p</sub>. <cite>Technometrics</cite>, 15(4):661&#x2013;675, 1973.</li> <!-- Authors: Stone Mervyn --><li><a name="stone-1974"></a>Mervyn Stone. Cross-validation and multinomial prediction.
<cite>Biometrika</cite>, 61(3):509&#x2013;515, 1974.</li> </ol></span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com0tag:blogger.com,1999:blog-6883247404678549489.post-69999115724147797532019-04-23T15:09:00.001-07:002019-04-26T07:16:56.719-07:00Generalized Autoregressive Score (GAS) Models: EViews Plays with Python<script type="text/x-mathjax-config"> MathJax.Hub.Config({ tex2jax: { inlineMath: [ ['$','$'], ["\$","\$"] ], displayMath: [ ['$$','$$'], ["\$","\$"] ], }, TeX: { equationNumbers: { autoNumber: "AMS" }, extensions: ["AMSmath.js"], Macros: { lb: "{\\left(}", rb: "{\\right)}", bu: ['{\\underline{#1}}', 1], ba: ['{\\overline{#1}}', 1], norm: ['{\\lVert#1\\rVert}', 1] } } }); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"></script> <span style="font-family: &quot;verdana&quot; sans-serif"> Starting with EViews 11, users can take advantage of communication between EViews and Python. This means that workflow can begin in EViews, switch over to Python, and be brought back into EViews seamlessly. To demonstrate this feature, we will use U.S. 
macroeconomic data on the unemployment rate to fit a GARCH model in EViews, transfer the data over and estimate a GAS model equivalent of the GARCH model in Python, transfer the data back to EViews, and compare the results.<br /><br /><a name='more'></a> <h3>Table of Contents</h3><ol> <li><a href="#sec1">GAS Models</a> <li><a href="#sec2">Example Description</a> <li><a href="#sec3">Preparatory Work</a> <li><a href="#sec4">Data Analysis in EViews</a> <li><a href="#sec5">Data Analysis in Python</a> <li><a href="#sec6">Back to EViews</a> <li><a href="#sec7">Files</a> <li><a href="#sec8">References</a></ol><br /> <h3 id="sec1">GAS Models</h3> Historically, time varying parameters have received an enormous amount of attention and the literature is saturated with numerous specifications and estimation techniques. Nevertheless, many of these specifications are often difficult to estimate; the family of parameter-driven stochastic volatility models is a canonical example. In this regard, Creal, Koopman, and Lucas (2013) and Harvey (2013) proposed a novel family of observation-driven time-varying parameter models estimated using the familiar maximum likelihood framework with the score of the conditional density function driving the updating mechanism. The family has now come to be known as the <b>generalized autoregressive score</b> (GAS) family or model.<br /><br /> GAS models are agnostic as to the type of data under consideration as long as the score function and the Hessian are well defined. In particular, the model assumes an input vector of random variables at time $t$, say $\pmb{y}_{t} \in \mathbf{R}^{q}$, where $q=1$ if the setting is univariate.
Furthermore, the model assumes a conditional distribution at time $t$ specified as: $$\pmb{y}_{t} | \pmb{y}_{1}, \ldots, \pmb{y}_{t-1} \sim p(\pmb{y}_{t}; \pmb{\theta}_{t})$$ where $\pmb{\theta}_{t} \equiv \pmb{\theta}_{t} (\pmb{y}_{1}, \ldots, \pmb{y}_{t-1}, \pmb{\xi}) \in \Theta \subset \mathbf{R}^{r}$ is a vector of time varying parameters which fully characterize $p(\cdot)$ and are functions of past data and possibly time invariant parameters $\pmb{\xi}$.<br /><br /> What distinguishes GAS models from the rest of the literature is that dynamics in $\pmb{\theta}_{t}$ are driven by an autoregressive mechanism augmented with the score of the conditional distribution of $p(\cdot)$. In particular, $$\pmb{\theta}_{t+1} = \pmb{\omega} + \pmb{A}\pmb{s}_{t} + \pmb{B}\pmb{\theta}_{t}$$ where $\pmb{\omega}, \pmb{A},$ and $\pmb{B}$ are matrix coefficients collected in $\pmb{\xi}$, and $\pmb{s}_{t}$ is a vector proportional to the score of $p(\cdot)$: $$\pmb{s}_{t} = \pmb{S}_{t}(\pmb{\theta}_{t}) \pmb{\nabla}_{t}(\pmb{y}_{t}, \pmb{\theta}_{t})$$ Above, $\pmb{S}_{t}$ is an $r\times r$ positive definite scaling matrix known at time $t$, and $$\pmb{\nabla}_{t}(\pmb{y}_{t}, \pmb{\theta}_{t}) \equiv \frac{\partial \log p(\pmb{y}_{t}; \pmb{\theta}_{t})}{\partial \pmb{\theta}_{t}}$$ It turns out that different choices of $\pmb{S}_{t}$ produce different GAS models. For instance, setting $\pmb{S}_{t}$ to some power $\gamma > 0$ of the information matrix of $\pmb{\theta}_{t}$ will change how the variance of $\pmb{\nabla}_{t}$ impacts the model. In particular, consider: $$\pmb{S}_{t} = \pmb{\mathcal{I}}_{t}(\pmb{\theta}_{t})^{-\gamma}$$ where $$\pmb{\mathcal{I}}_{t}(\pmb{\theta}_{t}) = E_{t-1}\left\{ \pmb{\nabla}_{t}(\pmb{y}_{t}, \pmb{\theta}_{t}) \pmb{\nabla}_{t}(\pmb{y}_{t}, \pmb{\theta}_{t})^{\top} \right\}$$ Typical choices for $\gamma$ are 0, 1/2, and 1. For instance, if $\gamma=0$, $\pmb{S}_{t} = \pmb{I}$ and no scaling occurs. 
Alternatively, when $\gamma = 1/2$, the scaling results in $Var_{t-1}(\pmb{s}_{t}) = \pmb{I}$; in other words, standardization occurs.<br /><br /> Regardless of the choice of $\gamma$, $\pmb{s}_{t}$ is a martingale difference with respect to the distribution $p(\cdot)$, and $E_{t-1}\left\{ \pmb{s}_{t} \right\} = 0$ for all $t$. This latter property further implies that $\pmb{\theta}_{t}$ is in fact a stationary process with long-term mean value $(\pmb{I} - \pmb{B})^{-1}\pmb{\omega}$, whenever the spectral radius of $\pmb{B}$ is less than one. Thus, $\pmb{\omega}$ and $\pmb{B}$ are respectively responsible for controlling the level and the persistence of $\pmb{\theta}_{t}$, whereas $\pmb{A}$ controls for the impact of $\pmb{s}_{t}$. In other words, $\pmb{s}_{t}$ denotes the direction of updating $\pmb{\theta}_{t}$ to $\pmb{\theta}_{t+1}$, acting as a steepest ascent step for improving the model's local fit.<br /><br /> With the above framework established, Creal, Koopman, and Lucas (2013) show that various choices for $p(\cdot)$ and $\pmb{S}_{t}$ lead to various GAS specifications, some of which reduce to very familiar and well established existing models. For instance, let $y_{t} = \sigma_{t}\epsilon_{t}$, and suppose $\epsilon_{t}$ is a Gaussian random variable with mean zero and unit variance. It is readily shown that setting $S_{t} = \mathcal{I}_{t}^{-1}$ and $\theta_{t} = \sigma_{t}^{2}$, the GAS updating equation reduces to: $$\theta_{t+1} = \omega + A(y_{t}^{2} - \theta_{t}) + B\theta_{t}$$ which is equivalent to the standard GARCH(1,1) model $$\sigma_{t+1}^{2} = \alpha + \beta y_{t}^{2} + \eta \sigma_{t}^{2}$$ where $\alpha = \omega$, $\beta = A$, and $\eta = B - A$.
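This equivalence is easy to confirm numerically. The Python sketch below, with arbitrary illustrative parameter values (assumptions of ours, unrelated to the estimates reported later in this post), iterates both recursions from a common starting value and checks that they coincide under the mapping $\alpha = \omega$, $\beta = A$, $\eta = B - A$:

```python
import numpy as np

rng = np.random.default_rng(12345)

# Arbitrary illustrative parameters (assumptions, not estimates from the post)
omega, A, B = 0.1, 0.2, 0.9
alpha, beta, eta = omega, A, B - A   # the stated mapping

T = 200
y = rng.standard_normal(T)           # placeholder data

theta = np.empty(T + 1)              # GAS time-varying parameter (the variance)
sigma2 = np.empty(T + 1)             # GARCH(1,1) conditional variance
theta[0] = sigma2[0] = 1.0           # common initialization

for t in range(T):
    theta[t + 1] = omega + A * (y[t]**2 - theta[t]) + B * theta[t]  # GAS update
    sigma2[t + 1] = alpha + beta * y[t]**2 + eta * sigma2[t]        # GARCH update

print(np.allclose(theta, sigma2))    # True: the two recursions coincide
```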
There are, of course, a number of other examples and configurations, and we refer the reader to the original texts for more details.<br /><br /> <h3 id="sec2">Example Description</h3> Our objective here is to communicate between EViews and Python to estimate a GAS model in Python and compare the results back in EViews. In particular, we will work with the U.S. monthly civilian unemployment rate, defined as the number of unemployed as a percentage of the labor force -- <i>Labor force data are restricted to people 16 years of age and older, who currently reside in 1 of the 50 states or the District of Columbia, who do not reside in institutions (e.g., penal and mental facilities, homes for the aged), and who are not on active duty in the Armed Forces.</i> See the FRED database at <a href="https://fred.stlouisfed.org/series/UNRATE">https://fred.stlouisfed.org/series/UNRATE</a> -- to which we will fit a GARCH(1,1) model using the traditional method as well as the GAS approach.<br /><br /> It is well known that unemployment rates are typically very volatile and persistent, particularly in contractionary economic cycles. This is because major firm decisions, such as workforce expansions and contractions, are often accompanied by large sunk costs (e.g. job advertisements, screening, training), and are usually irreversible in the immediate short term (e.g. wage frictions such as labour contracts and dismissal costs). Thus, in contractionary periods, firms typically prefer to defer hiring decisions until more favourable conditions return, resulting in strong unemployment persistence known as <i>spells</i>. On the other hand, these periods are often characterized by frequent labour force transitions and increased search activities, both of which contribute to unemployment volatility.<br /><br /> In light of the above, measuring the volatility of unemployment requires the use of econometric models which are designed to capture both volatility and persistence.
While several such models exist in the literature, here we focus on perhaps the best known among them: the generalized autoregressive conditional heteroskedasticity (GARCH) model of Engle (1982) and Bollerslev (1986), described earlier. In particular, if we let $y_{t}$ denote the monthly unemployment rate, we are interested in obtaining an estimate $\widehat{\sigma}_{t}$ of $\sigma_{t}$, at each point in time, effectively tracing the evolution of unemployment volatility for the period under consideration. Since the GAS model above reduces to the GARCH model when the conditional distribution $p(\cdot)$ is Gaussian and the time varying parameter is the volatility of the process, we would like to compare the estimates from the GAS model to those generated by EViews' internal GARCH estimation. Note here that while EViews can estimate numerous (G)ARCH models, it cannot yet natively estimate GAS models. Accordingly, we will fit a GARCH model in EViews, transfer our data over to Python, and estimate a GAS model using the Python package <b>PyFlux</b>. We will then compare our findings.<br /><br /> <h3 id="sec3">Preparatory Work</h3> Before getting started, please make sure that you have Python 3 installed from <a href="https://www.python.org/downloads/release/python-368/">https://www.python.org/downloads/release/python-368/</a> on your system, and that you also have the following Python packages installed: <ol> <li>NumPy <li>Pandas <li>Matplotlib <li>Seaborn <li>PyFlux </ol> One (certainly not the only) way to install said packages is to open up a command prompt on your system and navigate to the directory where Python was installed; this is usually <code>C:\Users\USER_NAME\AppData\Local\Programs\Python\Python36_64</code> if you have a 64-bit version.
From there, issue the following commands: <pre><br /> python -m pip install --upgrade pip<br /> python -m pip install PACKAGE_NAME<br /></pre> Next, make sure that the path to Python is specified in your EViews options. Specifically, in EViews, go to <b>Options/General Options...</b> and on the left tree select <b>External program interface</b> and ensure that <b>Home Path</b> is correctly pointing to the directory where Python is installed. Usually, you will not have to touch this setting since EViews populates this field by searching your system for the install directory.<br /><br /> Finally, please note that as of writing, the analysis that follows was tested with Python version 3.6.8 and PyFlux version 0.4.15.<br /><br /> <h3 id="sec4">Data Analysis in EViews</h3> Turning to data analysis, in EViews, create a new monthly workfile. To do so, click on <b>File/New/Workfile</b>. Under <b>Frequency</b> select <b>Monthly</b>, and set the <b>Start date</b> to <b>2006M12</b> and the <b>End date</b> to <b>2013M12</b>, and hit <b>OK</b>. Next, fetch the unemployment rate data from the FRED database by clicking on <b>File/Open/Database...</b>. From here, select <b>FRED Database</b> from the <b>Database/File Type</b> dropdown, and hit <b>OK</b>. This opens the FRED database window. To get the series of interest from here, click on the <b>Browse</b> button. This opens a new window with a folder-like overview. Here, click on <b>All Series Search</b> and then type <b>UNRATE</b> in the <b>Search For</b> textbox. This will list a series called <i>Civilian Unemployment Rate (M,SA,%)</i>. Drag the series over to the workfile to make it available for analysis. This will fetch the series <b>UNRATE</b> from the FRED database and place it in the workfile. In particular, we are grabbing data from the period of December 2006 to December 2013 -- effectively the recessionary period characterized by the recent housing loan crisis in the United States. 
<table> <tbody> <tr> <td> <!-- :::::::::: FIGURE 1A :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-Q-8R_IidAy4/XL9_4pjYlVI/AAAAAAAAAwE/fIApBqd5JaM6BUKSTbtyWoVebZ9H3o-6gCLcBGAs/s1600/workfiledlg.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-Q-8R_IidAy4/XL9_4pjYlVI/AAAAAAAAAwE/fIApBqd5JaM6BUKSTbtyWoVebZ9H3o-6gCLcBGAs/s1600/workfiledlg.jpg" title="Workfile Dialog" width="320" /></a><br /> </center> <!-- :::::::::: FIGURE 1A :::::::::: --> </td> <td> <!-- :::::::::: FIGURE 1B :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-l_NxAegKlPA/XL9_1x9o3jI/AAAAAAAAAvk/Ti-KspaNYvcxFHOTFnf01N-fAQwYGr5kwCLcBGAs/s1600/dbasedlg.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-l_NxAegKlPA/XL9_1x9o3jI/AAAAAAAAAvk/Ti-KspaNYvcxFHOTFnf01N-fAQwYGr5kwCLcBGAs/s1600/dbasedlg.jpg" title="Database Dialog" width="320" /></a><br /> </center> <!-- :::::::::: FIGURE 1B :::::::::: --> </td> </tr> <tr> <td> <center> <small>Figure 1A: Workfile Dialog</small><br /><br /> </center> </td> <td> <center> <small>Figure 1B: Database Dialog</small><br /><br /> </center> </td> </tr> <tr> <td> <!-- :::::::::: FIGURE 1C :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-ZXhI23HsbNg/XL-dET3-9WI/AAAAAAAAAxQ/VO7Ei3YNsZo323-Lki9uF8X9gQomoK8-gCLcBGAs/s1600/fredqry.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-ZXhI23HsbNg/XL-dET3-9WI/AAAAAAAAAxQ/VO7Ei3YNsZo323-Lki9uF8X9gQomoK8-gCLcBGAs/s1600/fredqry.jpg" title="FRED Browse" width="320" /></a><br /> </center> <!-- :::::::::: FIGURE 1C :::::::::: --> </td> <td> <!-- :::::::::: FIGURE 1D :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-Oe-7rPF-3B4/XL-dBX3UPMI/AAAAAAAAAxM/CbHRLkPXNZUlu8Bk3NDpY_XJXJFh-x4IwCLcBGAs/s1600/fredqry2.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-Oe-7rPF-3B4/XL-dBX3UPMI/AAAAAAAAAxM/CbHRLkPXNZUlu8Bk3NDpY_XJXJFh-x4IwCLcBGAs/s1600/fredqry2.jpg" title="FRED Search" width="320" /></a><br /> </center> <!-- 
:::::::::: FIGURE 1D :::::::::: --> </td> </tr> <tr> <td> <center> <small>Figure 1C: FRED Browse</small><br /><br /> </center> </td> <td> <center> <small>Figure 1D: FRED Search</small><br /><br /> </center> </td> </tr> </tbody></table> Also, restrict the sample to the period from January 2007 to December 2013. Why we do this will become apparent later. To do so, issue the following command in EViews: <pre><br /> smpl 2007M01 @last<br /></pre> To see what the data looks like, double click on <b>UNRATE</b> in the workfile to open the series object. Next, click on <b>View/Graph...</b>. This will open a graph options window. We will stick with the defaults, so click on <b>OK</b>. The output is reproduced below.<br /><br /> <!-- :::::::::: FIGURE 2 :::::::::: --><center> <a href="https://lh3.googleusercontent.com/-uHw8WuakTR4/XL9_5hRhC0I/AAAAAAAAAwI/9fNMnaORB0s1VtiXiNExZk2F1lhJtaTogCLcBGAs/s1600/unrategrph.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-uHw8WuakTR4/XL9_5hRhC0I/AAAAAAAAAwI/9fNMnaORB0s1VtiXiNExZk2F1lhJtaTogCLcBGAs/s1600/unrategrph.jpg" title="Time Series Plot of UNRATE" width="320" /></a><br /> <small>Figure 2: Time Series Plot of UNRATE</small><br /><br /></center><!-- :::::::::: FIGURE 2 :::::::::: --> We will now estimate a basic GARCH model on <b>UNRATE</b>. To do this, click on <b>Quick/Estimate Equation...</b>, and under <b>Method</b> choose <b>ARCH - Autoregressive Conditional Heteroskedasticity</b>. In the <b>Mean Equation</b> text box type <b>UNRATE</b> and leave everything else as their default values. Click on <b>OK</b>.
<table> <tbody> <tr> <td> <!-- :::::::::: FIGURE 3A :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-hhVBxWIvrWQ/XL9_2nSKhHI/AAAAAAAAAvw/4XEXNVqQYKoSJsz-o07VYotr_chidFdBwCLcBGAs/s1600/garchdlg.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-hhVBxWIvrWQ/XL9_2nSKhHI/AAAAAAAAAvw/4XEXNVqQYKoSJsz-o07VYotr_chidFdBwCLcBGAs/s1600/garchdlg.jpg" title="GARCH Estimation Dialog" width="320" /></a><br /> </center> <!-- :::::::::: FIGURE 3A :::::::::: --> </td> <td> <!-- :::::::::: FIGURE 3B :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-qOWoCw0lzlY/XL9_3H7NTOI/AAAAAAAAAv8/s8OAJ2cGWBg2xYkfgrFI9QA9KHsQ51A6ACLcBGAs/s1600/garchoutput.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-qOWoCw0lzlY/XL9_3H7NTOI/AAAAAAAAAv8/s8OAJ2cGWBg2xYkfgrFI9QA9KHsQ51A6ACLcBGAs/s1600/garchoutput.jpg" title="GARCH Estimation Output" width="320" /></a><br /> </center> <!-- :::::::::: FIGURE 3B :::::::::: --> </td> </tr> <tr> <td> <center> <small>Figure 3A: GARCH Estimation Dialog</small><br /><br /> </center> </td> <td> <center> <small>Figure 3B: GARCH Estimation Output</small><br /><br /> </center> </td> </tr> </tbody></table> From the estimation output we can see that model parameters have the following estimates: <ol> <li>$\alpha = 1.068302$ <li>$\beta = 1.236277$ <li>$\eta = -0.247753$ </ol> We can also see the path of the volatility process by clicking on <b>View/Garch Graph/Conditional Variance</b>. This produces a plot of $\widehat{\sigma}^{2}_{t}$. In fact, we will also create a series object from the data points used to produce the GARCH conditional variance. To do this, from the GARCH conditional variance window, click on <b>Proc/Make GARCH Variance Series...</b> and in the <b>Conditional Variance</b> textbox enter <b>EVGARCH</b> and hit <b>OK</b>. This produces a series object called <b>EVGARCH</b> and places it in the workfile. 
We will use it a bit later.<br /><br /> <table> <tbody> <tr> <td> <!-- :::::::::: FIGURE 4A :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-8DKMKkkml08/XL9_2yebDjI/AAAAAAAAAv0/JcgxOH_kCo0jlbZwBZS8CwfZsx_7wQguQCLcBGAs/s1600/garchcondvar.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-8DKMKkkml08/XL9_2yebDjI/AAAAAAAAAv0/JcgxOH_kCo0jlbZwBZS8CwfZsx_7wQguQCLcBGAs/s1600/garchcondvar.jpg" title="GARCH Conditional Variance of UNRATE" width="320" /></a><br /> </center> <!-- :::::::::: FIGURE 4A :::::::::: --> </td> <td> <!-- :::::::::: FIGURE 4B :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-MXO4qlsX3nY/XL9_2ixTAAI/AAAAAAAAAvs/9dCFnnP_0EEG6AA0QbFfG1ecA_BsNhiYgCLcBGAs/s1600/garchcondvardlg.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-MXO4qlsX3nY/XL9_2ixTAAI/AAAAAAAAAvs/9dCFnnP_0EEG6AA0QbFfG1ecA_BsNhiYgCLcBGAs/s1600/garchcondvardlg.jpg" title="GARCH Conditional Variance of Proc" width="320" /></a><br /> </center> <!-- :::::::::: FIGURE 4B :::::::::: --> </td> </tr> <tr> <td> <center> <small>Figure 4A: GARCH Conditional Variance of UNRATE</small><br /><br /> </center> </td> <td> <center> <small>Figure 4B: GARCH Conditional Variance Proc</small><br /><br /> </center> </td> </tr> </tbody></table> <h3 id="sec5">Data Analysis in Python</h3> To estimate the GAS equivalent of this model we must first transfer our data over to Python. To do so, issue the following command in EViews: <pre><br /> xopen(p)<br /></pre> This tells EViews to open an instance of Python within EViews and open up bi-directional communication. In fact you should see a new command window appear, titled <b>Log: Python Output</b>. Here you can issue commands into Python directly as if you had opened a Python instance at any command prompt. You can also send commands to Python using EViews command prompt. 
In fact, we will use the latter approach to import packages into our Python instance as follows: <pre><br /> xrun "import numpy as np"<br /> xrun "import pandas as pd"<br /> xrun "import pyflux as pf"<br /> xrun "import matplotlib.pyplot as plt"<br /></pre> For instance, the first command above tells EViews to issue the command <i>import numpy as np</i> in the open Python instance, thereby importing the NumPy package. All results will be echoed in the Python instance.<br /><br /> <!-- :::::::::: FIGURE 5 :::::::::: --><center> <a href="https://lh3.googleusercontent.com/-GCNeE9hdtPk/XL-dH7SByDI/AAAAAAAAAxU/R2vsxQiCvucN9mvTsxJcr2Gys_PH2YNvACLcBGAs/s1600/pythondlg.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-GCNeE9hdtPk/XL-dH7SByDI/AAAAAAAAAxU/R2vsxQiCvucN9mvTsxJcr2Gys_PH2YNvACLcBGAs/s1600/pythondlg.jpg" title="Python Output Log" width="320" /></a><br /> <small>Figure 5: Python Output Log</small><br /><br /></center><!-- :::::::::: FIGURE 5 :::::::::: --> Next, transfer the <b>UNRATE</b> series over to Python by issuing the following command in EViews: <pre><br /> xput(ptype=dataframe) unrate<br /></pre> The command above sends the series <b>UNRATE</b> to Python and transforms the data into a Pandas DataFrame object.<br /><br /> We now follow the PyFlux documentation and estimate the GAS model by issuing the following commands from EViews: <pre><br /> xrun "model = pf.GAS(ar=1, sc=1, data=unrate, family=pf.Normal())"<br /> xrun "fit = model.fit('MLE')"<br /> xrun "fit.summary()"<br /></pre> The first command above tells PyFlux to create a GAS model object that has one autoregressive and one scaling parameter, sets $p(\cdot)$ to the Gaussian distribution, and uses the series <b>UNRATE</b> as $y_{t}$. In other words, the autoregressive and scaling parameters correspond, respectively, to the coefficients $A$ and $B$ in the first section of this document.
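For intuition, the recursion these commands estimate can be sketched in standalone Python. This is a simplified Gaussian location version of the score-driven update $f_{t+1} = \omega + A s_{t} + B f_{t}$, not PyFlux's internal implementation; the function name and initialization are my own:

```python
import numpy as np

def gas_normal_filter(y, omega, A, B):
    """Score-driven (GAS) recursion for a Gaussian location model.

    Update: theta[t+1] = omega + A * s[t] + B * theta[t], where the
    scaled score s[t] = y[t] - theta[t] under the Normal density.
    Illustrative stand-in only -- not PyFlux's internal code.
    """
    theta = np.zeros(len(y))
    theta[0] = y[0]                      # initialize at the first observation
    for t in range(len(y) - 1):
        score = y[t] - theta[t]          # prediction error = scaled score
        theta[t + 1] = omega + A * score + B * theta[t]
    return theta
```

The point is only that the parameters reported by <i>fit.summary()</i> play the roles of $\omega$, $A$ and $B$ in a recursion of this shape.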
The second command tells Python to create a variable <b>FIT</b> which will hold the output from an estimated GAS model which uses maximum likelihood as the estimation technique. We display the output of this estimation by invoking the third command. In particular, we have the following estimates: <ol> <li>$\omega = 0.0027$ <li>$A = 1.2973$ <li>$B = 0.9994$ </ol> In fact, we can also obtain a distributional plot of the autoregressive coefficient $B$ across the period of estimation. To do this, invoke the following command within EViews: <pre><br /> xrun "model.plot_z(indices=[1], figsize=(15,5))"<br /></pre> The latter command tells Python to plot the distribution of the 2nd estimated coefficient (the AR coefficient; the <i>indices</i> argument is zero-based) and to display a figure which is $15\times 5$ inches in size. This is the distribution of the evolution of $B$ and is <b>not</b> the time path of the estimated coefficient. <br /><br /> <!-- :::::::::: FIGURE 6 :::::::::: --><center> <a href="https://lh3.googleusercontent.com/-jDmWcghD7Ac/XL9_3PK7ppI/AAAAAAAAAv4/qsfrDrSvkG4Ho9epb73PaOgKz3kCI4HrQCLcBGAs/s1600/pyar1.png"><img height="auto" src="https://lh3.googleusercontent.com/-jDmWcghD7Ac/XL9_3PK7ppI/AAAAAAAAAv4/qsfrDrSvkG4Ho9epb73PaOgKz3kCI4HrQCLcBGAs/s1600/pyar1.png" title="Python GAS Distribution of AR Parameter" width="320" /></a><br /> <small>Figure 6: Python GAS Distribution of AR Parameter</small><br /><br /></center><!-- :::::::::: FIGURE 6 :::::::::: --> While we can obtain a distribution of the estimated parameters, unfortunately, PyFlux does not offer a way to extract the time path as a Python data object. Thankfully, we can easily recreate it manually as a series in EViews.<br /><br /> <h3 id="sec6">Back To EViews</h3> To create the time path of the estimated GAS coefficient, we first need to transfer the coefficients from the estimated GAS model back into EViews.
To do this, we invoke the following command in EViews: <pre><br /> xget(name=gascoefs, type=vector) fit.results.x[0:3]<br /></pre> This tells Python to send the first three estimated coefficients back to EViews, saving the result as a vector called <b>GASCOEFS</b>.<br /><br /> Next, create a new series in the workfile called <b>GASGARCH</b> by issuing the following command in EViews: <pre><br /> series gasgarch<br /></pre> Since this is an autoregressive process, we also need to set an initial value for <b>GASGARCH</b>. We set the December 2006 observation to 0.7 -- the default value EViews uses to initialize its internal GARCH estimation -- by typing the following commands in EViews: <pre><br /> smpl 2006M12 2006M12<br /> gasgarch = 0.7<br /></pre> Next, we set the sample back to the period of interest and fill the values of <b>GASGARCH</b> using the GARCH formula with the coefficients from the GAS model. To do this, issue the following commands in EViews again: <pre><br /> smpl 2007M01 @last<br /> gasgarch = gascoefs(1) + gascoefs(3)*(unrate(-1)^2 - gasgarch(-1)) + gascoefs(2)*gasgarch(-1)<br /></pre> Finally, we plot the GARCH conditional variance path from the internal estimation, <b>EVGARCH</b>, along with the newly created series <b>GASGARCH</b>.
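For readers following along outside EViews, the same recursion can be written as a short Python function. This merely mirrors the EViews formula above, assuming the same coefficient ordering and the same 0.7 initial value; the function name is my own:

```python
import numpy as np

def gas_garch_path(unrate, gascoefs, init=0.7):
    """Rebuild the conditional variance path from the GAS coefficients.

    Mirrors the EViews series formula
      gasgarch = c1 + c3*(unrate(-1)^2 - gasgarch(-1)) + c2*gasgarch(-1)
    with gascoefs = (c1, c2, c3). Illustrative sketch only.
    """
    h = np.empty(len(unrate) + 1)
    h[0] = init                          # the December 2006 initial value
    c1, c2, c3 = gascoefs
    for t in range(len(unrate)):
        h[t + 1] = c1 + c3 * (unrate[t] ** 2 - h[t]) + c2 * h[t]
    return h[1:]
```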
We can do this programmatically by issuing the following commands in EViews: <pre><br /> plot evgarch gasgarch<br /></pre> <!-- :::::::::: FIGURE 7 :::::::::: --><center> <a href="https://lh3.googleusercontent.com/-4S2DBUuONKQ/XL-DnGwwZSI/AAAAAAAAAw0/eDTMCzULzPI4fnjOvqIxLQ2ZojzxB5i5QCLcBGAs/s1600/garchgascompare.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-4S2DBUuONKQ/XL-DnGwwZSI/AAAAAAAAAw0/eDTMCzULzPI4fnjOvqIxLQ2ZojzxB5i5QCLcBGAs/s1600/garchgascompare.jpg" title="GARCH Conditional Variance Comparison with GAS" width="320" /></a><br /> <small>Figure 7: GARCH Conditional Variance Comparison with GAS</small><br /><br /></center><!-- :::::::::: FIGURE 7 :::::::::: --> It is clear that the two estimation techniques produce the same path despite having different estimates for the coefficients. Finally, note that while GARCH models are estimated using maximum likelihood procedures, parameter estimates can be numerically unstable and may fail to converge. This often requires a re-specification of the convergence criterion and / or a change in starting values. These drawbacks are also an issue with GAS models.<br /><br /> <h3 id="sec7">Files</h3>The workfile and program files can be downloaded here.<br /><br /> <ul> <li> <a href="http://www.eviews.com/blog/pygas/pygas.WF1">pygas.WF1</a> <li> <a href="http://www.eviews.com/blog/pygas/pygas.prg">pygas.prg</a> </ul><br /><br /> <hr /><h3 id="sec8">References</h3> <table> <tr valign="top"> <td align="right" class="bibtexnumber"> <a name="bollerslev-1986">1</a> </td> <td class="bibtexitem"> Tim Bollerslev. Generalized autoregressive conditional heteroskedasticity. <em>Journal of Econometrics</em>, 31(3):307--327, 1986. [&nbsp;<a href="references_bib.html#bollerslev-1986">bib</a>&nbsp;] </td> </tr> <tr valign="top"> <td align="right" class="bibtexnumber"> <a name="creal-2013">2</a> </td> <td class="bibtexitem"> Drew Creal, Siem&nbsp;Jan Koopman, and Andr&eacute; Lucas.
Generalized autoregressive score models with applications. <em>Journal of Applied Econometrics</em>, 28(5):777--795, 2013. [&nbsp;<a href="references_bib.html#creal-2013">bib</a>&nbsp;] </td> </tr> <tr valign="top"> <td align="right" class="bibtexnumber"> <a name="engle-1982">3</a> </td> <td class="bibtexitem"> Robert&nbsp;F. Engle. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. <em>Econometrica: Journal of the Econometric Society</em>, pages 987--1007, 1982. [&nbsp;<a href="references_bib.html#engle-1982">bib</a>&nbsp;] </td> </tr> <tr valign="top"> <td align="right" class="bibtexnumber"> <a name="harvey-2013">4</a> </td> <td class="bibtexitem"> Andrew&nbsp;C. Harvey. <em>Dynamic Models for Volatility and Heavy Tails: With Applications to Financial and Economic Time Series</em>, volume&nbsp;52. Cambridge University Press, 2013. [&nbsp;<a href="references_bib.html#harvey-2013">bib</a>&nbsp;] </td> </tr> </table> </span><br /><br /> <h3>Seasonal Unit Root Tests</h3> <span style="font-family: &quot;verdana&quot; sans-serif"> <i>Author and guest post by Nicolas Ronderos</i><br /><br /> In this blog entry we will offer a brief discussion of some aspects of seasonal non-stationarity and discuss two popular seasonal
unit root tests. In particular, we will cover the Hylleberg, Engle, Granger, and Yoo (1990) and Canova and Hansen (1995) tests and demonstrate practically, using EViews, how these tests can be used to detect the presence of seasonal unit roots in a US macroeconomic time series. All files used in this exercise can be downloaded at the end of the entry.<br /><br /><a name='more'></a> <h3>Deterministic vs Stochastic Seasonality</h3> When we talk about the concept of seasonality in time series, we usually refer to the idea of <i>"... systematic, although not necessarily regular, intra-year movement caused by changes of the weather, the calendar, and timing of decisions..."</i> (Hans Franses). Naturally, macroeconomic data observed with high periodicity (sampled more than once a year) usually exhibit this behavior.<br /><br /> Seasonality can be modelled in two ways: deterministically or stochastically. The former arises from systematic cycles such as calendar effects or climatic phenomena and can be removed from the data by seasonal adjustment procedures -- in other words, by including seasonal dummy variables. Formally, this implies that deterministic seasonality evolves as:<br /><br /> $$y_{t} = \mu + \sum_{s=1}^{S-1}\delta_{s}D_{s,t} + e_{t}$$ where $S$ is the total number of period cycles, $D_{s,t}$ are seasonal dummy variables which equal 1 in season $s$ and 0 otherwise, and $e_{t}$ are the usual innovations.
For example, in the case of quarterly data $(S=4)$, one could postulate that seasonality evolves as:<br /><br /> $$y_{t} = 15 - D_{1,t} - 4D_{2,t} - 6D_{3,t} + e_{t}$$ The process is visualized below:<br /><br /> <!-- :::::::::: FIGURE 1 :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-Rcyh4rXs9xk/XL3zX9LeHbI/AAAAAAAAAtc/TNaYbumDBwko5GG2503X6x6NuwUJZLHSQCEwYBhgL/s1600/ds.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-Rcyh4rXs9xk/XL3zX9LeHbI/AAAAAAAAAtc/TNaYbumDBwko5GG2503X6x6NuwUJZLHSQCEwYBhgL/s1600/ds.jpg" title="Deterministic Seasonality" width="320" /></a><br /> <small>Figure 1: Deterministic Seasonality</small><br /><br /> </center> Notice here that the optimal $h$-period ahead forecast of $y_{t}$ in season $s$ is given by:<br /><br /> $$\widehat{y}_{S(t+h)-s} = \widehat{\mu} + \widehat{\delta}_{s}$$ where $s = S-1, \ldots, 0$. In other words, the optimal forecast of $y_{t}$ in season $s$ is the same at each future point in time for said season. It is precisely this property which formalizes the notion of systematic cyclicality.<br /><br /> On the other hand, stochastic seasonality describes nearly systematic cycles which evolve as seasonal ARMA$(p,q)$ processes of the form:<br /><br /> $$(1 - \eta_{1}L^{S} - \eta_{2}L^{2S} - \ldots - \eta_{p}L^{pS})y_{t} = (1 + \xi_{1}L^{S} + \xi_{2}L^{2S} + \ldots + \xi_{q}L^{qS})e_{t}$$ where $L$ denotes the usual lag operator.
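Before moving on to the stochastic case, it is worth simulating the quarterly deterministic example above to make the constant-forecast property concrete: the seasonal sample means recover $\mu + \delta_{s}$. A minimal sketch (seed and sample size are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 100                      # 25 years of quarterly observations
t = np.arange(T)
season = t % 4               # quarter indices 0, 1, 2, 3

# y_t = 15 - D1 - 4*D2 - 6*D3 + e_t, with quarter 0 as the base season
deltas = np.array([0.0, -1.0, -4.0, -6.0])
y = 15.0 + deltas[season] + rng.standard_normal(T)

# The optimal forecast in season s is mu + delta_s, so the seasonal
# sample means should be close to 15, 14, 11 and 9
seasonal_means = [y[season == s].mean() for s in range(4)]
```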
In particular, when $p = 1$ and $q = 0$, the seasonal AR(1) model with $\eta_{1} = 0.75$ is visualized as follows:<br /><br /> <!-- :::::::::: FIGURE 2 :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-frISG5yc5Vs/XL3z2QMIM6I/AAAAAAAAAtk/VrmiJrXJp6oFRUAZpTcJBMk-F1dr7ilIACEwYBhgL/s1600/ss.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-frISG5yc5Vs/XL3z2QMIM6I/AAAAAAAAAtk/VrmiJrXJp6oFRUAZpTcJBMk-F1dr7ilIACEwYBhgL/s1600/ss.jpg" title="Stochastic Seasonality" width="320" /></a><br /> <small>Figure 2: Stochastic Seasonality</small><br /><br /> </center> Unlike the deterministic seasonal model however, the $h$-period ahead forecast of the stochastic seasonal model is not constant. In particular, for the seasonal AR(1) model, the forecast $h$-periods ahead is given by:<br /><br /> $$\widehat{y}_{S(t+h)-s} = \widehat{\eta}_{1}^{h}y_{St-s}$$ In other words, the forecast in any given season is a function of past data values, and is therefore considered to be <i>stochastic</i>.<br /><br /> So how does one identify whether a series exhibits deterministic or stochastic seasonality? One useful tool is the <i>periodogram</i>, which produces a decomposition of the dominant frequencies (cycles) of a time series. As it turns out, there are at most $S$ seasonal frequencies in a time series exhibiting $S$ period cycles. Formally, these are identified in conjugate pairs as follows:<br /><br /> $$\omega \in \left\{0, \left(\frac{2\pi}{S}, 2\pi-\frac{2\pi}{S}\right), \left(\frac{4\pi}{S}, 2\pi-\frac{4\pi}{S}\right), \ldots, \pi \right\}$$ if $S$ is even, and<br /><br /> $$\omega \in \left\{0, \left(\frac{2\pi}{S}, 2\pi-\frac{2\pi}{S}\right), \left(\frac{4\pi}{S}, 2\pi-\frac{4\pi}{S}\right), \ldots, \left(\frac{2\pi\lfloor S/2 \rfloor}{S}, 2\pi-\frac{2\pi\lfloor S/2\rfloor}{S}\right) \right\}$$ if $S$ is odd.<br /><br /> Thus, given a stationary time series with $S$ period cycles, we expect the periodogram to peak at the non-zero seasonal frequencies.
In particular, we present the periodogram for deterministic and stochastic seasonal processes below:<br /><br /> <table> <tbody> <tr> <td> <!-- :::::::::: FIGURE 3A :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-nfQ9gfzbaV8/XL34GxM97eI/AAAAAAAAAt8/zwgbTwDu3MU8-tkF7OdUDvSBvA7j5bCUACEwYBhgL/s1600/dsprdgrm.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-nfQ9gfzbaV8/XL34GxM97eI/AAAAAAAAAt8/zwgbTwDu3MU8-tkF7OdUDvSBvA7j5bCUACEwYBhgL/s1600/dsprdgrm.jpg" title="Deterministic Seasonality Periodogram" width="320" /></a><br /> </center> <!-- :::::::::: FIGURE 3A :::::::::: --> </td> <td> <!-- :::::::::: FIGURE 3B :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-n6oWWlny_30/XL34Gg5AJHI/AAAAAAAAAt4/1hmGndwDR20hVcGhUrZbirTE_uEbAWpmwCEwYBhgL/s1600/ssprdgrm.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-n6oWWlny_30/XL34Gg5AJHI/AAAAAAAAAt4/1hmGndwDR20hVcGhUrZbirTE_uEbAWpmwCEwYBhgL/s1600/ssprdgrm.jpg" title="Stochastic Seasonality Periodogram" width="320" /></a><br /> </center> <!-- :::::::::: FIGURE 3B :::::::::: --> </td> </tr> <tr> <td> <center> <small>Figure 3A: Deterministic Seasonality Periodogram</small><br /><br /> </center> </td> <td> <center> <small>Figure 3B: Stochastic Seasonality Periodogram</small><br /><br /> </center> </td> </tr> </tbody> </table> We can see from the periodograms that the spectrum of deterministic seasonal processes exhibits sharp peaks at the seasonal frequencies $\omega$, whereas that of stochastic seasonal processes exhibits a window of sharp peaks centered around seasonal frequencies $\omega$. 
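Periodograms of this kind can be reproduced with a short simulation. The sketch below uses <i>scipy.signal.periodogram</i>; with unit sampling frequency the quarterly seasonal frequencies $\pi/2$ and $\pi$ appear at 0.25 and 0.5 cycles per observation. All numerical choices here are illustrative:

```python
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(1)
T = 400

# Deterministic seasonality: a fixed quarterly pattern plus noise
pattern = np.array([15.0, 14.0, 11.0, 9.0])
det = pattern[np.arange(T) % 4] + rng.standard_normal(T)

# Stochastic seasonality: seasonal AR(1), y_t = 0.75 * y_{t-4} + e_t
sto = np.zeros(T)
e = rng.standard_normal(T)
for s in range(4, T):
    sto[s] = 0.75 * sto[s - 4] + e[s]

# periodogram() removes the mean by default (detrend='constant'); the
# deterministic series peaks sharply at 0.25 and 0.5, while the
# stochastic series shows broader mounds around the same frequencies
freq_det, pxx_det = periodogram(det)
freq_sto, pxx_sto = periodogram(sto)
```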
In the case of stochastic seasonality, the fact that the spectrum spreads around the principal frequencies rather than forming a single peak reaffirms the notion that cycles are stochastically distributed around said frequencies.<br /><br /> <h3>Seasonal Unit Roots</h3> A particularly important form of stochastic seasonality manifests as unit roots at some or all of the frequencies $\omega$. In particular, consider the following process:<br /><br /> $$y_{t} = \eta y_{t-S} + e_{t}$$ and note that the characteristic equation associated with the process is defined as:<br /><br /> \begin{align} 1 - \eta z^{S} = 0 \quad \text{or} \quad z^{S} = 1/\eta \label{eq1} \end{align} Analogous to the case of classical unit root processes, when $|\eta|=1$ and therefore $|z| = 1$, $y_{t}$ is in fact non-stationary. In contrast to the classical unit root case however, $y_{t}$ can possess not one, but up to $S$ unique unit roots. To see this, note that any complex number $z = a + ib$ can be written in polar form as:<br /><br /> $$z = \sqrt{a^{2} + b^{2}}(\cos(\theta) + i\sin(\theta)) = r(\cos(\theta) + i\sin(\theta))$$ where $r = |z|$ is called the magnitude of $z$, but is also the radius of the circle in polar coordinates. Accordingly, when $|\eta | = 1$ or $|z|=1$, $z$ lies on a circle with radius $r = 1$. In other words, $y_{t}$ is a unit root process. Next, recall Euler's formula:<br /><br /> $$e^{ix} = \cos(x) + i \sin(x)$$ Clearly, any complex number $z$ with magnitude $r=1$ satisfies Euler's formula. In other words, $z = e^{i\theta}$.
Since Euler's formula also implies that:<br /><br /> $$e^{2\pi i k} = 1 \quad \text{for} \quad k=0,1,2,\ldots$$ when $|\eta|=1$ or $|z|=1$, the characteristic equation \eqref{eq1} can be expressed as:<br /><br /> \begin{align*} z = e^{i\omega} &amp;= 1^{1/S} \notag\\ &amp;= (e^{2\pi i k})^{1/S}\notag\\ &amp;= e^{\frac{2\pi i k}{S}} \end{align*} where the relations above evidently hold for all $k=0,1,2,\ldots, S-1$ since the solutions begin to cycle when $k \geq S$. Now, taking logarithms of both sides, it is clear that:<br /><br /> \begin{align} \omega = \frac{2\pi k}{S} \quad \text{for} \quad k=0,1,2,\ldots, S-1 \label{eq2} \end{align} In other words, the characteristic equation \eqref{eq1} has $S$ unique solutions identified by the $S$ relationships in \eqref{eq2}. These solutions are equally spaced (at intervals of $2\pi/S$ radians) on the unit circle, with two real solutions associated with $\omega = 0$ and $\omega = \pi$ (when $S$ is even), and the remaining imaginary solutions organized in harmonic conjugate pairs.<br /><br /> Thus, when we identify $S$ with a temporal frequency, namely a week, month, quarter, and so on, the problem of identifying roots of the characteristic equation \eqref{eq1} extends the classical unit root literature, in which $S=1$ (annual frequency), to that of identifying $S > 1$ possible roots on the unit circle.<br /><br /> In fact, just as unchecked unit roots are known to have severe inferential consequences in the classical unit-root literature, the presence of unit roots at seasonal frequencies can give rise to similar inferential inaccuracies and concerns. Accordingly, identifying the presence of unit roots at one or more seasonal frequencies is the subject of the battery of tests known as <i>seasonal unit root tests</i>.<br /><br /> <h3>Seasonal Unit Root Tests</h3> Historically, the first test for a seasonal unit root was proposed by Dickey, Hasza and Fuller (1984) (DHF).
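Numerically, the $S$ solutions derived above are just the $S$-th roots of unity, which is easy to verify for quarterly data (a throwaway sketch):

```python
import numpy as np

S = 4                               # quarterly data
k = np.arange(S)
omega = 2 * np.pi * k / S           # frequencies 0, pi/2, pi, 3*pi/2
roots = np.exp(1j * omega)          # the S solutions of z^S = 1

# Two real roots (omega = 0 and pi) and one complex-conjugate
# harmonic pair (omega = pi/2 and 3*pi/2), all on the unit circle
assert np.allclose(roots, [1, 1j, -1, -1j])
assert np.allclose(np.abs(roots), 1.0)
assert np.allclose(roots**S, 1.0)
```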
In its simplest form, the test is based on running the regression:<br /><br /> $$(1-L^{S})y_{t} = \eta y_{t-S} + e_{t}$$ and testing the null hypothesis $H_{0}: \eta = 0$ against the one-sided alternative $H_{A}: \eta < 0$. The test is carried out using the familiar Student's $t$-statistic for the statistical significance of $\eta$ and, analogous to the classic augmented Dickey-Fuller (ADF) test, exhibits a non-standard asymptotic distribution under the null. Nevertheless, the DHF test is very restrictive: it imposes the existence of a unit root at all $S$ seasonal frequencies simultaneously, whereas in reality a process may exhibit a seasonal unit root at some seasonal frequencies but not others.<br /><br /> <h4>HEGY Seasonal Unit Root Test</h4> To correct for the shortcomings of the DHF test, Hylleberg, Engle, Granger and Yoo (1990) (HEGY) proposed a test for the determination of unit roots at each of the $S$ seasonal frequencies individually, or collectively. In particular, following the notation in Smith and Taylor (1999), in its simplest form, the HEGY test is based on regressions of the form:<br /><br /> \begin{align*} (1-L^{S})y_{St-s} &amp;= \mu + \pi_{0}L\left(1 + L + \ldots + L^{S-1}\right)y_{St-s}\\ &amp;+ L\sum_{k=1}^{S^{\star}}\left( \pi_{k,1}\sum_{j=0}^{S-1}\cos\left((j+1)\frac{2\pi k}{S}\right)L^{j} - \pi_{k,2}\sum_{j=0}^{S-1}\sin\left((j+1)\frac{2\pi k}{S}\right)L^{j} \right)y_{St-s}\\ &amp;+ \pi_{S/2}L\left(1 - L + L^{2} - \ldots - L^{S-1}\right)y_{St-s} + e_{t}\\ &amp;\equiv \mu + \pi_{0}y_{St-s-1, 0} + \sum_{k=1}^{S^{\star}}\pi_{k,1}y_{St-s-1,k,1} + \sum_{k=1}^{S^{\star}}\pi_{k,2}y_{St-s-1,k,2} + \pi_{S/2}y_{St-s-1, S/2} +e_{t} \end{align*} where $S^{\star} = (S/2) - 1$ if $S$ is even and $S^{\star} = \lfloor S/2 \rfloor$ if $S$ is odd, and as before, $s = S-1, \ldots, 1, 0$.<br /><br /> In particular, when data is quarterly with $S=4$ and therefore $S^{\star} = 1$, then:<br /><br /> \begin{align*} y_{4t-s, 0} &amp;=
(1+L+L^{2}+L^{3})y_{4t-s}\\ y_{4t-s, 1,1} &amp;= -L(1-L^{2})y_{4t-s}\\ y_{4t-s, 1,2} &amp;= -(1-L^{2})y_{4t-s}\\ y_{4t-s, 2} &amp;= -(1-L+L^{2}-L^{3})y_{4t-s} \end{align*} Here, $y_{4t-s, 0}$ is in fact the series $y_{4t-s}$ filtered by the 0 frequency filter, $y_{4t-s, 1,1}$ is the series $y_{4t-s}$ filtered by the $\pi/2$ frequency filter, $y_{4t-s, 1,2}$ is the series $y_{4t-s}$ filtered by the $3\pi/2$ frequency filter, and $y_{4t-s, 2}$ is the series $y_{4t-s}$ filtered by the $\pi$ frequency filter.<br /><br /> To visualize the frequency filters, consider the spectral filter functions associated with each of the processes above. The latter are computed as $|\phi(e^{i\theta})|$ where $\phi(\cdot)$ is the lag polynomial applied to $y_{St-s}$, and $\theta \in [0, 2\pi)$. For instance, in the case of quarterly data, the 0 frequency filter is computed as $|1 + e^{i\theta} + e^{i2\theta} + e^{i3\theta}|$, and so on.<br /><br /> <!-- :::::::::: FIGURE 4 :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-fdsyB7HGBTM/XL4p5kYpm5I/AAAAAAAAAuY/Wr2Pj5D42L4NV178cYYzKjRtq1afj7f7wCLcBGAs/s1600/filters.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-fdsyB7HGBTM/XL4p5kYpm5I/AAAAAAAAAuY/Wr2Pj5D42L4NV178cYYzKjRtq1afj7f7wCLcBGAs/s1600/filters.jpg" title="HEGY Seasonal Filters" width="320" /></a><br /> <small>Figure 4: HEGY Seasonal Filters</small><br /><br /> </center> Like the DHF test, the HEGY test also reduces to verifying parameter significance in the regression equation. Nevertheless, in contrast to DHF, HEGY tests can detect the effect of each seasonal frequency independently.
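The gain functions plotted in Figure 4 can be evaluated directly from the lag polynomials. A brief sketch for the quarterly case (variable names are my own):

```python
import numpy as np

theta = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
z = np.exp(1j * theta)

# Gains |phi(e^{i*theta})| of the quarterly HEGY filter polynomials
gain_0   = np.abs(1 + z + z**2 + z**3)    # zero-frequency filter
gain_pi  = np.abs(1 - z + z**2 - z**3)    # pi-frequency filter
gain_har = np.abs(z * (1 - z**2))         # harmonic (pi/2, 3*pi/2) filter

# Each filter passes its own frequency (gain 4, or 2 for the harmonic
# filter) and annihilates the frequencies of the remaining filters:
# e.g. gain_0 equals 4 at theta = 0 and 0 at pi/2, pi and 3*pi/2
```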
In the case of quarterly data, for instance, a $t$-test on coefficient significance for $\pi_{0} = 0$ is in fact a test for a unit root at the $\omega = 0$ frequency, a $t$-test on coefficient significance for $\pi_{2} = 0$ is a test for the presence of a unit root at the $\omega = \pi$ frequency, and an $F$-test for the joint parameter significance of $\pi_{1,1} = 0$ and $\pi_{1,2} = 0$ is in fact a joint test for the presence of a unit root at the harmonic conjugate pair of frequencies $(\pi/2, 3\pi/2)$.<br /><br /> It should also be noted here that while we have focused on the simplest form, the HEGY test can accommodate various deterministic specifications in the form of seasonal dummies, constants, and trends. Moreover, in the presence of serial correlation in the innovation process, the HEGY test can also be augmented with lags of the dependent variable as additional regressors to the principal equation presented above, in order to mitigate the effect.<br /><br /> In fact, the HEGY test is very similar to the ADF test, which is effectively a unit root test at the 0-frequency alone. Whereas the latter proceeds as a regression of a differenced series against its lagged level, the former proceeds as a regression of a seasonally differenced series against the lagged levels at each of the constituent seasonal frequencies. In this regard, the HEGY test is considered an extension of the ADF test in the direction of non-zero frequencies. As such, it also suffers from the same shortcomings as the ADF test, and can exhibit low statistical power when the individual frequencies are in fact stationary but exhibit near-unit-root behaviour.<br /><br /> <h4>Canova-Hansen Seasonal Unit Root Test</h4> One response to the low power of ADF tests in the presence of near unit root stationarity was the test of Kwiatkowski, Phillips, Schmidt, and Shin (1992) (KPSS), which is in fact a test for stationarity at the 0-frequency alone.
The analogous development in the seasonal unit root literature was the test of Canova and Hansen (1995) (CH). Like the KPSS test, the CH test is also a test for stationarity but extends to non-zero seasonal frequencies.<br /><br /> The idea behind the CH test is to suppose that seasonality manifests in the process mean. In other words, given a process $y_{t}$, if seasonal effects are present, then $y_{t}$ will exhibit a seasonally dependent average. Traditionally, this is formalized using seasonal dummy variables as:<br /><br /> $$y_{t} = \sum_{s=0}^{S-1}\delta_{s}D_{s,t} + e_{t}$$ Nevertheless, it is well known that an equivalent representation using discrete Fourier expansions exists in terms of sine and cosine functions. In particular,<br /><br /> $$y_{t} = \sum_{k=0}^{S^{\star}}\left(\delta_{k,1}\cos\left(\frac{2\pi k t}{S}\right) + \delta_{k,2}\sin\left(\frac{2\pi k t}{S}\right)\right) + e_{t}$$ where $S^{\star}$ was defined earlier, and $\delta_{k,1}$ and $\delta_{k,2}$ are referred to as <i>spectral intercept</i> coefficients. 
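The equivalence of the two representations is easy to check numerically: for quarterly data, both sets of regressors have full column rank and span the same four-dimensional space of period-4 sequences. A small sketch (matrix names are my own):

```python
import numpy as np

S, T = 4, 16
t = np.arange(1, T + 1)

# Dummy-variable regressors: intercept plus S - 1 seasonal dummies
D = np.column_stack(
    [np.ones(T)] + [(t % S == s).astype(float) for s in range(1, S)]
)

# Trigonometric regressors: intercept, the (cos, sin) pair at 2*pi/S,
# and cos(pi * t) = (-1)^t for the pi frequency
Z = np.column_stack([
    np.ones(T),
    np.cos(2 * np.pi * t / S),
    np.sin(2 * np.pi * t / S),
    np.cos(np.pi * t),
])

# Both matrices have rank S and span the same column space, so the two
# representations are equivalent reparameterizations of seasonal means
rank_both = np.linalg.matrix_rank(np.hstack([D, Z]))
```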
In either case, the expression can be written in vector notation as follows:<br /><br /> \begin{align} y_{t} = \pmb{Z}_{t}^{\top}\pmb{\gamma}_{t} + e_{t} \label{eq3} \end{align} where $\pmb{Z}_{t} = \left(1, \pmb{z}_{1,t}^{\top}, \ldots, \pmb{z}_{S^{\star},t}^{\top} \right)$ (or $\pmb{Z}_{t} = \left(1, D_{1,t}, \ldots, D_{S-1,t}\right)$), $\pmb{\gamma}_{t} = \left(\gamma_{1,t}, \ldots, \gamma_{S,t}\right)$ is an $S\times 1$ vector of coefficients, and $\pmb{z}_{k,t} = \left(\cos\left(\frac{2\pi k t}{S}\right), \sin\left(\frac{2\pi k t}{S}\right)\right)$ for $k=1,\ldots, S^{\star}$, with the convention $\pmb{z}_{S^{\star},t} \equiv \cos(\pi t) = (-1)^{t}$ when $S$ is even.<br /><br /> Next, to distinguish between stationary and non-stationary seasonality, CH assume that the coefficient vector $\pmb{\gamma}_{t}$ evolves according to:<br /><br /> \begin{align*} \pmb{\gamma}_{t} &amp;= \pmb{\gamma}_{t-1} + u_{t}\\ u_{t} &amp;\sim IID(\pmb{0}, \pmb{G})\\ \pmb{G} &amp;= \text{diag}(\theta_{1}, \ldots, \theta_{S}) \end{align*} Observe that when $\theta_{k} > 0$, then $\gamma_{k,t}$ follows a random walk. On the other hand, when $\theta_{k} = 0$, then $\gamma_{k,t} = \gamma_{k, t-1} = \gamma_{k}$, a fixed constant for all $t$. In other words, when $\theta_{k} > 0$, the process $y_{t}$ exhibits a seasonal unit root at the harmonic frequency pair $(\frac{2\pi k}{S}, 2\pi - \frac{2\pi k}{S})$ for $1\leq k &lt; \lfloor S/2 \rfloor$, and the frequency $\frac{2\pi k}{S}$ if $k=0$ or $k = \lfloor S/2 \rfloor$.
In this regard, to test the null hypothesis that $y_{t}$ exhibits at most deterministic seasonality at certain (possibly all) frequencies, against the alternative hypothesis that $y_{t}$ exhibits a seasonal unit root at certain (possibly all) frequencies, define $\pmb{A}_1$ and $\pmb{A}_2$ as mutually orthogonal, full column-rank, $(S \times a_1)$- and $(S \times a_2)$-matrices which respectively constitute $1 \leq a_1 \leq S$ and $a_2 = S - a_1$ sub-columns from the order-$S$ identity matrix $\pmb{I}_{S}$.<br /><br /> For instance, if one wishes to test whether a seasonal unit root exists at frequency $\pi$, one would set $\pmb{A}_{1} = (0,\ldots, 0,1)^{\top}$. Alternatively, if testing for a seasonal unit root at the frequency pair $\left(\frac{2\pi}{S}, 2\pi - \frac{2\pi}{S}\right)$, then one would set:<br /><br /> $$\pmb{A}_{1} = \begin{bmatrix} 0 &amp; 0 \\ 1 &amp; 0 \\ 0 &amp; 1 \\ 0 &amp; 0 \\ \vdots &amp; \vdots \\ 0 &amp; 0 \end{bmatrix}$$ Note that one can further rewrite \eqref{eq3} as follows:<br /><br /> $$y_{t} = \pmb{Z}_{t}^{\top}\pmb{A}_{1}\pmb{A}_{1}^{\top}\pmb{\gamma}_{t} + \pmb{Z}_{t}^{\top}\pmb{A}_{2}\pmb{A}_{2}^{\top}\pmb{\gamma}_{t} + e_{t}$$ Next, define $\pmb{\Theta} = \left(\theta_{1}, \ldots, \theta_{S}\right)^{\top}$ and observe that the CH hypothesis battery reduces to:<br /><br /> \begin{align*} H_{0}: \pmb{A}_{1}^{\top}\pmb{\Theta} = \pmb{0}\\ H_{A}: \pmb{A}_{1}^{\top}\pmb{\Theta} &gt; 0 \end{align*} where in addition to $H_{0}$, it is implicitly maintained that $H_{M}: \pmb{A}_{2}^{\top}\pmb{\Theta} = \pmb{0}$. In particular, notice that when both $H_{0}$ and $H_{M}$ hold, equation \eqref{eq3} reduces to:<br /><br /> \begin{align} y_{t} = \pmb{Z}_{t}^{\top}\pmb{\gamma} + e_{t} \label{eq4} \end{align} where $\pmb{\gamma}$ is now constant across time. In other words, $y_{t}$ exhibits at most deterministic (stationary) seasonality.
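Constructing the selection matrices is mechanical; below is a small sketch for the quarterly case, assuming the ordering of $\pmb{Z}_{t}$ given above with the $\pi$-frequency regressor last:

```python
import numpy as np

S = 4
I_S = np.eye(S)

# A1 selects the column associated with frequency pi (the last column);
# A2 collects the complementary columns of the identity matrix
A1 = I_S[:, [S - 1]]     # shape (S, 1)
A2 = I_S[:, :S - 1]      # shape (S, 3)

# Mutually orthogonal, full column rank, and jointly spanning R^S
assert np.allclose(A1.T @ A2, 0.0)
assert np.linalg.matrix_rank(np.hstack([A1, A2])) == S
```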
In this regard, holding $H_{M}$ implicitly true, Canova and Hansen (1995) propose a consistent test for $H_{0}$ versus $H_{A}$, using the statistic:<br /><br /> \begin{align*} \mathcal{L} = T^{-2} \text{tr}\left(\left(\pmb{A}_{1}^{\top}\widehat{\pmb{\Omega}}\pmb{A}_{1}\right)^{-1}\pmb{A}_{1}^{\top}\left(\sum_{t=1}^{T}\widehat{F}_{t}\widehat{F}_{t}^{\top}\right)\pmb{A}_{1}\right) \end{align*} where $\text{tr}(\cdot)$ is the trace operator, $\widehat{e}_{t}$ are the OLS residuals from regression \eqref{eq4}, $\widehat{F}_{t} = \sum_{i=1}^{t} \widehat{e}_{i}\pmb{Z}_{i}$ is the partial-sum process of the weighted residuals, and $\widehat{\pmb{\Omega}}$ is the HAC estimator<br /><br /> $$\widehat{\pmb{\Omega}} = \sum_{j=-T+1}^{T-1}\kappa\left(\frac{j}{h}\right)\widehat{\pmb{\Gamma}}(j)$$ Above, $\kappa(\cdot)$ is the kernel function, $h$ is the bandwidth parameter, and $\widehat{\pmb{\Gamma}}(j)$ is the autocovariance (at lag $j$) estimator<br /><br /> $$\widehat{\pmb{\Gamma}}(j) = T^{-1} \sum_{t=j+1}^{T} \widehat{e}_{t}\pmb{Z}_{t}\widehat{e}_{t-j}\pmb{Z}_{t-j}^{\top}$$ Naturally, we reject the null hypothesis when $\mathcal{L}$ is larger than some critical value, which depends on the rank of $\pmb{A}_{1}$.<br /><br /> <h4>Unattended Unit Roots</h4> A well-known problem with the CH test concerns the issue of <i>unattended unit roots</i>. In particular, CH tests the null hypothesis $H_{0}$ while imposing $H_{M}$, where the latter lies in the complementary space to that generated by the former. In practice however, one does not know which spectral frequency exhibits a unit root; if one did know, the exercise of testing for their presence would be nonsensical. In this regard, if $H_{0}$ is imposed but $H_{M}$ is violated, then Taylor (2003) shows that the CH test is severely undersized. To overcome this shortcoming, Taylor (2003) suggests filtering the regression equation \eqref{eq3} to reduce the order of integration at all spectral frequencies identified in $\pmb{A}_{2}$.
In particular, consider the filter:<br /><br /> $$\nabla_{2} = \frac{1 - L^{S}}{\nabla_{1}}$$ where $\nabla_{1}$ reduces, by one, the order of integration at each frequency identified in $\pmb{A}_{1}$. For instance, if $\pmb{A}_{1}$ identifies the 0-frequency, then $\nabla_{1} = (1 - L)$ and $\nabla_{2} = \frac{1-L^{S}}{1-L} = 1 + L + \ldots + L^{S-1}$. Alternatively, if $\pmb{A}_{1}$ identifies the harmonic frequency pair $\left(\frac{2\pi k}{S}, 2\pi - \frac{2\pi k}{S}\right)$, then $\nabla_{1} = 1 - 2\cos\left(\frac{2\pi k}{S}\right)L + L^{2}$, and so on. Accordingly, if we assume $\pmb{\gamma}_{t} = \pmb{\gamma}_{t-1} + u_{t}$, it is clear that $\nabla_{2}y_{t}$ will not admit unit root behaviour at any of the frequencies identified in $\pmb{A}_{2}$ and the maintained hypothesis $H_{M}$ will hold. See Taylor (2003) and Busetti and Taylor (2003) for further details.<br /><br /> Furthermore, since $\nabla_{2}$ acts only on frequencies identified in $\pmb{A}_{2}$, it can also be formally shown that the regressors $\nabla_{2}\pmb{Z}_{t}^{\top}\pmb{A}_{1}$ span a space identical to the space spanned by $\pmb{Z}_{t}^{\top}\pmb{A}_{1}$. Accordingly, the strategy in Taylor (2003) is to run the regression:<br /><br /> \begin{align*} \nabla_{2}y_{t} &amp;= \nabla_{2}\pmb{Z}_{t}^{\top}\pmb{A}_{1}\pmb{A}_{1}^{\top}\pmb{\gamma}_{t} + \nabla_{2}\pmb{Z}_{t}^{\top}\pmb{A}_{2}\pmb{A}_{2}^{\top}\pmb{\gamma}_{t} + \nabla_{2}e_{t} \\ &amp;= \pmb{Z}_{t}^{\top}\pmb{A}_{1}\pmb{A}_{1}^{\top}\pmb{\gamma}_{t} + e_{t}^{\star} \end{align*} where $e_{t}^{\star} = \nabla_{2}\pmb{Z}_{t}^{\top}\pmb{A}_{2}\pmb{A}_{2}^{\top}\pmb{\gamma}_{t} + \nabla_{2}e_{t}$. 
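Since $\nabla_{2}$ is the polynomial quotient $(1-L^{S})/\nabla_{1}$, both examples above can be verified with numpy's polynomial division (a sketch for $S=4$; note that <i>numpy.polydiv</i> orders coefficients from the highest power down):

```python
import numpy as np

# 1 - L^4, coefficients ordered from L^4 down to L^0
one_minus_L4 = np.array([-1.0, 0.0, 0.0, 0.0, 1.0])

# Case 1: A1 identifies the zero frequency, so nabla_1 = 1 - L.
# The quotient should be 1 + L + L^2 + L^3.
q1, r1 = np.polydiv(one_minus_L4, np.array([-1.0, 1.0]))

# Case 2: A1 identifies the harmonic pair (pi/2, 3*pi/2), so
# nabla_1 = 1 - 2*cos(pi/2)*L + L^2 = 1 + L^2.
# The quotient should be 1 - L^2.
q2, r2 = np.polydiv(one_minus_L4, np.array([1.0, 0.0, 1.0]))
```

In both cases the division is exact (zero remainder), confirming that $\nabla_{1}\nabla_{2} = 1 - L^{S}$.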
Naturally, the modified test statistic is now given by:<br /><br /> \begin{align*} \mathcal{L}^{\star} = T^{-2} \text{tr}\left(\left(\pmb{A}_{1}^{\top}\widehat{\pmb{\Omega}}^{\star}\pmb{A}_{1}\right)^{-1}\pmb{A}_{1}^{\top}\left(\sum_{t=1}^{T}\widehat{F}_{t}^{\star}\widehat{F}_{t}^{\star^{\top}}\right)\pmb{A}_{1}\right) \end{align*} where $\widehat{F}_{t}^{\star} = \sum_{s=1}^{t} \widehat{e}_{s}^{\star}\pmb{Z}_{1,s}$ and $\widehat{\pmb{\Omega}}^{\star}$ is computed analogously to $\widehat{\pmb{\Omega}}$, replacing $\widehat{e}_{t}$ with $\widehat{e}_{t}^{\star}$.<br /><br /> <h3>Seasonal Unit Root Test in EViews</h3> Starting with version 11 of EViews, a battery of tests aimed at diagnosing unit roots in the presence of seasonality is supported natively. These tests include the well-known Hylleberg, Engle, Granger, and Yoo (1990) (HEGY) test as well as its Smith and Taylor (1999) likelihood ratio variant, the Canova and Hansen (1995) (CH) test, and the Taylor (2005) variance ratio test.<br /><br /> Here, we will apply the HEGY and CH tests to detect the presence of seasonal unit roots in quarterly U.S. government consumption expenditures and gross investment data running from 1947 to 2018. We have named the series object containing the data <b>USCONS</b>. It can either be opened from the workfile associated with this blog, or fetched directly from the FRED database. For the latter, issue the following commands in the EViews command window:<br /><br /> <pre><br /> wfcreate q 1947q1 2018q4<br /> fetch(d=fred) NA000333Q<br /> rename NA000333Q uscons<br /> </pre> We begin with a plot of the data. To do so, double click on <b>USCONS</b> in the workfile to open the series object. Next, click on <b>View/Graph...</b>. This will open a graph options window. We will stick with the defaults, so click on <b>OK</b>.
The output is reproduced below.<br /><br /> <!-- :::::::::: FIGURE 5 :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-GlpQtAZiLJY/XL4xMFbOTqI/AAAAAAAAAuw/J4eehkrtXBciIbp4EZlmBNtiba2-YA58QCLcBGAs/s1600/usconsgrph.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-GlpQtAZiLJY/XL4xMFbOTqI/AAAAAAAAAuw/J4eehkrtXBciIbp4EZlmBNtiba2-YA58QCLcBGAs/s1600/usconsgrph.jpg" title="Time Series Plot of USCONS" width="320" /></a><br /> <small>Figure 5: Time Series Plot of USCONS</small><br /><br /> </center> A visual analysis indicates data is trending with very prominent seasonal effects. To determine statistically whether these seasonal effects exhibit unit roots, we click on <b>View/Unit Root Tests/Seasonal Unit Root Tests...</b> to open the seasonal unit root test window.<br /><br /> <!-- :::::::::: FIGURE 6 :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-IOV0OdjWwG8/XL4xeK3zTvI/AAAAAAAAAu4/NyzUQuwil9YtlTojfaaVzTSRq5HoDzjlQCEwYBhgL/s1600/hegydlg.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-IOV0OdjWwG8/XL4xeK3zTvI/AAAAAAAAAu4/NyzUQuwil9YtlTojfaaVzTSRq5HoDzjlQCEwYBhgL/s1600/hegydlg.jpg" title="HEGY Test Dialog" width="320" /></a><br /> <small>Figure 6: HEGY Test Dialog</small><br /><br /> </center> We will start with the HEGY test, which is the default test. Here, EViews has already filled out the periodicity with 4 to match the cyclicality of the data. Nevertheless, if you wish to test the data under a different periodicity, you may manually adjust this to one of the following supported values: 2, 4, 5, 6, 7, 12. Since our data is trending, we will change the <b>Non-Seasonal deterministics</b> dropdown from <b>None</b> to <b>Intercept and trend</b> and leave the <b>Seasonal Deterministics</b> dropdown unchanged.<br /><br /> As discussed earlier, in case of serially correlated errors, the HEGY test can be augmented by lags of the dependent variable added as additional regressors to the HEGY regression. 
To determine the precise number of lags to add, EViews offers both automatic and manual methods. The default is automatic lag selection with the Akaike Information Criterion and a maximum of 12 lags. The details can, of course, be changed, or, if automatic selection is undesired, a <b>User Selected</b> value can be specified. We will stick with the defaults. Hit <b>OK</b>.<br /><br /> <!-- :::::::::: FIGURE 7 :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-tVC52Adt14g/XL4xkQhZMmI/AAAAAAAAAvQ/cJjgkQnFn4shqvETTxtULjPMQ4emhCGbQCEwYBhgL/s1600/hegytbl.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-tVC52Adt14g/XL4xkQhZMmI/AAAAAAAAAvQ/cJjgkQnFn4shqvETTxtULjPMQ4emhCGbQCEwYBhgL/s1600/hegytbl.jpg" title="HEGY Test Output" width="320" /></a><br /> <small>Figure 7: HEGY Test Output</small><br /><br /> </center> Looking at the output, EViews provides a table, the top portion of which summarizes the testing procedure, whereas the lower portion summarizes the regression output upon which the test is conducted. In particular, EViews computes the HEGY test statistic for each of the 0, harmonic pair, and $\pi$ frequencies, in addition to the joint test for all seasonal frequencies (i.e., all frequencies other than 0), and a joint test for all frequencies including the frequency 0. As in traditional unit root tests, the null hypothesis postulates the existence of a unit root at the seasonal frequencies under consideration, and rejection of the null requires the absolute value of the test statistic to exceed the absolute value of a critical value associated with the limiting distribution. In this regard, EViews summarizes the 1%, 5%, and 10% critical values derived from simulation for sample sizes ranging from 20 to 480 in intervals of 20. To adjust for the actual sample size used in the HEGY regression, EViews also offers an interpolated version of the critical values.
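The interpolation step can be illustrated with a toy example. The sketch below (Python; the critical values are placeholders for illustration only, NOT the actual HEGY tables, and linear interpolation is assumed) adjusts a tabulated 5% critical value to an actual estimation sample size:

```python
import numpy as np

# Sample sizes at which critical values are tabulated: T = 20, 40, ..., 480
tabulated_T = np.arange(20, 481, 20)

# Placeholder 5% critical values -- made-up numbers, NOT the real HEGY tables
tabulated_cv = np.linspace(-3.60, -3.41, tabulated_T.size)

# Interpolate (linearly, as an assumption) to the actual sample size, e.g. T = 275
cv = np.interp(275, tabulated_T, tabulated_cv)
# cv lies between the tabulated values at T = 260 and T = 280
```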
Here, it is clear that we will not reject the null hypothesis at any of the individual or harmonic pair frequencies, nor at the two joint tests. The overwhelming conclusion is that <b>USCONS</b> exhibits a unit root at each of the quarterly spectral frequencies, individually and jointly.<br /><br /> Consider next the CH test applied to the same data. To bring up the CH test options, from the series object, once again click on <b>View/Unit Root Tests/Seasonal Unit Root Tests...</b> and under the <b>Test type</b> dropdown, select <b>Canova and Hansen</b>. As before, we will leave the <b>Periodicity</b> unchanged and will change the <b>Non-Seasonal Deterministics</b> to <b>Intercept and trend</b>. Note here that the traditional Canova and Hansen (1995) paper does not allow for the inclusion of deterministic trends. However, as noted in Busetti and Harvey (2003), the conditions of CH can be relaxed, since the distribution of the test is unaffected when a deterministic trend is included in the model.<br /><br /> <!-- :::::::::: FIGURE 8 :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-9HBvymMw6yc/XL4xkG00nHI/AAAAAAAAAvM/tghX_6dEDrk0l3j98eBozXn0XZlQ5Y5MACEwYBhgL/s1600/chdlg.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-9HBvymMw6yc/XL4xkG00nHI/AAAAAAAAAvM/tghX_6dEDrk0l3j98eBozXn0XZlQ5Y5MACEwYBhgL/s1600/chdlg.jpg" title="CH Test Dialog" width="320" /></a><br /> <small>Figure 8: CH Test Dialog</small><br /><br /> </center> Next, change the <b>Seasonal Deterministics</b> dropdown from <b>Seasonal dummies</b> to <b>Seasonal intercepts</b>. Notice that when we do this, the <b>Restriction selection</b> box changes to reflect that restrictions are no longer on seasonal dummies, but on seasonal intercepts. Note that we can multi-select which frequencies we would like to test. This is equivalent to specifying the entries of the matrix $\pmb{A}_{1}$ we considered earlier.
If no restrictions are selected, which is the default, then EViews will test all available restrictions. Here we will not select anything.<br /><br /> We will also leave the <b>Include lag of dep. variable</b> option untouched. As noted in Canova and Hansen (1995), the inclusion of a lagged dependent variable in the CH regression will reduce serial correlation (we can think of this as a form of pre-whitening), yet not pose a danger of extracting a seasonal root. Finally, note the <b>HAC Options</b> button, which opens a set of options controlling how the long-run variance is computed: users can customize the kernel and bandwidth used, and whether further residual whitening is desired. We stick with default values and simply click on <b>OK</b> to execute the test.<br /><br /> <!-- :::::::::: FIGURE 9 :::::::::: --> <center> <a href="https://lh3.googleusercontent.com/-rqSM5wpijrs/XL4xkB7TNYI/AAAAAAAAAvI/T5unGNvBNbIn-TuiEZ41RPWOZ0fS2ipsgCEwYBhgL/s1600/chtbl.jpg"><img height="auto" src="https://lh3.googleusercontent.com/-rqSM5wpijrs/XL4xkB7TNYI/AAAAAAAAAvI/T5unGNvBNbIn-TuiEZ41RPWOZ0fS2ipsgCEwYBhgL/s1600/chtbl.jpg" title="CH Test Output" width="320" /></a><br /> <small>Figure 9: CH Test Output</small><br /><br /> </center> Turning to the output, EViews divides the analysis into four sections. The first is a table summarizing the joint test for all elements in $\pmb{A}_{1}$. In the example at hand, we have 3 restrictions -- 2 associated with the harmonic pair $\left(\frac{\pi}{2}, \frac{3\pi}{2}\right)$, and one associated with the frequency $\pi$. Since the null hypothesis is that no unit root exists at the specified frequencies and the test statistic 4.53631 is larger than any of the 1%, 5%, or 10% critical values, we conclude that the joint test rejects the null hypothesis.<br /><br /> The next table presents a detailed look at the harmonic pair test.
Although we did not explicitly ask for this test, EViews presents a breakdown of the requested joint test into its constituent restrictions. These are harmonic pair tests in which the restriction matrix $\pmb{A}_{1}$ would be $S\times 2$. In this case, the test for no seasonal unit root at the harmonic pair is 2.968384, which is clearly larger than any of the critical values associated with the limiting distribution. In other words, we reject the null and conclude that there's evidence of a unit root at the harmonic pair frequencies. Notice that, in addition to the CH test statistic, EViews offers a second statistic marked by an asterisk for differentiation. This is the test statistic corresponding to the Taylor (2003) version of the CH test, robustified to the possible violation of the maintained hypothesis $H_{M}$ discussed earlier.<br /><br /> The table beneath the harmonic pair tests summarizes the CH tests for each individual frequency under consideration. In other words, these are individual tests in which the restriction matrix $\pmb{A}_{1}$ would be $S\times 1$. Since the frequency $\pi$ was requested as part of the joint test, it is reported here. Clearly, with the test statistic equaling 3.842780, we reject the null hypothesis and conclude that there is evidence of a unit root at the frequency $\pi$. As before, note that below the test statistic associated with the $\pi$ frequency is an additional statistic differentiated by an asterisk. This, again, is the Taylor (2003) version of the CH test robustified to unattended unit roots.<br /><br /> The final table presents the CH regression. The residuals from this regression are used in the computation of the CH test statistics.<br /><br /> <h3>Conclusion</h3> In this entry we gave a brief introduction to the subject of seasonal unit root tests.
We highlighted the need to distinguish between deterministic and stochastic cyclicality and discussed several statistical methods designed to do so. Among these, our focus was on the HEGY test, which is effectively an extension of the ADF test in the direction of non-zero seasonal frequencies, and the CH test, which is the analogue of the KPSS test in that same direction. We also looked at some of the mathematical details which underlie these methods. Finally, we closed with a brief application of both tests to the US Government consumption expenditure and investment data, sampled quarterly from 1947 to 2018. Both tests overwhelmingly supported the presence of unit roots at the individual and joint frequencies.<br /><br /> <h3>Files</h3> The workfile and program files can be downloaded here.<br /><br /> <ul> <li> <a href="http://www.eviews.com/blog/seasuroot/seasuroot.WF1">seasuroot.WF1</a> <li> <a href="http://www.eviews.com/blog/seasuroot/seasuroot.prg">seasuroot.prg</a> </ul> <br /><br /> <hr /> <h3> References</h3><table> <tr valign="top"><td align="right" class="bibtexnumber"><a name="busetti-2003">1</a></td><td class="bibtexitem">Fabio Busetti and AM&nbsp;Robert Taylor. Testing against stochastic trend and seasonality in the presence of unattended breaks and unit roots. <em>Journal of Econometrics</em>, 117(1):21--53, 2003. [&nbsp;<a href="references_bib.html#busetti-2003">bib</a>&nbsp;] </td></tr> <tr valign="top"><td align="right" class="bibtexnumber"><a name="busetti-2003a">2</a></td><td class="bibtexitem">Fabio Busetti and Andrew Harvey. Seasonality tests. <em>Journal of Business &amp; Economic Statistics</em>, 21(3):420--436, 2003. [&nbsp;<a href="references_bib.html#busetti-2003a">bib</a>&nbsp;] </td></tr> <tr valign="top"><td align="right" class="bibtexnumber"><a name="canova-1995">3</a></td><td class="bibtexitem">Fabio Canova and Bruce&nbsp;E Hansen. Are seasonal patterns constant over time? A test for seasonal stability.
<em>Journal of Business &amp; Economic Statistics</em>, 13(3):237--252, 1995. [&nbsp;<a href="references_bib.html#canova-1995">bib</a>&nbsp;] </td></tr> <tr valign="top"><td align="right" class="bibtexnumber"><a name="hylleberg-1990">4</a></td><td class="bibtexitem">Svend Hylleberg, Robert&nbsp;F Engle, Clive&nbsp;WJ Granger, and Byung&nbsp;Sam Yoo. Seasonal integration and cointegration. <em>Journal of Econometrics</em>, 44(1-2):215--238, 1990. [&nbsp;<a href="references_bib.html#hylleberg-1990">bib</a>&nbsp;] </td></tr> <tr valign="top"><td align="right" class="bibtexnumber"><a name="kwiatkowski-1992">5</a></td><td class="bibtexitem">Denis Kwiatkowski, Peter&nbsp;CB Phillips, Peter Schmidt, and Yongcheol Shin. Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? <em>Journal of Econometrics</em>, 54(1-3):159--178, 1992. [&nbsp;<a href="references_bib.html#kwiatkowski-1992">bib</a>&nbsp;] </td></tr> <tr valign="top"><td align="right" class="bibtexnumber"><a name="smith-1999">6</a></td><td class="bibtexitem">Richard&nbsp;J Smith and AM&nbsp;Robert Taylor. Likelihood ratio tests for seasonal unit roots. <em>Journal of Time Series Analysis</em>, 20(4):453--476, 1999. [&nbsp;<a href="references_bib.html#smith-1999">bib</a>&nbsp;] </td></tr> <tr valign="top"><td align="right" class="bibtexnumber"><a name="taylor-2003">7</a></td><td class="bibtexitem">AM&nbsp;Robert Taylor. Robust stationarity tests in seasonal time series processes. <em>Journal of Business &amp; Economic Statistics</em>, 21(1):156--163, 2003. [&nbsp;<a href="references_bib.html#taylor-2003">bib</a>&nbsp;] </td></tr> <tr valign="top"><td align="right" class="bibtexnumber"><a name="taylor-2005">8</a></td><td class="bibtexitem">AM&nbsp;Robert Taylor. Variance ratio tests of the seasonal unit root hypothesis. <em>Journal of Econometrics</em>, 124(1):33--54, 2005.
[&nbsp;<a href="references_bib.html#taylor-2005">bib</a>&nbsp;] </td></tr></table></span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com2tag:blogger.com,1999:blog-6883247404678549489.post-41589892192379248992019-02-01T11:21:00.000-08:002019-02-01T11:21:04.635-08:00Time varying parameter estimation with Flexible Least Squares and the tvpuni add-in<span style="font-family: &quot;verdana&quot; , sans-serif;"><i>Author and guest post by <a href="https://www.linkedin.com/in/eren-ocakverdi-9b673924" target="_blank">Eren Ocakverdi</a></i></span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">The professional life of a researcher who follows, or is responsible for, an emerging market can become miserable when things suddenly change and past experience no longer holds. As a practitioner you can get used to it over time, but it’s a whole different story when it comes to identifying empirical relationships between market indicators as part of your job.</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">History can be a good gauge of how such indicators are linked to one another, but only if you look through the proper glass. Abrupt changes, structural breaks or transition periods may alter such relationships so much that they would be misidentified by traditional methods in which the underlying structure is assumed fixed over the full sample.</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"></span><br /><a name='more'></a><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">EViews already has nice built-in features and add-ins to deal with such cases.
Here, I will add another one to this bundle: meet the tvpuni add-in, which implements the “Flexible Least Squares” approach of Kalaba and Tesfatsion (1989).&nbsp;</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;">One way to look at parameter stability is to allow coefficients to change over time. A well-known approach is to treat these parameters as random walk coefficients and estimate them within a state space framework via the Kalman filter. However, estimation of such models can be troublesome in practice for various reasons, and may become a very frustrating experience if you have to deal with convergence problems.</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">Flexible least squares emerges as a useful alternative, since it makes fewer assumptions than the Kalman filter and allows us to determine the degree of smoothness. The help file explains the use of this add-in, so I’ll proceed by demonstrating its abilities through an actual case study.</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">Turkey’s disinflation process since the aftermath of the 2001 crisis has been interrupted from time to time by shocks and stresses originating from different sources. Raw materials constitute more than 70% of total imports (or 20% of GDP) in the Turkish economy, making it especially vulnerable to developments in exchange rates and prices of imported goods (i.e. crude oil).
Although Turkey has been an (explicit) inflation targeter since 2006, frequently overshooting the target has made it very difficult for the central bank to anchor expectations and has weakened its hand in the fight against inflation persistence.</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">The following example considers an augmented version of the Phillips curve for exploring the determinants of inflation dynamics.</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="color: #6aa84f; font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">'create a workfile</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">wfcreate m 2003 2018</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;"><br /></span><span style="color: #6aa84f; font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">'get the data (retrieve from Bloomberg or open :\tvpuni_data.wf1)</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">dbopen(type=bloom) index&nbsp; <span style="color: #6aa84f;">'open database</span></span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">copy index::<span style="color: #cc0000;">"tucxue index"</span> corecpi&nbsp; <span style="color: #6aa84f;">'Core Consumer Price Index (2003=100)</span></span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">copy index::<span style="color: #cc0000;">"tues01eu index"</span> infexp12&nbsp; <span style="color: #6aa84f;">'Inflation expectations over the next 12 months</span></span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">copy index::<span style="color: #cc0000;">"trtfimvi index"</span> imprice&nbsp; <span
style="color: #6aa84f;">'Foreign trade import unit value index (2010=100)</span></span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">copy index::<span style="color: #cc0000;">"tuiosa"</span> ipi&nbsp; <span style="color: #6aa84f;">'Industrial Production Index (SA, 2015=100)</span></span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">copy index::<span style="color: #cc0000;">"usdtry curncy"</span> usdtry&nbsp; <span style="color: #6aa84f;">'Exchange rate</span></span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;"><br /></span><span style="color: #6aa84f; font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">'dependent variable</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">series coreinf = @pcy(corecpi) <span style="color: #6aa84f;">'core inflation (excl. unprocessed food, alcoholic beverages and tobacco)</span></span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;"><br /></span><span style="color: #6aa84f; font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">'generate some regressors</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">series impinf = @pcy(imprice*usdtry) <span style="color: #6aa84f;">'inflationary pressure from import prices (converted to local currency)</span></span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">hpf(power=4) log(ipi)*100 trend @ gap <span style="color: #6aa84f;">'output gap proxy</span></span><br /><br /><span style="color: #6aa84f; font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">'simple fixed parameter estimation</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">equation fixed.ls coreinf infexp12
coreinf(-1) gap impinf</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">Results suggest that backward indexation matters more than forward looking in price setting. Output gap and import prices both have expected signs. All the coefficients are significant at conventional alpha levels. Explanatory power of the model is more than satisfactory, but we are interested in the stability of this relationship.</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="color: #6aa84f; font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">'time varying parameter estimation with flexible least squares</span><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">fixed.tvpuni(method=<span style="color: #cc0000;">"1"</span>,lambda=<span style="color: #cc0000;">"100"</span>,savem)</span><br /><span style="color: #6aa84f; font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">'plot results</span><br /><br /><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">grbetam.line(m)</span><br /><div><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-Ci1gNckiTkQ/XFSZ1NgbGEI/AAAAAAAAAr8/djyX6jeFbBceXcOhTdBqzXZ2oeOhAaFJACEwYBhgL/s1600/graph01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1003" data-original-width="1600" height="400" src="https://4.bp.blogspot.com/-Ci1gNckiTkQ/XFSZ1NgbGEI/AAAAAAAAAr8/djyX6jeFbBceXcOhTdBqzXZ2oeOhAaFJACEwYBhgL/s640/graph01.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"></div><br /><div class="separator" style="clear: both; text-align: center;"></div><br /></div><div class="separator" style="clear: both; text-align: center;"></div><br /><div class="separator" 
style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div><div><span style="font-family: &quot;verdana&quot; , sans-serif;">Results suggest that the coefficient on forward-looking expectations has risen, whereas the coefficient on backward indexation has fallen over time, and the two have become more or less equal. Fluctuation around zero makes the coefficient of the output gap unreliable and difficult to interpret. Passthrough from import prices, on the other hand, seems to be on the rise since 2016.</span></div><div><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div><span style="font-family: &quot;verdana&quot; , sans-serif;">The behavioral change in coefficients around 2008 is easy to explain, as it can be attributed to the global financial crisis. However, it may not be as straightforward to explain the dynamics after end-2010. This era, lasting until the first half of 2018, is when the Central Bank of Turkey implemented an unconventional monetary policy (i.e. an asymmetric and wide interest rate corridor).&nbsp;</span></div><div><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div><span style="font-family: &quot;verdana&quot; , sans-serif;">Approximating the flexible least squares model within a state space framework is also possible and may be preferable depending on the case at hand.
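Conceptually, FLS chooses the whole coefficient path $\beta_{1},\ldots,\beta_{T}$ to minimize the sum of squared residuals plus $\lambda$ times the sum of squared coefficient changes. A minimal numerical sketch of this idea in Python/NumPy (illustrative only, not the add-in's implementation):

```python
import numpy as np

def fls(y, X, lam):
    """Flexible Least Squares: minimize over (b_1, ..., b_T)
    sum_t (y_t - x_t' b_t)^2 + lam * sum_t ||b_{t+1} - b_t||^2
    by solving the block-tridiagonal normal equations in one shot."""
    T, k = X.shape
    A = np.zeros((T * k, T * k))
    rhs = np.zeros(T * k)
    D = lam * np.eye(k)
    for t in range(T):
        i = slice(t * k, (t + 1) * k)
        A[i, i] += np.outer(X[t], X[t])   # measurement-fit term
        rhs[t * k:(t + 1) * k] = X[t] * y[t]
        if t > 0:                          # smoothness penalty for pair (t-1, t)
            j = slice((t - 1) * k, t * k)
            A[i, i] += D
            A[j, j] += D
            A[i, j] -= D
            A[j, i] -= D
    return np.linalg.solve(A, rhs).reshape(T, k)

# Demo: intercept drifts from 0 to 2, slope fixed at 1
rng = np.random.default_rng(0)
T = 200
z = rng.standard_normal(T)
X = np.column_stack([np.ones(T), z])
y = np.linspace(0.0, 2.0, T) + 1.0 * z + 0.1 * rng.standard_normal(T)
beta = fls(y, X, lam=100.0)
# beta[:, 0] should drift upward; beta[:, 1] should hover around 1
```

Larger values of `lam` force the estimated paths toward the fixed-coefficient OLS solution, which is the sense in which the practitioner chooses the degree of smoothness.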
Although the results would not be the same due to the different assumptions behind these frameworks, you can get smoothed estimates of coefficients along with their associated confidence bands.</span></div><div><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div><span style="color: #6aa84f; font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">'flexible least squares estimation with Kalman filter</span></div><div><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">fixed.tvpuni(method=<span style="color: #cc0000;">"3"</span>,lambda=<span style="color: #cc0000;">"100"</span>,savem,saves)</span></div><div><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div><span style="font-family: &quot;verdana&quot; , sans-serif;">We can plot the results by manipulating the output saved into the workfile with a little bit of effort:</span><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-o320YAiSbXY/XFSZ3cMMRfI/AAAAAAAAAsE/BOdeW6FQFCYcoGMJ4XxgV28PiFcGQm_-gCEwYBhgL/s1600/graph02.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1153" data-original-width="1600" height="460" src="https://3.bp.blogspot.com/-o320YAiSbXY/XFSZ3cMMRfI/AAAAAAAAAsE/BOdeW6FQFCYcoGMJ4XxgV28PiFcGQm_-gCEwYBhgL/s640/graph02.png" width="640" /></a></div><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div></div><div><span style="font-family: Verdana, sans-serif;">Note that the confidence band around the coefficient of the output gap reveals the insignificance of this parameter, as suspected.</span><br /><span style="font-family: Verdana, sans-serif;"><br /></span><span style="font-family: Verdana, sans-serif;">The add-in also allows you to migrate your original model to state space and to estimate each parameter as a random walk via the Kalman filter.</span><br /><span
style="font-family: Verdana, sans-serif;"><br /></span><span style="color: #6aa84f; font-family: Courier New, Courier, monospace;">'state space estimation with Kalman filter</span><br /><span style="font-family: Courier New, Courier, monospace;">fixed.tvpuni(method=<span style="color: #cc0000;">"4"</span>,savem,saves)</span><br /><span style="font-family: Verdana, sans-serif;"><br /></span><span style="font-family: Verdana, sans-serif;">Again, we can compare estimated parameters if we organize our output:&nbsp;</span></div><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-AfnlMzlY3SU/XFSZ3aX40EI/AAAAAAAAAsM/bfn5U2tINtcKb0G7PGyAveEpxkuk8Q3PgCEwYBhgL/s1600/graph03.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1110" data-original-width="1600" height="442" src="https://3.bp.blogspot.com/-AfnlMzlY3SU/XFSZ3aX40EI/AAAAAAAAAsM/bfn5U2tINtcKb0G7PGyAveEpxkuk8Q3PgCEwYBhgL/s640/graph03.png" width="640" /></a></div><div><span style="font-family: Verdana, sans-serif;">Results from all three approaches portray similar patterns and therefore yield similar inferences.&nbsp;</span></div><div><span style="font-family: Verdana, sans-serif;"><br /></span></div><div><span style="font-family: Verdana, sans-serif;"><b><i>References</i></b></span></div><div><span style="font-family: Verdana, sans-serif;">Kalaba, R. and Tesfatsion, L., 1989. "Time Varying Linear Regression via Flexible Least Squares", Computers and Mathematics with Applications, Vol. 17, pp. 
1215-1245</span></div>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com3tag:blogger.com,1999:blog-6883247404678549489.post-57763061590942396862018-12-11T10:20:00.000-08:002018-12-11T10:20:45.694-08:00Panel Structural VARs and the PSVAR add-in<script type="text/x-mathjax-config"> MathJax.Hub.Config({ tex2jax: { inlineMath: [ ['$','$'], ["\$","\$"] ], displayMath: [ ['$$','$$'], ["\$","\$"] ], }, TeX: { equationNumbers: { autoNumber: "AMS" }, extensions: ["AMSmath.js"], Macros: { lb: "{\\left(}", rb: "{\\right)}", bu: ['{\\underline{#1}}', 1], ba: ['{\\overline{#1}}', 1], norm: ['{\\lVert#1\\rVert}', 1] } } }); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"></script> <i><span style="font-family: &quot;verdana&quot; , sans-serif;">Author and guest blog by Davaajargal Luvsannyam</span></i><br /><i><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></i><span style="font-family: &quot;verdana&quot; , sans-serif;">Panel SVARs have been used to address a variety of issues of interest to policymakers and applied economists. Panel SVARs are particularly suitable to analyze the transmission of idiosyncratic shocks across units and time. For example, <a href="https://www.sciencedirect.com/science/article/pii/S0165188912000942" target="_blank">Canova et al. (2012)</a> have studied how U.S. 
interest rate shocks are propagated to 10 European economies, 7 in the Euro area and 3 outside of it, and how German shocks are transmitted to the remaining nine economies.&nbsp;</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"></span><br /><a name='more'></a><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">Panel SVARs have also often been used to estimate average effects – possibly across heterogeneous groups of units – and to describe unit-specific differences relative to the average. For example, a researcher may analyze whether monetary policy is, on average, more countercyclical across countries or states.&nbsp; A researcher may also be interested in whether inflation dynamics depend on political, geographical, cultural or institutional features, or on whether monetary and fiscal interactions are related.&nbsp;</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">An alternative use of panel SVARs is in studying the importance of interdependencies, and in checking whether reactions are generalized or involve only certain pairs of units. Accordingly, some researchers implement a panel SVAR to evaluate certain exogeneity assumptions or to test the small open economy assumption often made in the international economics literature.</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;"></span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;">In this blog, we describe the econometric estimation and implementation of the Panel SVAR of <a href="https://pdfs.semanticscholar.org/d97e/9fb8b0243975feb6df7364765fed9eb7b5e9.pdf" target="_blank">Pedroni (2013)</a>.
The key to Pedroni's (2013) estimation and identification method is the assumption that the structural shocks can be decomposed into common and idiosyncratic structural shocks, which are mutually orthogonal.</span><br /><div><br /></div><div><br /></div><div><h3>Structural shock representation</h3>Associated with the $M\times1$ vector of demeaned panel data, $z_{it}$, let $\xi_{it} = \left(\bar{\epsilon}_t^\prime, \tilde{\epsilon}_{it}^\prime\right)^\prime$ where $\bar{\epsilon}_t$ and $\tilde{\epsilon}_{it}$ are $M\times 1$ vectors of common and idiosyncratic white noise shocks, respectively. Let $\Lambda_i$ be an $M\times M$ diagonal matrix whose diagonal elements are the loading coefficients $\lambda_{i,m}$, where $m=1,\ldots, M$. Then the composite white noise errors are given by \begin{equation} \epsilon_{it} = \Lambda_i \bar{\epsilon}_t + \tilde{\epsilon}_{it} \end{equation} where $E\left[ \xi_{it}\xi_{it}^\prime \right] = \text{diag} \left\{ \Omega_{i, \bar{\epsilon}}, \Omega_{i, \tilde{\epsilon}} \right\}, \forall i,t$.
Moreover, $E\left[\xi_{it}\right] = 0, \forall i,t$, $E\left[\xi_{is}\xi_{it}^\prime\right] = 0, \forall i,s\neq t$, and $E\left[\tilde{\epsilon}_{it}\tilde{\epsilon}_{jt}^\prime\right] = 0, \forall i\neq j, t$.<br /><br /> <h3>Relationships between reduced forms and structural forms</h3> \begin{align*} &\text{Shocks:} \quad \mu_{it} = A_i(0)\epsilon_{it}\\ &\text{Responses:} \quad F_{i}(L)A_i(0) = A_i(L)\\ &\text{Steady states:} \quad F_{i}(1)A_i(0) = A_i(1) \end{align*} where $\mu_{it}$ are the reduced form residuals ($R_i(L) \Delta z_{it} = \mu_{it}$), $F_i(L) = R_i(L)^{-1}$, and $\epsilon_{it}$ are the structural shocks ($\Delta z_{it} = A_i(L)\epsilon_{it}$).<br /><br /> <h3>Typical structural identifying restrictions on dynamics</h3> \begin{align*} &A(0) \text{ decompositions:} \quad \Omega_{\mu,i} = A_i(0)A_i(0)^\prime\\ &\text{Short-run restrictions:} \quad \Omega_{\mu,i} = B_i(0)^{-1}B_i(0)^{-1^\prime}\\ &\text{Long-run restrictions:} \quad \Omega_{\mu,i}(1) = A_i(1)A_i(1)^\prime \end{align*} The adding-up constraints with re-normalization imply that equation (1) can be rewritten as $$\epsilon_{it} = \Lambda_i \bar{\epsilon}_{t} + (I - \Lambda_i\Lambda_i^\prime)^{1/2} \tilde{\epsilon}_{it}^\star$$ Finally, we can use this re-scaled form to decompose the impulse responses into the common and idiosyncratic shocks as: $$A_i(L) = \bar{A}_i(L) + \tilde{A}_i(L)$$ where $\bar{A}_i(L)$ is the member-specific response to the common shocks ($\bar{A}_i(L) = A_i(L)\Lambda_i$), and $\tilde{A}_i(L)$ is the member-specific response to the idiosyncratic shocks ($\tilde{A}_i(L) = A_i(L)(I - \Lambda_i\Lambda_i^\prime)^{1/2}$), such that the two responses sum to the total member-specific response to the composite shocks.<br /><br /> The following is a summary of the estimation algorithm for an unbalanced panel $\Delta z_{i,t}$ with dimensions $i = 1, \ldots, N$ (member), $t=1, \ldots, T_i$ (time), and $m=1, \ldots, M$ (variable): <ol><li> Compute the time effects, $\Delta
\bar{z}_t = N_t^{-1}\sum_{i=1}^{N_t}\Delta z_{it}$ and use these along with $\Delta z_{it}$ to estimate the reduced form VARs, $\bar{R}(L)\Delta \bar{z}_t = \bar{\mu}_t$ and $R_i(L)\Delta z_{it} = \mu_{it}$ for each member $i$, using an information criterion to fit an appropriate member-specific lag truncation, $P_i$. <li> Use appropriate identifying restrictions such as the short-run (Cholesky) or long-run (Blanchard-Quah) identification method to obtain structural shock estimates for $\epsilon_{it}$ (composite) and $\bar{\epsilon}_{t}$ (common). <li> Compute the diagonal elements of the loading matrix, $\Lambda_i$, as correlations between $\epsilon_{it}$ and $\bar{\epsilon}_t$ for each member, $i$, and compute the idiosyncratic shocks, $\tilde{\epsilon}_{it}$, using the equation $\epsilon_{it} = \Lambda_i \bar{\epsilon}_t + \tilde{\epsilon}_{it}$. <li> Compute member-specific impulse responses to unit shocks: $A_i(L) = \bar{A}_i(L) + \tilde{A}_i(L)$, where $\bar{A}_i(L) = A_i(L)\Lambda_i$ and $\tilde{A}_i(L) = A_i(L)(I - \Lambda_i\Lambda_i^\prime)^{1/2}$. <li> Use the sample distribution of the estimated $A_i(L), \bar{A}_i(L)$, and $\tilde{A}_i(L)$ responses to construct the confidence interval quantiles. </ol></div><div><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div><span style="font-family: &quot;verdana&quot; , sans-serif;">Now we turn to the implementation of the psvar add-in.
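Before that, the mechanics of Steps 1, 3 and 4 can be sketched numerically on simulated shocks. This is a minimal illustration only: it assumes the identified structural shocks from Step 2 are already in hand, all names are invented for the sketch, and it is not the add-in's actual code.

```python
# Minimal sketch of the Pedroni (2013) shock decomposition on simulated
# data: composite shocks are split into common and idiosyncratic parts.
import numpy as np

rng = np.random.default_rng(0)
N, T = 20, 246  # members and time periods, as in the PPP example below

# Pretend these are the structural shocks delivered by Step 2 for one
# variable: a common component, loaded differently by each member, plus noise.
common = rng.standard_normal(T)
lam_true = rng.uniform(0.2, 0.8, size=N)
eps = lam_true[:, None] * common + rng.standard_normal((N, T))

# Demean and standardize each member's shocks (structural shocks are
# scale-free; the panel data are demeaned).
eps -= eps.mean(axis=1, keepdims=True)
eps /= eps.std(axis=1, keepdims=True)

# Step 1 analogue: the common shock is proxied via cross-member averages.
common_hat = eps.mean(axis=0)
common_hat = (common_hat - common_hat.mean()) / common_hat.std()

# Step 3: loadings are correlations between composite and common shocks;
# idiosyncratic shocks follow from eps_it = lambda_i*eps_bar_t + eps_tilde_it.
lam_hat = (eps * common_hat).mean(axis=1)
idio_hat = eps - lam_hat[:, None] * common_hat

print(lam_hat.round(2))
```

By construction the two pieces reconstruct the composite shocks exactly and are orthogonal member by member, which is what makes the decomposition $A_i(L) = \bar{A}_i(L) + \tilde{A}_i(L)$ well defined.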
First, we need to open the data file named pedroni_ppp.wf1, which is located in the installation folder.&nbsp;</span></div><div><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">wfopen pedroni_ppp.wf1</span></div><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-hO0FJvpWZJQ/XA_mbh5p5cI/AAAAAAAAAqg/TK6gL9BtsNkOAgvq8dUeBoinzN1yQaedgCLcBGAs/s1600/workfile.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="455" data-original-width="506" height="358" src="https://3.bp.blogspot.com/-hO0FJvpWZJQ/XA_mbh5p5cI/AAAAAAAAAqg/TK6gL9BtsNkOAgvq8dUeBoinzN1yQaedgCLcBGAs/s400/workfile.png" width="400" /></a></div><div><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;"><br /></span></div><div><div style="font-family: Verdana, sans-serif;">For testing purposes, we use this panel data set. The sample size is 4920 observations (1973m06 to 1993m11 x 20 members).</div><div style="font-family: Verdana, sans-serif;"><br /></div><div><span style="font-family: &quot;verdana&quot; , sans-serif;">Next, we generate the variable </span><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">ereal</span><span style="font-family: &quot;verdana&quot; , sans-serif;"> and take the logarithms of the series </span><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">ereal</span><span style="font-family: &quot;verdana&quot; , sans-serif;">, </span><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">cpi </span><span style="font-family: &quot;verdana&quot; , sans-serif;">and </span><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">ae</span><span style="font-family: &quot;verdana&quot; , sans-serif;">. You don’t need to take the first differences of the variables.
The add-in will do it for you.&nbsp;</span></div><div style="font-family: Verdana, sans-serif;"><br /></div><div><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">series ereal = ae*uscpi/cpi</span></div><div><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">series logereal = log(ereal)</span></div><div><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">series logcpi = log(cpi)</span></div><div><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">series logae = log(ae)</span></div><div style="font-family: Verdana, sans-serif;"><br /></div><div style="font-family: Verdana, sans-serif;">Then we apply the psvar add-in to this panel data. We can do this either via the command line or the menu-driven interface.&nbsp;</div><div style="font-family: Verdana, sans-serif;"><br /></div><div><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">psvar(ident=2, horizon=24) 18 @ logereal logcpi logae</span></div><div style="font-family: Verdana, sans-serif;"><br /></div><div style="font-family: Verdana, sans-serif;">or</div><div style="font-family: Verdana, sans-serif;"><br /></div><div><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">psvar(ident=2, horizon=24, ci=0.5, length=5, average=mean, sample="1976m06 1993m11", save=1) 18 @ logereal logcpi logae</span></div><div style="font-family: Verdana, sans-serif;"><br /></div><div style="font-family: Verdana, sans-serif;">Please see the documentation for a detailed description of the command options.
The resulting output will be three graph objects that contain 3x3 charts similar to those produced by EViews’ VAR object:&nbsp;</div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://3.bp.blogspot.com/--q6h1zZvgv8/XA_rpx3FFZI/AAAAAAAAAq0/eVgC7eMqJigAuBfSsg-QeI0Hn0JZBr8KACLcBGAs/s1600/figure1.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="460" data-original-width="720" height="408" src="https://3.bp.blogspot.com/--q6h1zZvgv8/XA_rpx3FFZI/AAAAAAAAAq0/eVgC7eMqJigAuBfSsg-QeI0Hn0JZBr8KACLcBGAs/s640/figure1.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 1: Response Estimates to Composite Shocks</td></tr></tbody></table><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://4.bp.blogspot.com/-ArIzG2vLfao/XA_rp7cP74I/AAAAAAAAAqw/RJGXL5vELvI5wbCiiKrzX_GKvisaPOvtACLcBGAs/s1600/figure2.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="460" data-original-width="720" height="408" src="https://4.bp.blogspot.com/-ArIzG2vLfao/XA_rp7cP74I/AAAAAAAAAqw/RJGXL5vELvI5wbCiiKrzX_GKvisaPOvtACLcBGAs/s640/figure2.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 2: Response Estimates to Common Shocks</td></tr></tbody></table><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-czeBa7xyEJM/XA_rpybBxwI/AAAAAAAAAqs/YyNvF7s-Cr80hn-Rf29TTDY4a30cEkxhwCLcBGAs/s1600/figure3.png" imageanchor="1"
style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="460" data-original-width="720" height="408" src="https://1.bp.blogspot.com/-czeBa7xyEJM/XA_rpybBxwI/AAAAAAAAAqs/YyNvF7s-Cr80hn-Rf29TTDY4a30cEkxhwCLcBGAs/s640/figure3.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 3: Response Estimates to Idiosyncratic Shocks</td></tr></tbody></table><div><span style="font-family: &quot;verdana&quot; , sans-serif;">Alternatively, you can implement the psvar add-in via the menu-driven interface.</span></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-Trm7NzFukOc/XA_sDQqHRRI/AAAAAAAAArE/Gs03BzEgzQUE3CkVaifDh9DYb3Az_7a7wCLcBGAs/s1600/diag.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="384" data-original-width="451" height="340" src="https://3.bp.blogspot.com/-Trm7NzFukOc/XA_sDQqHRRI/AAAAAAAAArE/Gs03BzEgzQUE3CkVaifDh9DYb3Az_7a7wCLcBGAs/s400/diag.png" width="400" /></a></div><div><span style="font-family: &quot;verdana&quot; , sans-serif;"></span><br /><div><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><span style="font-family: &quot;verdana&quot; , sans-serif;"><div>The first box lets you specify the endogenous variables (logereal, logcpi, logae) for the panel SVAR, while the second box specifies the maximum number of lags (18). Next, you can select the shock identification scheme for the panel SVAR via the radio buttons; here we choose the long-run identification. This identification scheme is nonsensical for this particular data set and does not correspond to any existing study. In the lag length criteria box, we choose GTOS (general-to-specific). The three main information criteria are AIC, SBC (BIC) and HQ.
However, the default lag length criterion is GTOS, following the suggestion of Pedroni (2013). Like the information criteria, this starts with a large number of lags but, rather than minimizing across all choices of p, it performs a sequence of tests of p versus p-1. Lags are dropped as long as they test insignificant. The other boxes specify optional and self-explanatory inputs.&nbsp;</div><div><br /></div></span></div>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com2tag:blogger.com,1999:blog-6883247404678549489.post-86579187588266862322018-12-04T17:18:00.000-08:002018-12-05T07:54:22.729-08:00Nowcasting GDP on a Daily Basis<i><span style="font-family: &quot;verdana&quot; , sans-serif;">Author and guest blog by Michael Anthonisz, Queensland Treasury Corporation.</span></i><br /><i><span style="font-family: &quot;verdana&quot; , sans-serif; font-size: x-small;">In this blog post, Michael demonstrates the use of MIDAS in EViews to nowcast Australian GDP growth on a daily basis.</span></i><br /><i><span style="font-family: &quot;verdana&quot; , sans-serif; font-size: x-small;"><br /></span></i><span style="font-family: &quot;verdana&quot; , sans-serif;">"Nowcasts" are forecasts of the here and now ("now" + "forecast" = "nowcast"). They are forecasts of the&nbsp;</span><span style="font-family: &quot;verdana&quot; , sans-serif;">present, the near future or the recent past.
Specifically, nowcasts allow for real-time tracking or&nbsp;</span><span style="font-family: &quot;verdana&quot; , sans-serif;">forecasting of a lower frequency variable based on other series which are released at a similar or higher&nbsp;</span><span style="font-family: &quot;verdana&quot; , sans-serif;">frequency.</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"></span><br /><a name='more'></a><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">For example, one could try to forecast the outcome for the current quarter GDP release using a&nbsp;</span><span style="font-family: &quot;verdana&quot; , sans-serif;">combination of daily, weekly, monthly and quarterly data. In this example, the nowcast could be updated&nbsp;</span><span style="font-family: &quot;verdana&quot; , sans-serif;">on a daily basis – the highest frequency of explanatory data – as new releases for the series being used to&nbsp;</span><span style="font-family: &quot;verdana&quot; , sans-serif;">explain GDP came in. That is, as the daily, weekly, monthly and quarterly data used to explain GDP is&nbsp;</span><span style="font-family: &quot;verdana&quot; , sans-serif;">released, the nowcast for current quarter GDP is updated in real-time on a daily basis.</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">The ability to update one's forecast incrementally in real-time in response to incoming information is an&nbsp;</span><span style="font-family: &quot;verdana&quot; , sans-serif;">attractive feature of nowcasting models. Forecasting in this manner will lower the likelihood of one's&nbsp;</span><span style="font-family: &quot;verdana&quot; , sans-serif;">forecasts becoming "stale". 
Indeed, nowcasts have been found to be more accurate:</span><br /><br /><ul><li><span style="font-family: &quot;verdana&quot; , sans-serif;">at short-term horizons.</span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;">as the period of interest (eg, the current quarter) goes on.</span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;">than traditional forecasting approaches at these horizons.</span></li></ul><div><span style="font-family: &quot;verdana&quot; , sans-serif;">Other key findings in relation to nowcasts are that:</span></div><div><ul><li><span style="font-family: &quot;verdana&quot; , sans-serif;">they perform similarly to private sector forecasters, who are also able to incorporate information in real time.</span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;">there are mixed findings as to the relative gains from including high frequency financial data.</span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;">"soft data"<a href="#1" name="top1"><sup>1</sup></a> is most useful early on in the nowcasting cycle and "hard data"<a href="#2" name="top2"><sup>2</sup></a> is of more use later on.</span></li></ul><div><span style="font-family: &quot;verdana&quot; , sans-serif;">There are a number of approaches that can be used to prepare a nowcast, including:</span></div></div><div><ul><li><span style="font-family: &quot;verdana&quot; , sans-serif;">Bayesian vector autoregressions (for example, <a href="https://www.newyorkfed.org/medialibrary/media/research/staff_reports/sr830.pdf" target="_blank">Bok et al 2017</a>).</span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;">Factor-augmented autoregressive models (for example, <a href="https://bank.gov.ua/doccatalog/document?id=62251312" target="_blank">Grui &amp; Lysenko, 2017</a>).</span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;">Mixed Frequency VARs (for example, <a
href="http://dept.ku.edu/~empirics/Courses/Econ844/papers/Nowcasting%20GDP.pdf" target="_blank">Giannone, Reichlin &amp; Small, 2008</a>).</span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;">MIDAS (Mixed Data Sampling) (for example,<a href="http://webspace.qmul.ac.uk/aferreira/jbes08.pdf" target="_blank"> Clements &amp; Galvao, 2007</a>).</span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;">Accounting-based tracking models<a href="#3" name="top3"><sup>3</sup></a> (for example, <a href="https://www.frbatlanta.org/-/media/documents/research/publications/wp/2014/wp1407.pdf" target="_blank">Higgins, 2014</a><span id="goog_402796007"></span><a href="https://www.blogger.com/"></a><span id="goog_402796008"></span>).</span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;">Bridge equations<a href="#4" name="top4"><sup>4</sup></a> (for example, <a href="https://www.ecb.europa.eu/pub/conferences/shared/pdf/20180618_forecasting/Paper_Ferrara_et_al.pdf" target="_blank">Ferrara &amp; Simoni, 2018</a>).</span></li></ul><div><span style="font-family: &quot;verdana&quot; , sans-serif;"></span><br /><div><span style="font-family: &quot;verdana&quot; , sans-serif;">Through its broad functionality EViews is able to facilitate the use of all of these approaches. For the purposes of this blog entry and in recognition of its availability from EViews 9.5 <a href="http://www.eviews.com/EViews9/ev95midas.html" target="_blank">onwards</a> as well as its ease of use, MIDAS regressions will be used to provide a daily nowcast of quarterly trend Australian real GDP growth<a href="#5" name="top5"><sup>5</sup></a>. 
MIDAS models are perfectly suited to handle the nowcasting problem, which, at its essence, relates to how to use data for explanatory variables that are released at different frequencies to explain the dependent variable<a href="#6" name="top6"><sup>6</sup></a>.</span></div><span style="font-family: &quot;verdana&quot; , sans-serif;"><div><br /></div><div><div>In this example, the series used in the MIDAS model to nowcast GDP are not just regular economic or financial time series, however. To capture as broad a variety of influences on the dependent variable as possible, as well as to ensure a parsimonious specification, principal components analysis ("PCA") is used<a href="#7" name="top7"><sup>7</sup></a>. This allows us to extract a common trend from a large number of series. Using this approach will enable us to cut down on "noise" and hopefully use more "signal" to estimate GDP.</div></div><div><br /></div><div><div>The data series used to derive these common factors are compiled on a monthly and quarterly basis and are released in advance of, during and following the completion of the current quarter of interest with respect to GDP. The common factors are calculated at the lowest frequency of the underlying data (quarterly) and are complemented in the model by daily financial data which may have some explanatory power over the quarterly change in Australian GDP (for example, the trade-weighted exchange rate and the three-year sovereign bond yield).</div></div><div><br /></div><div><div>An outline of the steps required to do this sort of MIDAS-based nowcast is below.
Keep in mind the helpful <a href="http://www.eviews.com/help/helpintro.html#page/content/midas-MIDAS_Estimation_in_EViews.html" target="_blank">point and click</a> as well as <a href="http://www.eviews.com/help/helpintro.html#page/content/commandcmd-midas.html" target="_blank">command language </a>instructions published by EViews which provide more detail.</div></div><div><ul><li>Create separate tabs in the workfile which correspond to the different frequencies of underlying data you are using.</li><li>Import the underlying data and normalize to be in Z-score form (that is, mean of zero and variance of one) <a href="https://www.researchgate.net/post/What_is_the_best_way_to_scale_parameters_before_running_a_Principal_Component_Analysis_PCA" target="_blank">before running the PCA</a>.</li><li>Have the common factors created from the PCA appear on the relevant tab in the workfile<a href="#8" name="top8"><sup>8</sup></a>.</li><li>Clean the data to get rid of any N/A values for data that has not yet been published.<a href="#9" name="top9"><sup>9</sup></a></li><li>Re-run the PCA to reflect that you now have data for the underlying series for the full sample period.</li></ul><div>It is important to note that the variable being nowcast must actually be forecast with the same periodicity as its release. In this instance, GDP is released quarterly so our forecasts of it will be quarterly as well. This means all the work at this stage of the estimation will be done on the quarterly page. We are aiming to produce forecasts of a quarterly variable which are updated on a more real-time (that is, daily) basis but are not actually producing a forecast of daily GDP.</div><div><br /></div><div>An illustration of the rolling process might make this clearer. For instance:</div></div><div><ul><li>Let's imagine it is currently 1 July 2018.</li><li>We’re interested in forecasting Q3 2018 GDP using one-period lags of GDP and the common factors estimated earlier via PCA.
These are quarterly representations of conditions with respect to labour markets and capital investment as well as measures of current and future economic activity. We’ll also use bond yields and the trade-weighted exchange rate, both of which are available on a daily basis.</li><li>In our MIDAS model, quarterly GDP is the dependent variable and the aforementioned other variables are independent variables. The model is estimated using historical data from Q2 1993 until Q2 2018 (as it is 1 July we have data to 30 June).</li><li>As we want to forecast Q3, and have data on our daily variables until the end of Q2 2018, we can specify the equation so that each quarter’s GDP growth is a function of the previous quarter’s outcomes for the quarterly variable and of (say) the last 45 days’ worth of values for bond yields and the exchange rate ending on the last day of the previous quarter.</li><li>Having estimated the model, we can use the 45 daily values for bond yields and the exchange rate from May to June 2018 to forecast Q3 GDP.</li><li>Now, assume the calendar has turned over and it is now 2 July 2018. We have one more observation for the daily series. We can update the forecast of GDP by estimating a new model on historical data that uses 44 days from the previous quarter and the first day of the current quarter, and then forecast Q3 GDP.</li><li>Then, assume it is 3 July 2018. We can now update our forecast by estimating on 43 days of the previous quarter and the first 2 days of the current quarter. And so on.</li><li>We will end up with a forecast of quarterly GDP that is updated daily. That doesn't make it a forecast of daily GDP as it is a quarterly variable.
We're just able to forecast it using current (now) data and update this forecast continuously on a daily basis.</li></ul><div><div>For our concrete example using Australian macroeconomic variables, we will estimate a MIDAS model where the dependent variable is the quarterly change in the trend measure of Australian real GDP.</div><div><br /></div><div>The independent variables of the model can be seen in Figure 1:</div></div></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://4.bp.blogspot.com/-aPxGtXdpZOM/XAV2Ny8xb_I/AAAAAAAAAos/x6HWyaECIqE-_1o8eUlDtRD3PPCLH32EACPcBGAYYCw/s1600/variables.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1482" data-original-width="1384" height="640" src="https://4.bp.blogspot.com/-aPxGtXdpZOM/XAV2Ny8xb_I/AAAAAAAAAos/x6HWyaECIqE-_1o8eUlDtRD3PPCLH32EACPcBGAYYCw/s640/variables.png" width="595" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 1: Independent variables used in MIDAS estimation (click to enlarge)</td></tr></tbody></table><div><div>All data are sourced from the Bloomberg and Thomson Reuters Datastream databases, accessible via EViews.</div><div><br /></div><div>The specific equation in EViews is estimated using the Equation object with the method set to MIDAS, and with variable names of:</div><div><ul><li>gdp_q_trend_3m_chg = quarterly change in the trend measure of Australian GDP.</li><li>gdp_q_trend_3m_chg(-1) = one quarter lag of the quarterly change in the trend measure of Australian GDP.</li><li>activity_current(-1) = one quarter lag of a PCA derived factor representing current economic activity in Australia.</li><li>activity_leading(-1) = one quarter lag of a PCA derived factor representing future economic activity in Australia.</li><li>investment(-1) = one quarter lag of 
a PCA derived factor representing capital investment in Australia.</li><li>labour_market(-1) = one quarter lag of a PCA derived factor representing labour market conditions in Australia.</li><li>au_midas_daily\atwi_final(-1) = the lag of the trade-weighted Australian dollar where this data is located on a page with a daily frequency.</li><li>au_midas_daily\gacgb3_final(-1) = the lag of the three-year Australian sovereign bond yield where this data is located on a page with a daily frequency.</li></ul><div><div>In this example, we will estimate the dependent variable using historical data from Q2 1993 until Q2 2018. From this we can then do forecasts for the current quarter (in this case Q3 2018) whereby the dependent variable is a function of the previous quarter’s outcomes for the quarterly independent variables and of the last 45 days’ worth of values for bond yields and the exchange rate. The MIDAS equation estimation window that reflects this would be as follows:</div></div></div></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://4.bp.blogspot.com/-Hf45IdrNSyk/XAV7DRHdxqI/AAAAAAAAApE/1nx9yXWAnmslpNpAWk6RB0uftBaqpANTgCPcBGAYYCw/s1600/EstDlg.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="459" data-original-width="460" height="397" src="https://4.bp.blogspot.com/-Hf45IdrNSyk/XAV7DRHdxqI/AAAAAAAAApE/1nx9yXWAnmslpNpAWk6RB0uftBaqpANTgCPcBGAYYCw/s400/EstDlg.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 2: Estimation specification (click to enlarge)</td></tr></tbody></table><div><br /></div><div>Running the MIDAS model results in the following estimation output:</div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;
text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://2.bp.blogspot.com/-ySkBQDo7WJs/XAV-AR2CezI/AAAAAAAAApc/BjSvBk2wGO0rAIN01VbNH5fSPad0yL9EQCPcBGAYYCw/s1600/EstOut.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="692" data-original-width="540" height="640" src="https://2.bp.blogspot.com/-ySkBQDo7WJs/XAV-AR2CezI/AAAAAAAAApc/BjSvBk2wGO0rAIN01VbNH5fSPad0yL9EQCPcBGAYYCw/s640/EstOut.png" width="497" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 3: Estimation output (click to enlarge)</td></tr></tbody></table><div><div>This individual estimation gives us a single forecast for GDP based upon the most current data available. Specifically, this estimation uses data up to:</div><div><ul><li>2018Q2 for our dependent variable.</li><li>2018Q1 for our quarterly independent variables (since they are all lagged one period).</li><li>May 30th for our daily independent variables (a one day lag from the last day of Q2). 
Also note that since we are using 45 daily periods for each quarter, the 2018Q2 data point is estimated using data from March 29th - May 30th (we are dealing with regular 5-day data).</li></ul></div><div>From this equation we can then produce a forecast of the 2018Q3 value of GDP by clicking on the Forecast button:</div></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-PmC6u0FmJdw/XAajCaYzEoI/AAAAAAAAAp0/4j-mI9JB6Fk4MDRV88JgQZqn39DlNp9lgCPcBGAYYCw/s1600/ForcDlg.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="601" data-original-width="630" height="381" src="https://1.bp.blogspot.com/-PmC6u0FmJdw/XAajCaYzEoI/AAAAAAAAAp0/4j-mI9JB6Fk4MDRV88JgQZqn39DlNp9lgCPcBGAYYCw/s400/ForcDlg.PNG" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 4: Forecast dialog (click to enlarge)</td></tr></tbody></table><div><div>This single quarter forecast uses data from:</div><div><ul><li>2018Q2 for our quarterly independent variables (since they are all lagged one period).</li><li>July 30th 2018 - September 28th 2018 for our daily independent variables (45 days ending on the last day of Q3 2018 - September 29th/30th are a weekend, so not included in our workfile).</li></ul></div><div>To produce an updated forecast the following day, we could re-estimate our equation using the same data, but with the daily independent variables shifted forwards one day (removing the one day lag on their specification), and then re-forecasting.</div><div><br /></div><div>Or, if we wanted an historical view on how our forecasts would have performed previously, we can re-estimate for the previous day (shifting our daily variables back by one day by increasing their lag to 2) and then re-forecast.</div><div><br /></div><div>Indeed we 
could repeat the historical procedure going back each day for a number of years, giving us a series of daily updated forecast values. Performing this action manually is a little cumbersome, but an EViews program can make the task simple. A rough example of such a program may be downloaded <a href="http://www.eviews.com/blog/AusMIDAS/midasprg.prg">here</a>.</div><div><br /></div><div>Once the series of daily forecasts is created, you can produce a good picture of the accuracy of this procedure:</div></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://2.bp.blogspot.com/-dPpqt9h4axc/XAavEyJkOQI/AAAAAAAAAqM/6Nkv-cBXctQnfKyVei8wvq4It8H7KG-eQCPcBGAYYCw/s1600/ForcGraph.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="676" data-original-width="993" height="434" src="https://2.bp.blogspot.com/-dPpqt9h4axc/XAavEyJkOQI/AAAAAAAAAqM/6Nkv-cBXctQnfKyVei8wvq4It8H7KG-eQCPcBGAYYCw/s640/ForcGraph.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 5: Daily updated forecast of Australian GDP Trend (click to expand)</td></tr></tbody></table><div><br /></div></span></div></div> <hr width="80%"><p><span class="Apple-style-span" style="font-size: x-small;"><br /><a name="1"><b>1 </b></a>Such as consumer or business surveys<a href="#top1"><sup>↩</sup></a><br /><a name="2"><b>2 </b></a>Such as retail spending, housing or labour market data<a href="#top2"><sup>↩</sup></a><br /><a name="3"><b>3 </b></a>As GDP, for example, is essentially an accounting identity that represents the sum of different income, expenditure or production measures, it can be calculated using a ‘bottom-up’ approach in which series that proxy for the various components of GDP are used to construct an estimate of it using an accounting-type approach.<a
href="#top3"><sup>↩</sup></a><br /><a name="4"><b>4 </b></a>Bridge equations are regressions which relate low frequency variables (e.g. quarterly GDP) to higher frequency variables (eg, the unemployment rate) where the higher frequency observations are aggregated to the quarterly frequency. It is often the case that some but not all of the higher frequency variables are available at the end of the quarter of interest. Therefore, the monthly variables which aren’t as yet available are forecasted using auxiliary models (eg, ARIMA). <a href="#top4"><sup>↩</sup></a><br /><a name="5"><b>5 </b></a>Papers using a daily frequency in mixed frequency regression analyses include <a href="https://www.dept.aueb.gr/sites/default/files/Kourtellos24-5-12.pdf">Andreou, Ghsels & Kourtellos, 2010</a>, <a href="https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1948&context=soe_research">Tay, 2006</a> and <a href="https://onlinelibrary.wiley.com/doi/pdf/10.1111/1475-4932.12181">Sheen, Truck & Wang, 2015.</a><a href="#top5"><sup>↩</sup></a><br /><a name="6"><b>6 </b></a>MIDAS models use distributed lags of explanatory variables which are sampled at an equivalent or higher frequency to the dependent variable. A distributed lag polynomial is used to ensure a parsimonious specification. There are different types of lag polynomial structures available in EViews. 
<a href="http://uu.diva-portal.org/smash/get/diva2:783891/FULLTEXT01.pdf">Lindgren & Nilson, 2015</a> discuss the forecasting performance of the different polynomial lag structures.<a href="#top6"><sup>↩</sup></a><br /><a name="7"><b>7 </b></a>See <a href="https://sites.google.com/site/econometricsacademy/econometrics-models/principal-component-analysis">here</a> and <a href="http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html">here</a> for background and <a href="http://www.eviews.com/help/helpintro.html#page/content/groups-Principal_Components.html">here</a> and <a href="http://blog.eviews.com/2018/11/principal-component-analysis-part-ii.html">here</a> for how to do it in EViews.<a href="#top7"><sup>↩</sup></a><br /><a name="8"><b>8 </b></a>For example, underlying data on a monthly and quarterly basis will generate a common factor that is on a quarterly basis. This should therefore go on a quarterly workfile tab.<a href="#top8"><sup>↩</sup></a><br /><a name="9"><b>9 </b></a>For example, if there is an NA, then you could choose to use the previous value for the latest date instead.
For example: series X_full = @recode(X=na, X(-1), X)<a href="#top9"><sup>↩</sup></a><br /></span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com16tag:blogger.com,1999:blog-6883247404678549489.post-4335981619631134492018-11-26T14:35:00.000-08:002018-11-28T10:11:15.117-08:00Principal Component Analysis: Part II (Practice)<script type="text/x-mathjax-config"> MathJax.Hub.Config({ tex2jax: { inlineMath: [ ['$','$'], ["\$","\$"] ], displayMath: [ ['$$','$$'], ["\$","\$"] ], }, TeX: { equationNumbers: { autoNumber: "AMS" }, extensions: ["AMSmath.js"], Macros: { lb: "{\\left(}", rb: "{\\right)}", bu: ['{\\underline{#1}}', 1], ba: ['{\\overline{#1}}', 1], norm: ['{\\lVert#1\\rVert}', 1] } } }); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"></script> <span style="font-family: &quot;verdana&quot; , sans-serif;"> In <a href="http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html">Part I</a> of our series on <b>Principal Component Analysis</b> (PCA), we covered a theoretical overview of fundamental concepts and discussed several inferential procedures. Here, we aim to complement our theoretical exposition with a step-by-step practical implementation using EViews. In particular, we are motivated by a desire to apply PCA to a dataset in order to identify its most important features and draw any inferential conclusions that may exist. We will proceed in the following steps:<a name='more'></a> <ol> <li> Summarize and describe the dataset under consideration. <li> Extract all principal (important) directions (features). <li> Quantify how much variation (information) is explained by each principal direction. <li> Determine how much variation each variable contributes in each principal direction. <li> Reduce data dimensionality. <li> Identify which variables are correlated and which correlations are more principal.
<li> Identify which observations are correlated with which variables. </ol> The links to the workfile and program file can be found at the end.</br></br> <h3>Principal Component Analysis of US Crime Data</h3> We will use PCA to study US crime data. In particular, our dataset summarizes the number of arrests per 100,000 residents in each of the 50 US states in 1973. The data contains four variables, three of which pertain to arrests associated with (and naturally named) <b>MURDER</b>, <b>ASSAULT</b>, and <b>RAPE</b>, whereas the last, named <b>URBANPOP</b>, contains the percentage of the population living in urban centers.</br></br> <h4>Data Summary</h4> To understand our data, we will first create a <b>group</b> object with the variables of interest. We can do this by selecting all four variables in the workfile by clicking on each while holding down the <b>Ctrl</b> button, right-clicking on any of the highlighted variables, moving the mouse pointer over <b>Open</b> in the context menu, and finally clicking on <b>as Group</b>. This will open a group object in a spreadsheet with the four variables placed in columns. 
The steps are reproduced in Figures 1a and 1b.</br></br> <table> <tbody> <tr> <td> <!-- :::::::::: FIGURE 1A :::::::::: --> <center> <a href="https://2.bp.blogspot.com/-vLsLi-3fz_Y/W_wusCZp91I/AAAAAAAAAmQ/cEiij6CUgYQ1xGLEQcW1xL6E5nXz8nmpgCLcBGAs/s1600/pcademo1.jpg"><img src="https://2.bp.blogspot.com/-vLsLi-3fz_Y/W_wusCZp91I/AAAAAAAAAmQ/cEiij6CUgYQ1xGLEQcW1xL6E5nXz8nmpgCLcBGAs/s1600/pcademo1.jpg" title="Open Group" width="320" height="auto" /></a><br /><br /> </center> <!-- :::::::::: FIGURE 1A :::::::::: --> </td> <td> <!-- :::::::::: FIGURE 1B :::::::::: --> <center> <a href="https://2.bp.blogspot.com/-rGbiwuah8PI/W_wuujGRY_I/AAAAAAAAAmw/MGBP75MEEpg0ORZw0zE9nYfygv71Lt-xwCLcBGAs/s1600/pcademo2.jpg"><img src="https://2.bp.blogspot.com/-rGbiwuah8PI/W_wuujGRY_I/AAAAAAAAAmw/MGBP75MEEpg0ORZw0zE9nYfygv71Lt-xwCLcBGAs/s1600/pcademo2.jpg" title="Group Window" width="320" height="auto" /></a><br /><br /> </center> <!-- :::::::::: FIGURE 1B :::::::::: --> </td> </tr> <tr> <td><center><small>Figure 1A: Open Group</small><br /><br /></center></td> <td><center><small>Figure 1B: Group Window</small><br /><br /></center></td> <br /><br /> </tr> </tbody> </table> From here, we can derive the usual summary statistics by clicking on <b>View</b> in the group window, moving the mouse over <b>Descriptive Stats</b> and clicking on <b>Common Sample</b>. This produces a spreadsheet with various statistics of interest. 
We reproduce the steps and output in Figures 2a and 2b.</br></br> <table> <tbody> <tr> <td> <!-- :::::::::: FIGURE 2A :::::::::: --> <center> <a href="https://4.bp.blogspot.com/-12NTMMcAqAs/W_wuvmtfLeI/AAAAAAAAAns/OcHBLa3PhxYjbbp4nwnSM0WxtYgtInXcgCPcBGAYYCw/s1600/pcademo3.jpg"><img src="https://4.bp.blogspot.com/-12NTMMcAqAs/W_wuvmtfLeI/AAAAAAAAAns/OcHBLa3PhxYjbbp4nwnSM0WxtYgtInXcgCPcBGAYYCw/s1600/pcademo3.jpg" title="Descriptive Stats Menu" width="320" height="auto" /></a><br /><br /> </center> <!-- :::::::::: FIGURE 2A :::::::::: --> </td> <td> <!-- :::::::::: FIGURE 2B :::::::::: --> <center> <a href="https://4.bp.blogspot.com/-fbnQuF4naTA/W_wuv3uOCTI/AAAAAAAAAno/bQasdqH9EbwEAibaSxuk_-yaynSqfE-EwCPcBGAYYCw/s1600/pcademo4.jpg"><img src="https://4.bp.blogspot.com/-fbnQuF4naTA/W_wuv3uOCTI/AAAAAAAAAno/bQasdqH9EbwEAibaSxuk_-yaynSqfE-EwCPcBGAYYCw/s1600/pcademo4.jpg" title="Descriptive Stats Output" width="320" height="auto" /></a><br /><br /> </center> <!-- :::::::::: FIGURE 2B :::::::::: --> </td> </tr> <tr> <td><center><small>Figure 2A: Descriptive Stats Menu</small><br /><br /></center></td> <td><center><small>Figure 2B: Descriptive Stats Output</small><br /><br /></center></td> <br /><br /> </tr> </tbody> </table> We can also plot each of the series to get a better visual sense for the data. In particular, from the group window, click on <b>View</b> and click on <b>Graph</b>. This brings up the <b>Graph Options</b> window. Here, from the <b>Multiple Series</b> dropdown menu, select <b>Multiple Graphs</b> and click on <b>OK</b>. 
We summarize the sequence in Figures 3a and 3b.</br></br> <table> <tbody> <tr> <td> <!-- :::::::::: FIGURE 3A :::::::::: --> <center> <a href="https://2.bp.blogspot.com/-UgCZXK2VOMQ/W_wuvuaCKDI/AAAAAAAAAnw/N6f03xO84FwC7XWzYsufGF8HH7OKsKaNACPcBGAYYCw/s1600/pcademo5.jpg"><img src="https://2.bp.blogspot.com/-UgCZXK2VOMQ/W_wuvuaCKDI/AAAAAAAAAnw/N6f03xO84FwC7XWzYsufGF8HH7OKsKaNACPcBGAYYCw/s1600/pcademo5.jpg" title="Graph Options" width="320" height="auto" /></a><br /><br /> </center> <!-- :::::::::: FIGURE 3A :::::::::: --> </td> <td> <!-- :::::::::: FIGURE 3B :::::::::: --> <center> <a href="https://2.bp.blogspot.com/-vmjw95PwcJo/W_w_-fncmGI/AAAAAAAAAoU/zKznvyjfHxUGPXgI87XOKFW9h7H1ZwqugCPcBGAYYCw/s1600/pcademo6.jpg"><img src="https://2.bp.blogspot.com/-vmjw95PwcJo/W_w_-fncmGI/AAAAAAAAAoU/zKznvyjfHxUGPXgI87XOKFW9h7H1ZwqugCPcBGAYYCw/s1600/pcademo6.jpg" title="Multiple Graphs" width="320" height="auto" /></a><br /><br /> </center> <!-- :::::::::: FIGURE 3B :::::::::: --> </td> </tr> <tr> <td><center><small>Figure 3A: Graph Options</small><br /><br /></center></td> <td><center><small>Figure 3B: Multiple Graphs</small><br /><br /></center></td> <br /><br /> </tr> </tbody> </table> At last, we can get a sense for information redundancy (see section <i>Variance Decomposition</i> in <a href="http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html">Part I</a> of this series) by studying correlation patterns. In this regard, we can produce a correlation matrix by clicking on <b>View</b> in the group window and clicking on <b>Covariance Analysis...</b>. This opens a window with further options. Here, deselect (click) the checkbox next to <b>Covariance</b> and select (click) the box next to <b>Correlation</b>. This ensures that EViews will only produce the correlation matrix without any other statistics. Furthermore, in the <b>Layout</b> dropbox, select <b>Single table</b>, and finally click on <b>OK</b>. 
Figures 4a and 4b reproduce these steps.</br></br> <table> <tbody> <tr> <td> <!-- :::::::::: FIGURE 4A :::::::::: --> <center> <a href="https://3.bp.blogspot.com/-nWgmG7204lA/W_wuv_6KYmI/AAAAAAAAAnk/82lJnwReVzwSdgThQiUGHy_06ioWcLA2QCPcBGAYYCw/s1600/pcademo7.jpg"><img src="https://3.bp.blogspot.com/-nWgmG7204lA/W_wuv_6KYmI/AAAAAAAAAnk/82lJnwReVzwSdgThQiUGHy_06ioWcLA2QCPcBGAYYCw/s1600/pcademo7.jpg" title="Covariance Analysis" width="320" height="auto" /></a><br /><br /> </center> <!-- :::::::::: FIGURE 4A :::::::::: --> </td> <td> <!-- :::::::::: FIGURE 4B :::::::::: --> <center> <a href="https://4.bp.blogspot.com/-FUOB1S3ayG8/W_wuwVlKrUI/AAAAAAAAAn8/Rc2r51LaMDIOpU6fXUp9pX8e-QsPhRHXACPcBGAYYCw/s1600/pcademo8.jpg"><img src="https://4.bp.blogspot.com/-FUOB1S3ayG8/W_wuwVlKrUI/AAAAAAAAAn8/Rc2r51LaMDIOpU6fXUp9pX8e-QsPhRHXACPcBGAYYCw/s1600/pcademo8.jpg" title="Correlation Table" width="320" height="auto" /></a><br /><br /> </center> <!-- :::::::::: FIGURE 4B :::::::::: --> </td> </tr> <tr> <td><center><small>Figure 4A: Covariance Analysis</small><br /><br /></center></td> <td><center><small>Figure 4B: Correlation Table</small><br /><br /></center></td> <br /><br /> </tr> </tbody> </table> A quick interpretation of the correlation structure indicates that murder is highly correlated with assault, which in turn exhibits a strong positive correlation with rape. Moreover, whereas murder is nearly uncorrelated with the size of the urban population, rape is, among the three causes for arrest, the one most strongly associated with larger communities. Intuitively, this is in line with conventional wisdom. Murders are rarely professional affairs and typically involve assault as a precursor. Furthermore, owing to the higher costs of crime visibility and cleanup, murder generally does not favour larger population areas, where police presence and the likelihood of witnesses are more pronounced. On the other hand, rape favours larger urban centers simply because there are more people and the cost of concealing or denying the crime is notoriously low. Furthermore, victims of rape in smaller communities are typically shamed into staying quiet, since social circles are naturally tighter in such surroundings.</br></br> <h3>Principal Component Analysis of Crime Data</h3> Doing PCA in EViews is trivial. From our group object window, click on <b>View</b> and click on <b>Principal Components...</b>. This opens the main PCA dialog. See Figures 5a and 5b below.</br></br> <table> <tbody> <tr> <td> <!-- :::::::::: FIGURE 5A :::::::::: --> <center> <a href="https://4.bp.blogspot.com/-sPzh2S3skJM/W_wusI_ZpxI/AAAAAAAAAn8/9v0jUSNfYkUykUDHdg-wDZy3UJoDvd01wCPcBGAYYCw/s1600/pcademo12.jpg"><img src="https://4.bp.blogspot.com/-sPzh2S3skJM/W_wusI_ZpxI/AAAAAAAAAn8/9v0jUSNfYkUykUDHdg-wDZy3UJoDvd01wCPcBGAYYCw/s1600/pcademo12.jpg" title="Initiating the PCA dialog" width="320" height="auto" /></a><br /><br /> </center> <!-- :::::::::: FIGURE 5A :::::::::: --> </td> <td> <!-- :::::::::: FIGURE 5B :::::::::: --> <center> <a href="https://1.bp.blogspot.com/-0uK7YlGUkJ0/W_wuwfUubDI/AAAAAAAAAn4/SImjRV0TleUhGnXKo3fXbR5Sr2g0Kf3EwCPcBGAYYCw/s1600/pcademo9.jpg"><img src="https://1.bp.blogspot.com/-0uK7YlGUkJ0/W_wuwfUubDI/AAAAAAAAAn4/SImjRV0TleUhGnXKo3fXbR5Sr2g0Kf3EwCPcBGAYYCw/s1600/pcademo9.jpg" title="Main PCA dialog" width="320" height="auto" /></a><br /><br /> </center> <!-- :::::::::: FIGURE 5B :::::::::: --> </td> </tr> <tr> <td><center><small>Figure 5A: Initiating the PCA dialog</small><br /><br /></center></td> <td><center><small>Figure 5B: Main PCA Dialog</small><br /><br /></center></td> <br /><br /> </tr> </tbody> </table> From here, EViews offers users the ability to apply several tools and protocols readily encountered in the literature on PCA.</br></br> <h4>Summary of Fundamentals</h4> As a first step, we are interested in summarizing PCA fundamentals.
In particular, we seek an overview of the eigenvalues and eigenvectors that result from applying the principal component decomposition to the covariance or correlation matrix associated with our variables of interest. To do so, consider the <b>Display</b> group, and select <b>Table</b>; this produces three tables summarizing the covariance (correlation) matrix, and the associated eigenvectors and eigenvalues.</br></br> Associated with this output are several important options under the <b>Component selection</b> group. These include: <ul> <li> <b>Maximum number</b>: This defaults to the theoretical maximum number of eigenvalues possible, which is the total number of variables in the group under consideration. In our case, this number is 4. <li> <b>Minimum eigenvalue</b>: This defaults to 0. Nevertheless, selecting a positive value requests that all eigenvectors associated with eigenvalues less than this value are not displayed. <li> <b>Cumulative proportion</b>: This defaults to 1. Choosing a value $\alpha < 1$, however, requests that only the most principal $k$ eigenvalues and eigenvectors associated with explaining $\alpha \times 100\%$ of the variation are retained. Naturally, choosing $\alpha=1$ requests that all eigenvalues are displayed. See section <i>Dimension Reduction</i> in <a href="http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html">Part I</a> of this series for further details. </ul> Since we are interested in a global summary, we will leave the <b>Component selection</b> options at their default values.</br></br> Furthermore, consider momentarily the <b>Calculation</b> tab. Here, the <b>Type</b> dropdown offers the choice to apply the principal component decomposition either to the correlation or covariance matrix. For details, see sections <i>Variance Decomposition</i> and <i>Change of Basis</i> in <a href="http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html">Part I</a> of this series.
The choice essentially reduces to whether or not the variables under consideration exhibit similar scales. In other words, if variances of the underlying variables of interest are similar, then conducting PCA on the covariance matrix is certainly justified. Nevertheless, if the variances are widely different, then selecting the correlation matrix is more appropriate if interpretability and comparability are desired. EViews errs on the side of caution and defaults to using the correlation matrix. Since the table of summary statistics we produced in Figure 2b clearly shows a lack of uniformity in standard deviations across the four variables of interest, we will stick with the default and use the correlation matrix. Hit <b>OK</b>.</br></br> <!-- :::::::::: FIGURE 6 :::::::::: --> <center> <a href="https://3.bp.blogspot.com/-Y8HLv3vXMNU/W_wusIBKn9I/AAAAAAAAAn0/rAFDt602C6YxX3iYKMEeTVo0ScIZmli4ACPcBGAYYCw/s1600/pcademo13.jpg"><img src="https://3.bp.blogspot.com/-Y8HLv3vXMNU/W_wusIBKn9I/AAAAAAAAAn0/rAFDt602C6YxX3iYKMEeTVo0ScIZmli4ACPcBGAYYCw/s1600/pcademo13.jpg" title="PCA Table Output" width="320" height="auto" /></a><br /><br /> <small>Figure 6: PCA Table Output</small><br /><br /> </center> <!-- :::::::::: FIGURE 6 :::::::::: --> The resulting output, which is summarized in Figure 6 above, consists of three tables. The first table summarizes the information on eigenvalues. The latter are sorted in order of principality (importance), measured as the proportion of information explained by each principal direction. Refer to section <i>Principal Directions</i> in <a href="http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html">Part I</a> of this series for more details. In particular, we see that the first principal direction explains roughly 62% of the information contained in the underlying correlation matrix, the second, roughly 25%, and so on.
Furthermore, the cumulative proportion of information explained by the first two principal directions is roughly 87% (62% + 25%). In other words, if dimensionality reduction is desired, our analysis indicates that we can halve the underlying dimensionality of the problem from 4 to 2, while retaining nearly 90% of the original information. This is evidently a profitable trade-off. For theoretical details, see section <i>Dimension Reduction</i> in <a href="http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html">Part I</a> of this series. At last, observe that EViews reports that the average of the 4 eigenvalues is 1. This will in fact always be the case when extracting eigenvalues from a correlation matrix.</br></br> The second (middle) table summarizes the eigenvectors associated with each of the principal eigenvalues. Naturally, the eigenvectors are also arranged in order of principality. Furthermore, whereas the eigenvalues highlight how much of the overall information is extracted in each principal direction, the eigenvectors reveal how much weight each variable has in each direction.</br></br> Recall from <a href="http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html">Part I</a> of this series that all eigenvectors have length unity. Accordingly, the relative importance of any variable in a given principal direction is effectively the proportion of the eigenvector's squared length (unity) attributed to that variable, obtained by squaring the corresponding element. For instance, in the case of the first eigenvector, $[0.535899, 0.583184, 0.543432, 0.278191]^{\top}$, <b>MURDER</b> accounts for $0.535899^{2} \times 100\% = 28.7188\%$ of the overall direction length. Similarly, <b>ASSAULT</b> accounts for 34.0103% of the direction, and <b>RAPE</b> contributes 29.5318%. Evidently, the least important variable in the first principal direction is <b>URBANPOP</b>, which accounts for only 7.7390% of the direction length.</br></br> On the other hand, in the second principal direction, it is <b>URBANPOP</b> that carries the most weight, contributing $0.872806^{2} \times 100\% = 76.1790\%$ to the direction length. Accordingly, if feature extraction is the goal, it is clear (and rather obvious) that the first principal direction is roughly equally dominated by <b>MURDER</b>, <b>ASSAULT</b>, and <b>RAPE</b>, whereas the second principal direction is almost entirely governed by <b>URBANPOP</b>. For a theoretical exposition, see section <i>Principal Components</i> in <a href="http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html">Part I</a> of this series.</br></br> At last, the third table is just the correlation matrix to which the eigen-decomposition is applied. The latter, while important, is provided only as a reference.</br></br> <h4>Eigenvalue Plots and Dimensionality</h4> Now that we have a rough picture of PCA fundamentals associated with our dataset, it is natural to ask whether we can proceed with dimensionality reduction in a more formal manner. One such way (arbitrary, but widely popular) is to look at several eigenvalue plots and visually identify how many eigenvalues to retain.</br></br> From the previous PCA output, click again on <b>View</b>, then <b>Principal Components...</b>, and select <b>Eigenvalue Plots</b> under the <b>Display</b> group.
This is summarized in Figure 7 below.</br></br> <!-- :::::::::: FIGURE 7 :::::::::: --> <center> <a href="https://3.bp.blogspot.com/-ehsCXXsh38E/W_wusvmkQGI/AAAAAAAAAns/xxGhVsV3uKoHyO9kejwn2TSh9UtWTJZwQCPcBGAYYCw/s1600/pcademo14.jpg"><img src="https://3.bp.blogspot.com/-ehsCXXsh38E/W_wusvmkQGI/AAAAAAAAAns/xxGhVsV3uKoHyO9kejwn2TSh9UtWTJZwQCPcBGAYYCw/s1600/pcademo14.jpg" title="PCA Dialog: Eigenvalue Plots" width="320" height="auto" /></a><br /><br /> <small>Figure 7: PCA Dialog: Eigenvalue Plots</small><br /><br /> </center> <!-- :::::::::: FIGURE 7 :::::::::: --> Here, EViews offers several graphical representations for the underlying eigenvalues. These include the scree plot, a plot of the differences between successive eigenvalues, and a plot of the cumulative proportion of information associated with the first $k$ eigenvalues. Go ahead and select all three. As before, we will leave the default values under the <b>Component Selection</b> group. Hit <b>OK</b>. Figure 8 summarizes the output.</br></br> <!-- :::::::::: FIGURE 8 :::::::::: --> <center> <a href="https://3.bp.blogspot.com/-rT3soZNWiPQ/W_wuspsZLcI/AAAAAAAAAnw/yQlwxrPW9jMaCpMpz8GP3aU5lUprK2gcQCPcBGAYYCw/s1600/pcademo15.jpg"><img src="https://3.bp.blogspot.com/-rT3soZNWiPQ/W_wuspsZLcI/AAAAAAAAAnw/yQlwxrPW9jMaCpMpz8GP3aU5lUprK2gcQCPcBGAYYCw/s1600/pcademo15.jpg" title="Eigenvalue Plots Output" width="320" height="auto" /></a><br /><br /> <small>Figure 8: Eigenvalue Plots Output</small><br /><br /> </center> <!-- :::::::::: FIGURE 8 :::::::::: --> EViews now produces three graphs. The first is the scree plot - a line graph of eigenvalues arranged in order of principality. Superimposed on this graph is a red dotted horizontal line with a value equal to the average of the eigenvalues, which, as we mentioned earlier, in our case is 1.
The idea here is to look for a kink point, or an elbow, and retain all eigenvalues (and by extension their associated eigenvectors) that form the first portion of the kink, discarding the rest. From the plot, it is evident that a kink occurs at the 2nd eigenvalue, indicating that we should retain the first two eigenvalues.</br></br> A slightly more numerical approach discards all eigenvalues significantly below the eigenvalue average. Referring to the first table in Figure 6, we see that the average of the eigenvalues is 1, and the 2nd eigenvalue is in fact just below this cutoff. Since the 2nd value is so close to the average, and given the visual evidence of the kink, it is safe to conclude that the scree plot analysis indicates that only the first two eigenvalues ought to be retained.</br></br> The second graph is a line graph of the differences between successive eigenvalues. Superimposed on this graph is another horizontal line, this time with a value equal to the average of the differences of successive eigenvalues. Although EViews does not report this number, using the top table in Figure 6, it is not difficult to show that the average in question is $(1.490476+0.633202+0.183133)/3 = 0.768937$. The idea here is to retain eigenvalues up to and including the one following the last difference that lies above this threshold. Since only the first difference (1.490476) exceeds the threshold, this criterion again retains only the first two eigenvalues.</br></br> The final graph is a line graph of the cumulative proportion of information explained by successive principal eigenvalues. Superimposed on this graph is a line with a slope equal to the average of the eigenvalues, namely 1. The idea here is to retain those eigenvalues that form segments of the cumulative curve whose slopes are at least as steep as the line with slope 1.
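The arithmetic behind these thresholds is easy to cross-check. The sketch below uses approximate eigenvalues reconstructed from the differences and proportions reported in Figure 6 (the values are assumptions, not read from the workfile):

```python
# Approximate eigenvalues of the crime-data correlation matrix,
# reconstructed (assumed) from the differences reported in Figure 6.
evals = [2.480242, 0.989765, 0.356563, 0.173430]

avg = sum(evals) / len(evals)                      # equals 1 for a correlation matrix
diffs = [a - b for a, b in zip(evals, evals[1:])]  # successive differences
avg_diff = sum(diffs) / len(diffs)                 # threshold for the difference rule
cumulative = [sum(evals[:k + 1]) / sum(evals) for k in range(len(evals))]

print(round(avg, 6))       # 1.0
print(round(avg_diff, 6))  # 0.768937
print([round(c, 4) for c in cumulative])
```

Only the first difference exceeds `avg_diff`, and the cumulative proportion reaches roughly 87% after two components, matching the figures quoted in the text.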
In our case, only two eigenvalues seem to form such a segment: eigenvalues 1 and 2.</br></br> All three graphical approaches indicate that one ought to retain the first two eigenvalues and their associated eigenvectors. There is, however, an entirely data-driven methodology adapted from Bai and Ng (2002). We discussed this approach in section <i>Dimension Reduction</i> in <a href="http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html">Part I</a> of this series. Nevertheless, EViews currently doesn't support its implementation via dialogs and it must be programmed manually. In this regard, we temporarily move away from our dialog-based exposition, and offer a code snippet which implements the aforementioned protocol.</br></br> <PRE><br /> ' --- Bai and Ng (2002) Protocol ---<br /> group crime murder assault rape urbanpop ' create group with all 4 variables<br /> !obz = murder.@obs ' get number of observations<br /> !numvar = @columns(crime) ' get number of variables<br /> equation eqjr ' equation object to hold regression<br /> matrix(!numvar, !numvar) SSRjr ' matrix to store SSR from each regression eqjr<br /><br /> crime.makepcomp(cov=corr) s1 s2 s3 s4 ' get all score series<br /><br /> for !j = 1 to !numvar<br /> for !r = 1 to !numvar<br /> %scrstr = "" ' holds score specification to extract<br /><br /> ' generate string to specify which scores to use in regression<br /> for !r2 = 1 to !r<br /> %scrstr = %scrstr + " s" + @str(!r2)<br /> next<br /><br /> eqjr.ls crime(!j) {%scrstr} ' estimate regression<br /><br /> SSRjr(!j, !r) = (eqjr.@ssr)/!obz ' take average of SSR<br /> next<br /> next<br /> ' get column means of SSRjr.
namely, get r means, averaging across regressions j.<br /> vector SSRr = @cmean(SSRjr)<br /><br /> vector(!numvar) IC ' stores information criterion values<br /> for !r = 1 to !numvar<br /> IC(!r) = @log(SSRr(!r)) + !r*(!obz + !numvar)/(!obz*!numvar)*@log(!numvar)<br /> next<br /><br /> ' take the index of the minimum value of IC as number of principal components to retain<br /> scalar numpc = @imin(IC)<br /> </PRE> Unlike our graphical analysis, the protocol above suggests that the number of retained eigenvalues is 1. Nevertheless, for the sake of greater analytical exposition below, we will stick with the original suggestion of retaining the first two principal directions instead.</br></br> <h4>Principal Direction Analysis</h4> The next step in our analysis is to look at what, if any, meaningful patterns emerge by studying the principal directions themselves. To do so, we again bring up the main principal component dialog and this time select <b>Variable Loading Plots</b> under the <b>Display</b> group. See Figure 9 below.</br></br> <!-- :::::::::: FIGURE 9 :::::::::: --> <center> <a href="https://1.bp.blogspot.com/-RkhgeidLzVU/W_wus3VgIcI/AAAAAAAAAns/NI3ayYMGOG0F5Kcx_-vQuyukO8iJDBWlACPcBGAYYCw/s1600/pcademo16.jpg"><img src="https://1.bp.blogspot.com/-RkhgeidLzVU/W_wus3VgIcI/AAAAAAAAAns/NI3ayYMGOG0F5Kcx_-vQuyukO8iJDBWlACPcBGAYYCw/s1600/pcademo16.jpg" title="PCA Dialog: Variable Loading Plots" width="320" height="auto" /></a><br /><br /> <small>Figure 9: PCA Dialog: Variable Loading Plots</small><br /><br /> </center> <!-- :::::::::: FIGURE 9 :::::::::: --> Variable loading plots produce $XY$-pair plots of loading vectors. See section <i>Loading Plots</i> in <a href="http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html">Part I</a> of this series for further details.
The user specifies which loading vectors to compare and selects one among the following loading (scaling) protocols: <ul> <li> <i>Normalize Loadings</i>: In this case, scaling is unity and loading vectors are in fact the eigenvectors themselves. <li> <i>Normalize Scores</i>: Here, the scaling factor is the square root of the eigenvalue vector. In other words, the $k^{\text{th}}$ element of the $i^{\text{th}}$ loading vector is the $k^{\text{th}}$ element of the $i^{\text{th}}$ eigenvector, multiplied by the square root of the $i^{\text{th}}$ eigenvalue. <li> <i>Symmetric Weights</i>: In this scenario, the scaling factor is the quartic (fourth) root of the eigenvalue vector. Namely, the $k^{\text{th}}$ element of the $i^{\text{th}}$ loading vector is the $k^{\text{th}}$ element of the $i^{\text{th}}$ eigenvector, multiplied by the fourth root of the $i^{\text{th}}$ eigenvalue. <li> <i>User Loading Weight</i>: If $0 \leq \omega \leq 1$ denotes the user-defined scaling factor, then the $i^{\text{th}}$ loading vector is formed by scaling the $i^{\text{th}}$ eigenvector by the $i^{\text{th}}$ eigenvalue raised to the power $\omega/2$. </ul> For the time being, stick with all default values. That is, we will look at the loading plots across the first two principal directions, and we will use the <b>Normalize Loadings</b> scaling protocol. In other words, we will plot the true eigenvectors since scaling is unity. Note that the choice of looking at only the first two principal directions is, among other things, motivated by our previous analysis on dimension reduction where we decided to retain only the first two principal eigenvalues and discard the rest. Go ahead and click on <b>OK</b>.
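All four protocols follow the same pattern: each loading vector is an eigenvector scaled by the corresponding eigenvalue raised to some power. A minimal numpy sketch of that pattern follows; the eigenvalues and the orthonormal matrix standing in for the eigenvectors are hypothetical, not taken from the crime data:

```python
import numpy as np

# Hypothetical eigen-pairs standing in for an actual correlation-matrix decomposition.
evals = np.array([2.48, 0.99, 0.36, 0.17])
evecs, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(4, 4)))  # orthonormal columns

def loading_vectors(evecs, evals, omega):
    # Column i (the i-th loading vector) is the i-th eigenvector
    # scaled by the i-th eigenvalue raised to the power omega/2.
    return evecs * evals ** (omega / 2.0)

normalized_loadings = loading_vectors(evecs, evals, 0.0)   # the eigenvectors themselves
normalized_scores   = loading_vectors(evecs, evals, 1.0)   # scaled by sqrt(eigenvalue)
symmetric_weights   = loading_vectors(evecs, evals, 0.5)   # scaled by the fourth root
user_weight         = loading_vectors(evecs, evals, 0.36)  # user-defined omega = 0.36
```

Under this convention, the squared length of the $i^{\text{th}}$ loading vector is the $i^{\text{th}}$ eigenvalue raised to the power $\omega$, which is why the normalize-scores choice produces vectors whose squared lengths equal the eigenvalues.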
Figure 10 summarizes the output.</br></br> <!-- :::::::::: FIGURE 10 :::::::::: --> <center> <a href="https://4.bp.blogspot.com/--7MFx125ijM/W_wutjX3pxI/AAAAAAAAAno/K0eIADkrkZgN7K9Hvxc_JRWg46zJMJ61ACPcBGAYYCw/s1600/pcademo17.jpg"><img src="https://4.bp.blogspot.com/--7MFx125ijM/W_wutjX3pxI/AAAAAAAAAno/K0eIADkrkZgN7K9Hvxc_JRWg46zJMJ61ACPcBGAYYCw/s1600/pcademo17.jpg" title="Variable Loading Plots Output" width="320" height="auto" /></a><br /><br /> <small>Figure 10: Variable Loading Plots Output</small><br /><br /> </center> <!-- :::::::::: FIGURE 10 :::::::::: --> As discussed in section <i>Loading Plots</i> in <a href="http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html">Part I</a> of this series, the angle between the vectors in a loading plot is related to the correlation between the original variables to which the loading vectors are associated. Accordingly, we see that <b>MURDER</b> and <b>ASSAULT</b> are moderately positively correlated, as are <b>ASSAULT</b> and <b>RAPE</b>, although the latter two less so than the former two. Moreover, it is clear that <b>RAPE</b> and <b>URBANPOP</b> are positively correlated, whereas <b>MURDER</b> and <b>URBANPOP</b> are nearly uncorrelated since they form a near 90 degree angle. In other words, we have a two-dimensional graphical representation of the four-dimensional correlation matrix in Figure 4b. This ability to represent higher dimensional information in a lower dimensional space is arguably the most useful feature of PCA.</br></br> Furthermore, all three variables, <b>MURDER</b>, <b>ASSAULT</b>, and <b>RAPE</b>, are strongly correlated with the first principal direction, whereas <b>URBANPOP</b> is strongly correlated with the second principal direction. 
In fact, looking at vector lengths, we can also see that <b>MURDER</b>, <b>ASSAULT</b>, and <b>RAPE</b> are roughly equally dominant in the first direction, whereas <b>URBANPOP</b> is significantly more dominant than either of the former three, albeit in the second direction. Of course, this simply confirms our preliminary analysis of the middle table in Figure 6.</br></br> Above, we started with the basic loading vectors with scaling unity. We could have, of course, resorted to other scaling options such as normalizing to the score vectors, using symmetric weights, or using some other custom weighting. Since each of these would yield a rescaled but qualitatively similar perspective, we won't delve further into details. Nevertheless, as an exercise in exhibiting the steps involved, we provide below small snippets of code to manually generate loading vectors using only the eigenvalues and eigenvectors associated with the underlying correlation matrix. This is done for each of the four scaling protocols. These manually generated vectors are then compared to the loading vectors generated by EViews' internal code and shown to be identical.</br></br> <PRE><br /> ' --- Verify Loading Plot Vectors ---<br /> group crime murder assault rape urbanpop ' create group with all 4 variables<br /><br /> ' make eigenvalues and eigenvectors based on the corr. 
matrix<br /> crime.pcomp(eigval=eval, eigvec=evec, cov=corr)<br /><br /> 'normalize loadings<br /> crime.makepcomp(loading=load, cov=corr) s1 s2 s3 s4 ' EViews generated loading vectors<br /> matrix evaldiag = @makediagonal(eval) ' create diagonal matrix of eigenvalues (used below)<br /> matrix loadverify = evec ' with scaling unity, the loading vectors are the eigenvectors themselves<br /> matrix loaddiff = loadverify - load ' get difference between custom and eviews output<br /> show loaddiff ' display results<br /><br /> 'normalize scores<br /> crime.makepcomp(scale=normscores, loading=load, cov=corr) s1 s2 s3 s4<br /> loadverify = evec*@epow(evaldiag, 0.5)<br /> loaddiff = loadverify - load<br /> show loaddiff<br /><br /> 'symmetric weights<br /> crime.makepcomp(scale=symmetrics, loading=load, cov=corr) s1 s2 s3 s4<br /> loadverify = evec*@epow(evaldiag, 0.25)<br /> loaddiff = loadverify - load<br /> show loaddiff<br /><br /> 'user weights<br /> crime.makepcomp(scale=0.36, loading=load, cov=corr) s1 s2 s3 s4<br /> loadverify = evec*@epow(evaldiag, 0.18)<br /> loaddiff = loadverify - load<br /> show loaddiff<br /> </PRE> <h4>Score Analysis</h4> Whereas loading vectors reveal information on which variables dominate (and by how much) each principal direction, it is only when they are used to create the principal component vectors (score vectors) that they are truly useful in a data exploratory sense. In this regard, we again open the main principal component dialog and select <b>Component scores plots</b> in the <b>Display</b> group of options. 
We capture this in Figure 11 below.</br></br> <!-- :::::::::: FIGURE 11 :::::::::: --> <center> <a href="https://2.bp.blogspot.com/-x39Nb4R9EFE/W_wutkF2RPI/AAAAAAAAAns/1ZD_HLUKwbQT6hsJOGvThjzeOdTsPhhuACPcBGAYYCw/s1600/pcademo18.jpg"><img src="https://2.bp.blogspot.com/-x39Nb4R9EFE/W_wutkF2RPI/AAAAAAAAAns/1ZD_HLUKwbQT6hsJOGvThjzeOdTsPhhuACPcBGAYYCw/s1600/pcademo18.jpg" title="PCA Dialog: Component Scores Plots" width="320" height="auto" /></a><br /><br /> <small>Figure 11: PCA Dialog: Component Scores Plots</small><br /><br /> </center> <!-- :::::::::: FIGURE 11 :::::::::: --> Analogous to the loading vector plots, EViews here produces $XY$-pair plots of score vectors. As in the case of loading plots, the user specifies which score vectors to compare, and selects one among the following loading (scaling) protocols: <ul> <li> <i>Normalize Loadings</i>: Score vectors are scaled by unity. In other words, no scaling occurs. <li> <i>Normalize Scores</i>: The $k^{\text{th}}$ score vector is scaled by the inverse of the square root of the $k^{\text{th}}$ eigenvalue. <li> <i>Symmetric Weights</i>: The $k^{\text{th}}$ score vector is scaled by the inverse of the quartic (fourth) root of the $k^{\text{th}}$ eigenvalue. <li> <i>User Loading Weight</i>: If $0 \leq \omega \leq 1$ denotes the user-defined scaling factor, the $k^{\text{th}}$ score vector is scaled by the $k^{\text{th}}$ eigenvalue raised to the power $-\omega/2$. </ul> Furthermore, if outlier detection is desired, EViews allows users to specify a p-value as a detection threshold. See sections <i>Score Plots</i> and <i>Outlier Detection</i> in <a href="http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html">Part I</a> of this series for further details. Since we are currently interested in interpretive exercises, we will forgo outlier detection and choose to display all observations. To do so, under the <b>Graph options</b> group of options, change the <b>Obs. 
Labels</b> to <b>Label all obs.</b> and hit <b>OK</b>. We replicate the output in Figure 12.</br></br> <!-- :::::::::: FIGURE 12 :::::::::: --> <center> <a href="https://4.bp.blogspot.com/--kE3u3wBndY/W_wut8kI3yI/AAAAAAAAAnw/5KcYOP_fyNAdPDYSDat65ynDyNSGQn8gQCPcBGAYYCw/s1600/pcademo19.jpg"><img src="https://4.bp.blogspot.com/--kE3u3wBndY/W_wut8kI3yI/AAAAAAAAAnw/5KcYOP_fyNAdPDYSDat65ynDyNSGQn8gQCPcBGAYYCw/s1600/pcademo19.jpg" title="Component Scores Plots Output" width="320" height="auto" /></a><br /><br /> <small>Figure 12: Component Scores Plots Output</small><br /><br /> </center> <!-- :::::::::: FIGURE 12 :::::::::: --> The output produced is a scatter plot of principal component 1 (score vector 1) vs. principal component 2 (score vector 2). There are several important observations to be made here.</br></br> First, the further east of the zero vertical axis a state is located, the more positively correlated it is with the first principal direction. Since the latter is dominated positively (east of the zero vertical axis) by the three crime categories <b>MURDER</b>, <b>ASSAULT</b>, and <b>RAPE</b> (see Figure 10), we conclude that such states are positively correlated with said crimes. Naturally, converse conclusions hold as well. In particular, we see that <b>CALIFORNIA</b>, <b>NEVADA</b>, and <b>FLORIDA</b> are most positively correlated with the three crimes under consideration. If so, it is perhaps little surprise that so many Hollywood crime thrillers are set in these three states. Conversely, <b>NORTH DAKOTA</b> and <b>VERMONT</b> are typically least associated with the crimes under consideration.</br></br> Second, the further north of the zero horizontal axis a state is located, the more positively correlated it is with the second principal direction. 
Since the latter is dominated positively (north of the zero horizontal axis) by the variable <b>URBANPOP</b> (see Figure 10), we conclude that such states are positively correlated with urbanization. Again, the converse conclusions hold as well. In particular, <b>HAWAII</b>, <b>CALIFORNIA</b>, <b>RHODE ISLAND</b>, <b>MASSACHUSETTS</b>, <b>UTAH</b>, and <b>NEW JERSEY</b> are the states most positively associated with urbanization, whereas those least so are <b>SOUTH CAROLINA</b>, <b>NORTH CAROLINA</b>, and <b>MISSISSIPPI</b>.</br></br> Lastly, it is worth recalling that like loading vectors, score vectors can also be scaled. In this regard, we provide code snippets below to show how to manually compute scaled score vectors, exposing the algorithm that EViews uses to do the same in its internal computations.</br></br> <PRE><br /> ' --- Verify Score Vectors ---<br /> ' make eigenvalues and eigenvectors based on the corr. matrix<br /> crime.pcomp(eigval=eval, eigvec=evec, cov=corr)<br /><br /> matrix evaldiag = @makediagonal(eval) ' create diagonal matrix of eigenvalues<br /><br /> stom(crime, crimemat) ' create matrix from crime group<br /> vector means = @cmean(crimemat) ' get column means<br /> vector popsds = @cstdevp(crimemat) ' get population standard deviations<br /><br /> ' initialize matrix for normalized crimemat<br /> matrix(@rows(crimemat), @columns(crimemat)) crimematnorm<br /><br /> ' normalize (remove mean and divide by pop. s.d.) 
every column of crimemat<br /> for !k = 1 to @columns(crimemat)<br /> colplace(crimematnorm,(@columnextract(crimemat,!k) - means(!k))/popsds(!k),!k)<br /> next<br /><br /> 'normalize loadings<br /> crime.makepcomp(cov=corr) s1 s2 s3 s4 ' get score series<br /> group scores s1 s2 s3 s4 ' put scores into group<br /> stom(scores, scoremat) ' put scores group into matrix<br /> matrix scoreverify = crimematnorm*evec ' create custom score matrix<br /> matrix scorediff = scoreverify - scoremat ' get difference between custom and eviews output<br /> show scorediff<br /><br /> 'normalize scores<br /> crime.makepcomp(scale=normscores, cov=corr) s1 s2 s3 s4<br /> group scores s1 s2 s3 s4<br /> stom(scores, scoremat)<br /> scoreverify = crimematnorm*evec*@inverse(@epow(evaldiag, 0.5))<br /> scorediff = scoreverify - scoremat<br /> show scorediff<br /><br /> 'symmetric weights<br /> crime.makepcomp(scale=symmetrics, cov=corr) s1 s2 s3 s4<br /> group scores s1 s2 s3 s4<br /> stom(scores, scoremat)<br /> scoreverify = crimematnorm*evec*@inverse(@epow(evaldiag, 0.25))<br /> scorediff = scoreverify - scoremat<br /> show scorediff<br /><br /> 'user weights<br /> crime.makepcomp(scale=0.36, cov=corr) s1 s2 s3 s4<br /> group scores s1 s2 s3 s4<br /> stom(scores, scoremat)<br /> scoreverify = crimematnorm*evec*@inverse(@epow(evaldiag, 0.18))<br /> scorediff = scoreverify - scoremat<br /> show scorediff<br /> </PRE> Above, observe that we derived eigenvalues and eigenvectors of the correlation matrix. Accordingly, to derive the score vectors manually, we needed to standardize the original variables first. In this regard, when using the covariance matrix instead, one need only to demean the original variables and disregard scaling information. We leave this as an exercise to interested readers.</br></br> <h4>Biplot Analysis</h4> As a last exercise, we superimpose the loading vectors and score vectors onto a single graph called the biplot. 
To do this, again, bring up the main principal component dialog and under the <b>Display</b> group select <b>Biplot (scores & loadings)</b>. As in the previous exercise, under the <b>Graph options</b> group, select <b>Label all obs.</b> from the <b>Obs. labels</b> dropdown, and hit <b>OK</b>. We summarize these steps in Figure 13.</br></br> <!-- :::::::::: FIGURE 13 :::::::::: --> <center> <a href="https://3.bp.blogspot.com/-5ZJZITYjMFs/W_wuu3eXzaI/AAAAAAAAAn8/KIws_X7kqy80YUyLa531b2qBMEWTTURDgCPcBGAYYCw/s1600/pcademo20.jpg"><img src="https://3.bp.blogspot.com/-5ZJZITYjMFs/W_wuu3eXzaI/AAAAAAAAAn8/KIws_X7kqy80YUyLa531b2qBMEWTTURDgCPcBGAYYCw/s1600/pcademo20.jpg" title="PCA Dialog: Biplots (scores & loadings)" width="320" height="auto" /></a><br /><br /> <small>Figure 13: PCA Dialog: Biplots (scores & loadings)</small><br /><br /> </center> <!-- :::::::::: FIGURE 13 :::::::::: --> From an inferential standpoint, there's little to contribute beyond what we laid out in each of the previous two sections. Nevertheless, having both the loading and score vectors appear on the same graph visually reinforces our previous analysis. Accordingly, we close this section with just the graphical output.</br></br> <!-- :::::::::: FIGURE 14 :::::::::: --> <center> <a href="https://1.bp.blogspot.com/-fAFKcwUp_GQ/W_wuvN2sTHI/AAAAAAAAAnw/gyAc9evYMk0pu4pdYEkUKvPJZup0CM9fACPcBGAYYCw/s1600/pcademo21.jpg"><img src="https://1.bp.blogspot.com/-fAFKcwUp_GQ/W_wuvN2sTHI/AAAAAAAAAnw/gyAc9evYMk0pu4pdYEkUKvPJZup0CM9fACPcBGAYYCw/s1600/pcademo21.jpg" title="Biplots (scores & loadings) Output" width="320" height="auto" /></a><br /><br /> <small>Figure 14: Biplots (scores & loadings) Output</small><br /><br /> </center> <!-- :::::::::: FIGURE 14 :::::::::: --> <h3>Concluding Remarks</h3> In <a href="http://blog.eviews.com/2018/10/principal-component-analysis-part-i.html">Part I</a> of this series we laid out the theoretical foundations underlying PCA. 
Here, we used EViews to conduct a brief data exploratory implementation of PCA on serious crimes across 50 US states. Our aim was to illustrate the use of numerous PCA tools available in EViews with brief interpretations associated with each.</br></br> In closing, we would like to point out that apart from the main principal component dialog we used above, EViews also offers a <b>Make Principal Components...</b> proc function which provides a unified framework for producing vectors and matrices of the most important objects related to PCA. These include the vector of eigenvalues, the matrix of eigenvectors, the matrix of loading vectors, as well as the matrix of scores. To access this function, open the crime group from the workfile, click on <b>Proc</b> and click on <b>Make Principal Components...</b>. We summarize this in Figures 15a and 15b below.</br></br> <table> <tbody> <tr> <td> <!-- :::::::::: FIGURE 15A :::::::::: --> <center> <a href="https://4.bp.blogspot.com/-qJTuFEKJwqc/W_wuvBI9YRI/AAAAAAAAAn4/x7sxipltfmoNhmHp_QHDkwMuisTLavLLACPcBGAYYCw/s1600/pcademo22.jpg"><img src="https://4.bp.blogspot.com/-qJTuFEKJwqc/W_wuvBI9YRI/AAAAAAAAAn4/x7sxipltfmoNhmHp_QHDkwMuisTLavLLACPcBGAYYCw/s1600/pcademo22.jpg" title="Group Proc: Make Principal Components..." 
width="320" height="auto" /></a><br /><br /> </center> <!-- :::::::::: FIGURE 15A :::::::::: --> </td> <td> <!-- :::::::::: FIGURE 15B :::::::::: --> <center> <a href="https://2.bp.blogspot.com/-soMYratGHJQ/W_wuvP3PpUI/AAAAAAAAAn0/mUEZAjHolnkVuPg9JRHQDZQkwg0WArtigCPcBGAYYCw/s1600/pcademo23.jpg"><img src="https://2.bp.blogspot.com/-soMYratGHJQ/W_wuvP3PpUI/AAAAAAAAAn0/mUEZAjHolnkVuPg9JRHQDZQkwg0WArtigCPcBGAYYCw/s1600/pcademo23.jpg" title="Make Principal Components Dialog" width="320" height="auto" /></a><br /><br /> </center> <!-- :::::::::: FIGURE 15B :::::::::: --> </td> </tr> <tr> <td><center><small>Figure 15a: Group Proc: Make Principal Components...</small><br /><br /></center></td> <td><center><small>Figure 15b: Make Principal Components Dialog</small><br /><br /></center></td> </tr> </tbody> </table> From here, one can insert names for all objects one wishes to place in the workfile, select the scaling one wishes to use in the creation of the loading and score vectors, and hit <b>OK</b>.</br></br> <hr> <h3>Files</h3> The EViews workfile can be downloaded here: <a href="http://www.eviews.com/blog/PCA/usarrests.wf1">usarrests.wf1</a></br> The EViews program file can be downloaded here: <a href="http://www.eviews.com/blog/PCA/usarrests.prg">usarrests.prg</a></br></br> <hr> <h3>References</h3> <table> <tr valign="top"> <td align="right" class="bibtexnumber"> [<a name="bai-2002">1</a>] </td> <td class="bibtexitem"> Jushan Bai and Serena Ng. Determining the number of factors in approximate factor models. <em>Econometrica</em>, 70(1):191--221, 2002. 
</td> </tr> </table> </span>

<hr>

<!-- ::: Next post: Principal Component Analysis: Part I (Theory), published 2018-10-15 ::: -->
<h3>Principal Component Analysis: Part I (Theory)</h3>

<span style="font-family: &quot;verdana&quot; , sans-serif;"> Most students of econometrics are taught to appreciate the value of data. We are generally taught that more data is better than less, and that throwing data away is almost "taboo". While this is generally good practice when it concerns the number of observations per variable, it is not always recommended when it concerns the number of variables under consideration. In fact, as the number of variables increases, it becomes increasingly more difficult to rank the importance (impact) of any given variable, which can lead to problems ranging from basic overfitting, to more serious issues such as multicollinearity or model invalidity. In this regard, selecting the smallest number of the most <i>meaningful</i> variables -- otherwise known as <i>dimensionality reduction</i> -- is not a trivial problem, and has become a staple of modern data analytics, and a motivation for many modern techniques. 
One such technique is <b>Principal Component Analysis</b> (PCA).<a name='more'></a></br></br> <h3>Variance Decomposition</h3> Consider a linear statistical system -- a random matrix (multidimensional set of random variables) $\mathbf{X}$ of size $n \times m$ where the first dimension denotes observations and the second variables. Moreover, recall that linear statistical systems are characterized by two inefficiencies: 1) noise and 2) redundancy. The former is commonly measured through the <i>signal (desirable information) to noise (undesirable information) ratio</i> $\text{SNR} = \sigma^{2}_{\text{signal}} / \sigma^{2}_{\text{noise}}$, and implies that systems with larger signal variances $\sigma^{2}_{\text{signal}}$ relative to their noise counterparts are more informative. Assuming that noise is a nuisance equally present in observing each of the $m$ variables of our system, it stands to reason that variables with larger variances have larger SNRs, therefore carry relatively richer signals, and are in this regard relatively more important, or <i>principal</i>.</br></br> Whereas relative importance reduces to relative variances across system variables, redundancy, or relative <i>uniqueness</i> of information, is captured by system covariances. Recall that covariances (or normalized covariances called correlations) are measures of variable dependency or co-movement (direction and magnitude of joint variability). In other words, variables with overlapping (redundant) information will typically move in the same direction with similar magnitudes, and will therefore have non-zero covariances. Conversely, when variables share little to no overlapping information, they exhibit small to zero linear dependency, although statistical dependence could still manifest nonlinearly.</br></br> Together, system variances and covariances quantify the amount of information afforded by each variable, and how much of that information is truly unique. 
In fact, the two are typically derived together using the familiar <i>variance-covariance</i> matrix formula: $$\mathbf{\Sigma}_{X} = E \left( \mathbf{X}^{\top}\mathbf{X} \right)$$ where $\mathbf{\Sigma}_{X}$ is an $m\times m$ <i>square symmetric</i> matrix with (off-)diagonal elements as (co)variances, and where we have <i>a priori</i> assumed that all variables in $\mathbf{X}$ have been demeaned. Thus, systems where all variables are unique will result in a diagonal $\mathbf{\Sigma}_{X}$, whereas those exhibiting redundancy will have non-zero off-diagonal elements. In this regard, systems with zero redundancy have a particularly convenient feature known as <i>variance decomposition</i>. Since covariance terms in these systems are zero, total system variation (and therefore information) is the sum of all variance terms, and the proportion of total system information contributed by a variable is the ratio of its variance to total system variation.</br></br> Although the variance-covariance matrix is typically not diagonal, suppose there exists a way to diagonalize $\mathbf{\Sigma}_{X}$, and by extension transform $\mathbf{X}$, while simultaneously preserving information. If such transformation exists, one is guaranteed a new set of at most $m$ variables (some variables may be perfectly correlated with others) which are uncorrelated, and therefore linearly independent. Accordingly, discarding any one of those new variables would have no linear statistical impact on the $m-1$ remaining variables, and would reduce dimensionality at the cost of losing information to the extent contained in the discarded variables. In this regard, if one could also quantify the amount of information captured by each of the new variables, order the latter in descending order of information quantity, one could discard variables from the back until sufficient dimensionality reduction is achieved, while maintaining the maximum amount of information within the preserved variables. 
We summarize these objectives below: <ol> <li> Diagonalize $\mathbf{\Sigma}_{X}$. <li> Preserve information. <li> Identify principal (important) information. <li> Reduce dimensionality. </ol> So how does one realize these objectives? It is precisely this question which motivates the subject of this entry.</br></br> <h3>Principal Component Analysis</h3> Recall that associated with every matrix $\mathbf{X}$ is a <i>basis</i> -- a set (matrix) of <i>linearly independent</i> vectors such that <i>every</i> row vector in $\mathbf{X}$ is a linear combination of the vectors in the basis, $\mathbf{B}$ say. In other words, the row vectors of $\mathbf{X}$ are <i>projections</i> onto the column vectors of $\mathbf{B}$. Since the covariance matrix contains all noise and redundancy information associated with a matrix, the idea driving <i>principal component analysis</i> is to re-express the original covariance matrix using a basis that results in a new, diagonal covariance matrix -- in other words, off-diagonal elements in the original covariance matrix are driven to zero and redundancy is eliminated.</br></br> <h4>Change of Basis</h4> The starting point of PCA is the <i>change of basis</i> relationship. In particular, if $\mathbf{B}$ is an $m\times p$ matrix of geometric transformations with $p \leq m$, the $n\times p$ matrix $\mathbf{Q}=\mathbf{XB}$ is a projection of the $n\times m$ matrix $\mathbf{X} = [\mathbf{X}_{1}^{\top}, \ldots, \mathbf{X}_{n}^{\top}]^{\top}$ onto $\mathbf{B}$. In other words, the rows of $\mathbf{X}$ are linear combinations of the column vectors in $\mathbf{B} = [\mathbf{B}_{1}, \ldots, \mathbf{B}_{p}]$. 
Formally, \begin{align*} \mathbf{Q} & = \begin{bmatrix} \mathbf{X}_{1}\\ \vdots\\ \mathbf{X}_{n} \end{bmatrix} \begin{bmatrix} \mathbf{B}_{1} &\cdots &\mathbf{B}_{p} \end{bmatrix}\\ &= \begin{bmatrix} \mathbf{X}_{1}\mathbf{B}_{1} &\cdots &\mathbf{X}_{1}\mathbf{B}_{p}\\ \vdots &\ddots &\vdots\\ \mathbf{X}_{n}\mathbf{B}_{1} &\cdots &\mathbf{X}_{n}\mathbf{B}_{p} \end{bmatrix} \end{align*} More importantly, if the column vectors $\left\{ \mathbf{B}_{1}, \ldots, \mathbf{B}_{p} \right\}$ are also linearly independent, then $\mathbf{B}$, by definition, characterizes a matrix of basis vectors for $\mathbf{X}$. Furthermore, the covariance matrix of this transformation formalizes as: \begin{align} \mathbf{\Sigma}_{Q} = E\left( \mathbf{Q}^{\top}\mathbf{Q} \right) = E\left( \mathbf{B}^{\top}\mathbf{X}^{\top}\mathbf{XB} \right) = \mathbf{B}^{\top}\mathbf{\Sigma}_{X}\mathbf{B} \label{eq1} \end{align} It is important to reflect here on the dimensionality of $\mathbf{\Sigma}_{Q}$, which, unlike $\mathbf{\Sigma}_{X}$, is of dimension $p\times p$ where $p \leq m$. In other words, the covariance matrix under the transformation $\mathbf{B}$ is at most the size of the original covariance matrix, and possibly smaller. Since dimensionality reduction is clearly one of our objectives, the transformation above is certainly poised to do so. However, the careful reader may remark here: <i>if the objective is simply dimensionality reduction, then any matrix $\mathbf{B}$ of size $m \times p$ with $p\leq m$ will suffice; so why especially does $\mathbf{B}$ have to characterize a basis?</i></br></br> The answer is simple: dimensionality reduction is not the <i>only</i> objective, but one among <i>preservation of information</i> and <i>importance of information</i>. 
As to the former, we recall that what makes a set of basis vectors special is that they characterize <i>entirely</i> the space on which an associated matrix takes values and therefore <i>span</i> the multidimensional space on which that matrix resides. Accordingly, if $\mathbf{B}$ characterizes a basis, then information contained in $\mathbf{X}$ is never lost during the transformation to $\mathbf{Q}$. Furthermore, recall that the channel for dimensionality reduction that motivated our discussion earlier was never intended to go through a sparser basis. Rather, the mechanism of interest was a diagonalization of the covariance matrix followed by variable exclusion. Accordingly, any dimension reduction that reflects basis sparsity via $p \leq m$, is a consequence of perfect co-linearity (correlation) among some of the original system variables. In other words, $p = \text{rk}\left( \mathbf{X} \right)$, where $\text{rk}(\cdot)$ denotes the matrix <i>rank</i>, or the number of its linearly independent columns (or rows).</br></br> <h4>Diagonalization</h4> We argued earlier that any transformation from $\mathbf{X}$ to $\mathbf{Q}$ that preserves information must operate through a basis transformation $\mathbf{B}$. Suppose momentarily that we have in fact found such $\mathbf{B}$. Our next objective would be to ensure that $\mathbf{B}$ also produces a diagonal $\mathbf{\Sigma}_{Q}$. In this regard, we remind the reader of two famous results in linear algebra: <ol> <li>[Thm. 1:] <i>A matrix is symmetric if and only if it is orthogonally diagonalizable.</i> <ul> <li> In other words, if a matrix $\mathbf{A}$ is symmetric, there exists a diagonal matrix $\mathbf{D}$ and a matrix $\mathbf{E}$ which <i>diagonalizes</i> $\mathbf{A}$, such that $\mathbf{A} = \mathbf{EDE}^{\top}$. The converse statement holds as well. </ul> <li>[Thm. 
2:] <i>A symmetric matrix is diagonalized by a matrix of its orthonormal eigenvectors.</i> <ul> <li> Extending the result above, if a $q\times q$ matrix $\mathbf{A}$ is symmetric, the diagonalizing matrix $\mathbf{E} = [\mathbf{E}_{1}, \ldots, \mathbf{E}_{q}]$, the diagonal matrix $\mathbf{D} = \text{diag} [\lambda_{1}, \ldots, \lambda_{q}]$, and $\mathbf{E}_{i}$ and $\lambda_{i}$ are respectively the $i^{\text{th}}$ <i>eigenvector</i> and associated <i>eigenvalue</i> of $\mathbf{A}$. <li> Note that a set of vectors is <i>orthonormal</i> if each vector is of length unity and orthogonal to all other vectors in the set. Accordingly, if $\mathbf{V} = [\mathbf{V}_{1}, \ldots, \mathbf{V}_{q}]$ is orthonormal, then $\mathbf{V}_{j}^{\top}\mathbf{V}_{j} = 1$ and $\mathbf{V}_{j}^{\top}\mathbf{V}_{k} = 0$ for all $j \neq k$. Furthermore, $\mathbf{V}^{\top}\mathbf{V} = \mathbf{I}_{q}$ where $\mathbf{I}_{q}$ is the identity matrix of size $q$, and therefore, $\mathbf{V}^{\top} = \mathbf{V}^{-1}$. <li> Recall further that eigenvectors of a linear transformation are those vectors which only change magnitude but not direction when subject to said transformation. Since any matrix is effectively a linear transformation, if $\mathbf{v}$ is an eigenvector of some matrix $\mathbf{A}$, it satisfies the relationship $\mathbf{Av} = \lambda \mathbf{v}$. Here, associated with each eigenvector is the eigenvalue $\lambda$ quantifying the resulting change in magnitude. <li> Finally, observe that matrix rank determines the maximum number of eigenvectors (eigenvalues) one can extract for said matrix. In particular, if $\text{rk}(\mathbf{A}) = r \leq q$, there are in fact only $r$ orthonormal eigenvectors associated with $\mathbf{A}$. To see this, use a geometric interpretation to note that $q-$dimensional objects reside in spaces with $q$ orthogonal directions. 
Since any $n\times q$ matrix is effectively a $q-$dimensional object of vectors, the maximum number of orthogonal directions that characterize these vectors is $q$. Nevertheless, if the (column) rank of this matrix is in fact $r \leq q$, then $q - r$ of the $q$ orthogonal directions are never used. For instance, think of 2$d$ drawings in 3$d$ spaces. It makes no difference whether the drawing is characterized in the $xy$, the $xz$, or the $yz$ plane -- the drawing still has 2 dimensions and in any of those configurations, the dimension left out is a linear combination of the others. In particular, if the $xz$ plane is used, then the $z-$direction is a linear combination of the $y-$direction since the drawing can be equivalently characterized in the $xy$ plane, and so on. In other words, one of the three dimensions is never used, although it exists and can be characterized if necessary. Along the same lines, if $\mathbf{A}$ indeed has rank $r \leq q$, we can construct $q - r$ additional orthogonal eigenvectors to ensure dimensional equality in the diagonalization $\mathbf{A} = \mathbf{EDE}^{\top}$, although their associated eigenvalues will in fact be 0, essentially negating their presence. <li> By extension of the previous point, since $\mathbf{A}$ is a $q-$dimensional object of $q-$dimensional column vectors, it can afford at most $q$ orthogonal directions to characterize its space. Since all $q$ such vectors are collected in $\mathbf{E}$, we are guaranteed that $\mathbf{E}$ is a spanning set and therefore constitutes an <i>eigenbasis</i>. 
</ul> </ol> Since $\mathbf{\Sigma}_{X}$ is a symmetric matrix by construction, the $1^{\text{st}}$ result above affords a re-expression of equation (\ref{eq1}) as follows: \begin{align} \mathbf{\Sigma}_{Q} &= \mathbf{B}^{\top} \mathbf{\Sigma}_{X} \mathbf{B} \notag \\ &= \mathbf{B}^{\top}\mathbf{E}_{X}\mathbf{D}_{X}\mathbf{E}_{X}^{\top} \mathbf{B} \label{eq2} \end{align} where $\mathbf{E}_{X} = [\mathbf{E}_{1}, \ldots, \mathbf{E}_{m}]$ is the orthonormal matrix of eigenvectors of $\mathbf{\Sigma}_{X}$ and $\mathbf{D}_{X} = \text{diag} [\lambda_{1}, \ldots, \lambda_{m}]$ is the diagonal matrix of associated eigenvalues.</br></br> Now, since we require $\mathbf{\Sigma}_{Q}$ to be diagonal, we can set $\mathbf{B}^{\top} = \mathbf{E}_{X}^{-1}$ in order to reduce $\mathbf{\Sigma}_{Q}$ to the diagonal matrix $\mathbf{D}_{X}$. Since the $2^{\text{nd}}$ linear algebra result above guarantees that $\mathbf{E}_{X}$ is orthonormal, we know that $\mathbf{E}_{X}^{-1} = \mathbf{E}_{X}^{\top}$. Accordingly, \begin{align} \mathbf{\Sigma}_{Q} = \mathbf{D}_{X} \quad \text{if and only if} \quad \mathbf{B} = \mathbf{E}_{X} \label{eq3} \end{align} The entire idea is visualized below in Figures 1 and 2. In particular, Figure 1 demonstrates the "data perspective" view of the system in relation to an alternate basis. That is, two alternate basis axes, labeled "Principal Direction 1" and "Principal Direction 2", are superimposed on the familiar $x$ and $y$ axes. Since the vectors of a basis are mutually orthogonal, the principal direction axes are naturally drawn at $90$&deg; angles. Alternatively, Figure 2 demonstrates the view of the system when the perspective uses the principal directions as the reference axes. 
<table> <tbody> <tr> <td><!-- :::::::::: FIGURE 1 :::::::::: --><center><a href="https://4.bp.blogspot.com/-W4UMHPURwds/W8TwEvK80eI/AAAAAAAAAlM/9TDa99SG5a8vOPshslHobrzsPfAgb3yWQCLcBGAs/s1600/pcadta.jpg"><img src="https://4.bp.blogspot.com/-W4UMHPURwds/W8TwEvK80eI/AAAAAAAAAlM/9TDa99SG5a8vOPshslHobrzsPfAgb3yWQCLcBGAs/s1600/pcadta.jpg" title="" width="320" height="auto" /></a><br /><br /></center><!-- :::::::::: FIGURE 1 :::::::::: --> </td> <td><!-- :::::::::: FIGURE 2 :::::::::: --><center><a href="https://2.bp.blogspot.com/-JTwqThzlduY/W8TwElonl8I/AAAAAAAAAlI/mBD1-k36W0kayxm-egMxS1Ew3gQHYR9vQCLcBGAs/s1600/pcaeig.jpg"><img src="https://2.bp.blogspot.com/-JTwqThzlduY/W8TwElonl8I/AAAAAAAAAlI/mBD1-k36W0kayxm-egMxS1Ew3gQHYR9vQCLcBGAs/s1600/pcaeig.jpg" title="" width="320" height="auto" /></a><br /><br /></center><!-- :::::::::: FIGURE 2 :::::::::: --> </td> </tr> </tbody></table> <h4>Consistency</h4> In practice, $\mathbf{\Sigma}_{X}$, and by extension $\mathbf{\Sigma}_{Q}, \mathbf{E}_{X},$ and $\mathbf{D}_{X}$, are typically not observed. Nevertheless, we can apply the analysis above using sample covariance matrices $$\mathbf{S}_{Q} = \frac{1}{n}\mathbf{Q}^{\top}\mathbf{Q} \xrightarrow[n \to \infty]{p} \mathbf{\Sigma}_{Q} \quad \text{and} \quad \mathbf{S}_{X} = \frac{1}{n}\mathbf{X}^{\top}\mathbf{X} \xrightarrow[n \to \infty]{p} \mathbf{\Sigma}_{X}$$ where $\xrightarrow[\color{white}{n \to \infty}]{p}$ indicates convergence in probability to the corresponding population quantities. 
In this regard, the result analogous to equation (\ref{eq2}) for estimated $2^{\text{nd}}$ moment matrices states that \begin{align} \mathbf{S}_{Q} = \widehat{\mathbf{E}}_{X}^{\top} \mathbf{S}_{X} \widehat{\mathbf{E}}_{X} = \widehat{\mathbf{E}}_{X}^{\top} \left( \widehat{\mathbf{E}}_{X}\widehat{\mathbf{D}}_{X}\widehat{\mathbf{E}}_{X}^{\top} \right) \widehat{\mathbf{E}}_{X} = \widehat{\mathbf{D}}_{X} \label{eq4} \end{align} where $\widehat{\mathbf{E}}_{X}$ and $\widehat{\mathbf{D}}_{X}$ now represent the eigenbasis and respective eigenvalues associated with the square symmetric matrix $\mathbf{S}_{X}$. It is important to understand here that while $\widehat{\mathbf{E}}_{X} \neq \mathbf{E}_{X}$ and $\widehat{\mathbf{D}}_{X} \neq \mathbf{D}_{X}$, there is a long-standing literature far beyond the scope of this entry which guarantees that $\widehat{\mathbf{E}}_{X}$ and $\widehat{\mathbf{D}}_{X}$ are both consistent estimators of $\mathbf{E}_{X}$ and $\mathbf{D}_{X}$, provided $m/n \to 0$ as $n \to \infty$. In other words, as in classical regression paradigms, consistency of PCA holds only under the usual "large $n$ and small $m$" framework. There are modern results which address cases for $m/n \to c > 0$; however, they too are beyond the scope of this text. Proceeding, in order to contain notational complexity, unless otherwise stated, we will maintain that $\mathbf{E}_{X}$ and $\mathbf{D}_{X}$ now represent the eigenbasis and respective eigenvalues associated with the square symmetric matrix $\mathbf{S}_{X}$.</br></br> <h4>Preservation of Information</h4> In addition to diagonalizing $\mathbf{S}_{Q}$, we also require preservation of information. For this we need to guarantee that $\mathbf{B}$ is a basis. Here, we recall the final remark under the $2^{\text{nd}}$ linear algebra result above, which argues that $\mathbf{S}_{X}$ affords at most $m$ orthonormal eigenvectors and associated eigenvalues, with the former also forming an eigenbasis.
Since all $m$ eigenvectors are collected in $\mathbf{E}_{X} = \mathbf{B}$, we are guaranteed that $\mathbf{B}$ is indeed a basis. In this regard, we transform $\mathbf{X}$ into $m$ statistically uncorrelated, but exhaustive <i>directions</i>. We are careful not to use the word <i>variables</i> (although technically they are), since the transformation $\mathbf{Q} = \mathbf{XE}_{X}$ does not preserve variable interpretation. That is, the $j^{\text{th}}$ column of $\mathbf{Q}$ no longer retains the interpretation of the $j^{\text{th}}$ variable (column) in $\mathbf{X}$. In fact, the $j^{\text{th}}$ column of $\mathbf{Q}$ is a projection (linear combination) of <i>all</i> $m$ variables in $\mathbf{X}$, in the direction of the $j^{\text{th}}$ eigenvector $\mathbf{E}_{j}$. Accordingly, we can interpret $\mathbf{XE}_{X}$ as $m$ orthogonal weighted averages of the $m$ variables in $\mathbf{X}$. Furthermore, since $\mathbf{E}_{X}$ is an eigenbasis, the total variation (information) of the original system $\mathbf{X}$, namely $\mathbf{S}_{X}$, is preserved in the transformation to $\mathbf{Q}$. Unlike $\mathbf{S}_{X}$ however, $\mathbf{S}_{Q} = \mathbf{D}_{X}$ is diagonal, and total variation in $\mathbf{X}$ is now distributed across $\mathbf{Q}$ without redundancy.</br></br> <h4>Principal Directions</h4> Since preservation of information is guaranteed under the transformation $\mathbf{Q} = \mathbf{XE}_{X}$, the proportion of information in $\mathbf{S}_{X}$ associated with the $j^{\text{th}}$ column of $\mathbf{S}_{Q}$ is in fact $\lambda_{j}$. By extension, each column in $\mathbf{Q}$ has standard deviation $\sqrt{\lambda_{j}}$ or variance $\lambda_{j}$. Moreover, since $\mathbf{S}_{Q}$ is diagonal and information redundancy is not an issue, it stands to reason that the total amount of system variation is the sum of variations due to each column in $\mathbf{Q}$. 
In other words, total system variation is $\text{tr}\left( \mathbf{S}_{Q} \right) = \lambda_{1} + \ldots + \lambda_{m}$, where $\text{tr}(\cdot)$ denotes the matrix trace operator, and the $j^{\text{th}}$ orthogonalized direction contributes $$\frac{\lambda_{j}}{\lambda_{1} + \ldots + \lambda_{m}} \times 100 \%$$ of total system variation (information). If we now arrange the columns of $\mathbf{Q}$, or equivalently those of $\mathbf{E}_{X}$, according to the order $\lambda_{(1)} \geq \lambda_{(2)} \geq \ldots \geq \lambda_{(m)}$, where $\lambda_{(j)}$ are ordered versions of their counterparts $\lambda_{j}$, we are guaranteed to have the directions arranged from most principal to least, measured as the proportion of total system variation contributed by that direction.</br></br> Another useful feature of the vectors in $\mathbf{E}_{X}$ is that they quantify the proportion of directionality each original variable contributes toward the overall direction of that vector. In particular, let $e_{i,j}$ denote the $i^{\text{th}}$ element in $\mathbf{E}_{j} = [e_{1,j}, \ldots, e_{m,j} ]$, where $i \in \{1, \ldots, m\}$, and observe that since $\mathbf{E}_{j}$ are the eigenvectors of $\mathbf{S}_{X}$, each element $e_{i,j}$ is in fact associated with the $i^{\text{th}}$ variable (column) of $\mathbf{X}$. Furthermore, since the vectors $\mathbf{E}_{j}$ each have unit length due to (ortho)normality, we know that they must lie on the unit sphere and that $e_{i,j}^{2} \times 100 \%$ of the direction $\mathbf{E}_{j}$ is due to variable $i$. In other words, we can quantify how principal each variable is in each direction.</br></br> <h4>Principal Components</h4> Principal directions, the eigenvectors in $\mathbf{E}_{X}$, are often mistakenly called principal components. Nevertheless, correct literature reserves the term <i>principal components</i> for the projections of the original system variables <i>onto</i> the principal directions.
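Before moving on, the accounting described under Principal Directions can be confirmed numerically: total variation carries over to the transformed system, each direction's eigenvalue gives its share of that variation, and the squared eigenvector elements give each variable's share of a direction. A sketch with simulated data (all sizes and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 300, 3
X = rng.standard_normal((n, m)) @ rng.standard_normal((m, m))
X = X - X.mean(axis=0)

lam, E_X = np.linalg.eigh((X.T @ X) / n)
order = np.argsort(lam)[::-1]                 # most principal direction first
lam, E_X = lam[order], E_X[:, order]

# Total variation is preserved: tr(S_Q) equals the sum of eigenvalues.
Q = X @ E_X
print(np.isclose(np.trace((Q.T @ Q) / n), lam.sum()))   # True

# Share of total system variation per direction, most principal first:
share = lam / lam.sum() * 100
print(share)

# Unit-length eigenvectors: the squared elements e_{i,j}^2 sum to one per
# direction, so e_{i,j}^2 * 100% is the part of direction j due to variable i.
print((E_X ** 2).sum(axis=0))                 # ~ [1. 1. 1.]
```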
That is, principal components refer to the column vectors in $\mathbf{Q} = [\mathbf{Q}_{1}, \ldots, \mathbf{Q}_{m}] = \mathbf{XE}_{X}$, and are sometimes also referred to as <i>scores</i>. Like their principal direction counterparts, principal components contain several important properties worth observing.</br></br> As a direct consequence of the diagonalization properties discussed earlier, the variance of each principal component is in fact the eigenvalue associated with the underlying principal direction, and principal components are mutually uncorrelated. To see this formally, let $\mathbf{C}_{j} = [0, \ldots, 0, \underbrace{1}_j, 0, \ldots, 0 ]^{\top}$ denote the canonical basis vector in the $j^{\text{th}}$ dimension. Then, using the result in equation (\ref{eq4}), the correlation between the $j^{\text{th}}$ and $k^{\text{th}}$ principal components $\mathbf{Q}_{j} = \mathbf{QC}_{j}$ and $\mathbf{Q}_{k} = \mathbf{QC}_{k}$, respectively, is obviously: \begin{align*} s_{Q_{j}, Q_{k}} &= \frac{1}{n}\mathbf{Q}_{j}^{\top}\mathbf{Q}_{k} \\ &= \mathbf{C}_{j}^{\top} \left( \frac{1}{n} \mathbf{Q}^{\top}\mathbf{Q} \right) \mathbf{C}_{k} \\ &= \mathbf{C}_{j}^{\top} \mathbf{S}_{Q} \mathbf{C}_{k} \\ &= \mathbf{C}_{j}^{\top} \mathbf{D}_{X} \mathbf{C}_{k} \\ \end{align*} which equals $\lambda_{j}$ when $j = k$ and $0$ otherwise.</br></br> Moreover, we can quantify how (co)related the original variables are with the principal directions. 
In particular, consider the covariance between the $i^{\text{th}}$ variable $\mathbf{X}_{i}=\mathbf{XC}_{i}$ and the $j^{\text{th}}$ principal component $\mathbf{Q}_{j}$, formalized as: \begin{align} \mathbf{S}_{X_{i}Q_{j}} & = \frac{1}{n} \mathbf{X}_{i}^{\top}\mathbf{Q}_{j} \notag\\ &= \mathbf{C}_{i}^{\top} \left( \frac{1}{n}\mathbf{X}^{\top}\mathbf{Q} \right) \mathbf{C}_{j}\notag\\ &= \mathbf{C}_{i}^{\top} \left( \frac{1}{n}\mathbf{X}^{\top}\mathbf{X}\mathbf{E}_{X} \right) \mathbf{C}_{j}\notag\\ &= \mathbf{C}_{i}^{\top} \mathbf{S}_{X} \mathbf{E}_{X} \mathbf{C}_{j}\notag\\ &= \mathbf{C}_{i}^{\top} \mathbf{E}_{X}\mathbf{D}_{X} \mathbf{E}_{X}^{\top} \mathbf{E}_{X} \mathbf{C}_{j}\notag\\ &= \mathbf{C}_{i}^{\top} \mathbf{E}_{X}\mathbf{D}_{X} \mathbf{C}_{j}\notag\\ &= e_{i,j} \lambda_{j} \label{eq5} \end{align} where the antepenultimate line applies Theorem 1 to $\mathbf{S}_{X}$, the cancellation to identity in the penultimate line follows from Theorem 2 and orthonormality of $\mathbf{E}_{X}$, and the ultimate line is the product of the $i^{\text{th}}$ element of the principal direction $\mathbf{E}_{j}$ and the $j^{\text{th}}$ principal eigenvalue.</br></br> <h4>Dimension Reduction</h4> At last, we arrive at the issue of dimensionality reduction. Assuming that the columns of $\mathbf{Q}$ are arranged in decreasing order of importance (more principal columns come first), we can discard the $g < m$ least principal columns of $\mathbf{Q}$ until sufficient dimension reduction is achieved, and rest assured that the remaining (first) $m - g$ columns are in fact most principal. In other words, the $m - g$ directions which are retained contribute $$\frac{ \sum \limits_{j=1}^{m-g}\lambda_{(j)}}{\lambda_{1} + \ldots + \lambda_{m}} \times 100 \%$$ of the original variation in $\mathbf{X}$.
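Both the covariance identity in equation (\ref{eq5}) and the retained-variation calculation are easy to verify in code. A sketch with simulated data (the sizes, seed, and the choice $g = 1$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 500, 3
X = rng.standard_normal((n, m)) @ rng.standard_normal((m, m))
X = X - X.mean(axis=0)

lam, E_X = np.linalg.eigh((X.T @ X) / n)
lam, E_X = lam[::-1], E_X[:, ::-1]           # most principal first
Q = X @ E_X

# Equation (5): Cov(X_i, Q_j) = e_{i,j} * lambda_j for every pair (i, j).
S_XQ = (X.T @ Q) / n
print(np.allclose(S_XQ, E_X * lam))          # True (lam scales column j)

# Discarding the g least principal directions retains this share of variation:
g = 1
retained = lam[: m - g].sum() / lam.sum() * 100
print(retained)                              # percent of original variation
```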
Since directions are ordered in decreasing order of importance, the first few directions will capture the majority of variation, leaving the less principal directions to contribute information only marginally. Accordingly, one can significantly reduce dimensionality whilst retaining the majority of information. This is particularly important when we want to measure the complexity of our data set. In particular, if the $r$ most principal directions account for the majority of variance, it stands to reason that our underlying data set is in fact only $r-$dimensional, with the remaining $m-r$ dimensions being noise. In other words, dimensionality reduction naturally leads to data <i>denoising</i>.</br></br> So how does one select how many principal directions to retain? There are several approaches, but we list a few below: <ol> <li> A very popular approach is to use a <i>scree plot</i> -- a plot of the ordered eigenvalues from most to least principal. The idea here is to look for a sharp drop in the function, and select the <i>bend</i> or <i>elbow</i> as the cutoff value, retaining all eigenvalues (and by extension principal directions) to the left of this value. <li> Another popular alternative is to use the cumulative proportion of variation explained by the first $r$ principal directions. In other words, select the first $r$ principal directions such that $\frac{ \sum \limits_{j=1}^{r}\lambda_{(j)}}{\lambda_{1} + \ldots + \lambda_{m}} \geq 1 - \alpha$, where $\alpha \in [0,1]$. A typical choice is $\alpha = 0.1$, which retains the $r$ most principal directions capturing at least 90% of the system variation. <li> A more data-driven approach is the Guttman-Kaiser (Guttman (1954), Kaiser (1960), Kaiser (1961)) criterion. This criterion advocates the retention of all eigenvalues, and by extension, the associated principal directions, that exceed the average of all eigenvalues.
In other words, select the first $r$ principal directions such that $\lambda_{(1)} + \ldots + \lambda_{(r)} \geq r\bar{\lambda}$, where $\bar{\lambda} = \frac{1}{m} \sum\limits_{j = 1}^{m}\lambda_{j}$. <li> An entirely data-driven approach akin to classical information criteria selection methods borrows from the Bai and Ng (2002) work on factor models. In this regard, consider $$\mathbf{X}_{j} = \beta_{1}\mathbf{Q}_{1} + \ldots + \beta_{r}\mathbf{Q}_{r} + \mathbf{U}(j,r)$$ as the regression of the $j^{\text{th}}$ variable in $\mathbf{X}$ on the first $r$ principal components of $\mathbf{S}_{X}$, and let $\widehat{\mathbf{U}}(j,r)$ denote the corresponding residual vector. Furthermore, define $SSR(j,r) = \frac{1}{n} \widehat{\mathbf{U}}(j,r)^{\top} \widehat{\mathbf{U}}(j,r)$ as the sum of squared residuals from said regression, and define $SSR(r) = \frac{1}{m}\sum \limits_{j=1}^{m}SSR(j,r)$ as the average of all $SSR(j,r)$ across all variables $j$ for a given $r$. We can then select $r$ as the one that minimizes a particular penalty function. In other words, the problem reduces to: $$\min\limits_{r} \left\{ \ln\left( SSR(r) \right) + rg(n,m) \right\}$$ where $g(n,m)$ is a penalty term which leads to one of several criteria proposed in Bai and Ng (2002). For instance, when $n > m$, one such option is the $IC_{p2}(r)$ criterion, and the problem above formalizes as: $$\min\limits_{r} \left\{ \ln\left( SSR(r) \right) + r\left( \frac{n + m}{nm} \right) \ln(m) \right\}$$ </ol> Of course, it goes without saying that discarding information comes at a cost, although, if dimensionality reduction is desired, it may well be a price worth paying.</br></br> <h3>Inference</h3> Although PCA is deeply rooted in linear algebra, it is also a very visual experience. In this regard, a particularly convenient feature is the ability to visualize multidimensional structures across two-dimensional summaries.
In particular, comparing two principal directions provides a wealth of information that is typically inaccessible in traditional multidimensional contexts.</br></br> <h4>Loading Plots</h4> A powerful inferential tool unique to PCA is element-wise comparison of two principal directions. In particular, consider two principal directions $\mathbf{E}_{j} = [e_{1,j}, \ldots, e_{m,j}]$ and $\mathbf{E}_{k} = [e_{1,k}, \ldots, e_{m,k}]$, and let $\left\{ \mathbf{V}_{1,j,k}, \ldots, \mathbf{V}_{m,j,k} \right\}$ denote the set of vectors from the origin $(0,0)$ to $\left( e_{i,j}, e_{i,k} \right)$ for $i \in \{1, \ldots, m\}$. In other words, $\mathbf{V}_{i,j,k} = \left( e_{i,j}, e_{i,k} \right)^{\top}$. Then, for any $(j,k)$ principal direction pairs, a plot of all $m$ vectors $\mathbf{V}_{i,j,k}$, for $i \in \{1, \ldots, m\}$, on a single plot, is called a <i>loading plot</i>.</br></br> There is an important connection between the vectors $\mathbf{V}_{i,j,k}$ and original variable covariances. In particular, consider $\mathbf{S}_{X_{i},X_{s}}$ -- the finite sample covariance between $\mathbf{X}_{i}$ and $\mathbf{X}_{s}$ -- and, assuming we have ordered eigenvalues from most principal to least, note that: \begin{align*} \mathbf{S}_{X_{i},X_{s}} &= \mathbf{C}_{i}^{\top} \mathbf{S}_{X} \mathbf{C}_{s}\\ &= \mathbf{C}_{i}^{\top} \mathbf{E}_{X} \mathbf{D}_{X} \mathbf{E}_{X}^{\top} \mathbf{C}_{s}\\ &= \lambda_{(1)}e_{i,1}e_{s,1} + \lambda_{(2)}e_{i,2}e_{s,2} + \ldots + \lambda_{(m)}e_{i,m}e_{s,m}\\ &= \mathbf{V}_{i,1,2}^{\top}\mathbf{L}_{1,2}\mathbf{V}_{s,1,2} + \ldots + \mathbf{V}_{i,m-1,m}^{\top}\mathbf{L}_{m-1,m}\mathbf{V}_{s,m-1,m} \end{align*} where $\mathbf{L}_{j,k} = \text{diag} \left[\lambda_{(j)}, \lambda_{(k)} \right]$ denotes the appropriate scaling matrix. In other words, for any $(j,k)$ principal direction pairs, $\mathbf{V}_{i,j,k}^{\top} \mathbf{L}_{j,k} \mathbf{V}_{s,j,k}$ explains a proportion of the covariance $\mathbf{S}_{X_{i},X_{s}}$.
Accordingly, when $\mathbf{X}_{i}$ and $\mathbf{X}_{s}$ are highly correlated, we can expect $\mathbf{V}_{i,j,k}^{\top} \mathbf{L}_{j,k} \mathbf{V}_{s,j,k}$ to be large. In this regard, let $\theta_{i,s,j,k}$ denote the angle between any two vectors $\mathbf{V}_{i,j,k}$ and $\mathbf{V}_{s,j,k}$, and recall that \begin{align*} \cos \theta_{i,s,j,k} &= \frac{\mathbf{V}_{i,j,k}^{\top}\mathbf{V}_{s,j,k}}{\norm{\mathbf{V}_{i,j,k}} \norm{\mathbf{V}_{s,j,k}}} \end{align*} To accommodate the use of the scaling matrices $\mathbf{L}_{j,k}$, observe that we can modify this result as follows: \begin{align} \mathbf{V}_{i,j,k}^{\top} \mathbf{L}_{j,k} \mathbf{V}_{s,j,k} = \mathbf{V}_{i,j,k}^{\top} \mathbf{L}_{j,k} \left(\mathbf{V}_{i,j,k}\mathbf{V}_{i,j,k}^{\top} \right)^{-1} \mathbf{V}_{i,j,k} \norm{\mathbf{V}_{i,j,k}} \norm{\mathbf{V}_{s,j,k}} \cos \theta_{i,s,j,k} \label{eq6} \end{align} Now, when $\theta_{i,s,j,k}$ is small, say between $0$ and $\pi/2$, we can expect $\mathbf{V}_{i,j,k}^{\top} \mathbf{L}_{j,k} \mathbf{V}_{s,j,k}$ to be large, and by extension, $\mathbf{X}_{i}$ and $\mathbf{X}_{s}$ to be more correlated. In other words, vectors that are close to one another in a loading plot indicate stronger correlations of their underlying variables. Figure 3 below gives a visual representation.
<!-- :::::::::: FIGURE 3 :::::::::: --><center><a href="https://4.bp.blogspot.com/-UoZIl4Zcy-M/W8TwEqX3v6I/AAAAAAAAAlQ/JOSR21yYSJos50X0obVJfctkd5fPwnhngCLcBGAs/s1600/pcacorr.jpg"><img src="https://4.bp.blogspot.com/-UoZIl4Zcy-M/W8TwEqX3v6I/AAAAAAAAAlQ/JOSR21yYSJos50X0obVJfctkd5fPwnhngCLcBGAs/s1600/pcacorr.jpg" title="" width="640" height="auto" /></a><br /><br /></center><!-- :::::::::: FIGURE 3 :::::::::: --> It is important to realize here that since $\theta_{i,s,j,k}$ is in fact the angle between $\mathbf{V}_{i,j,k}$ and $\mathbf{V}_{s,j,k}$, the interpretation of how exhibitive $\theta_{i,s,j,k}$ is of the underlying correlation $\mathbf{S}_{X_{i}, X_{s}}$ is made more complicated by the presence of $\mathbf{L}_{j,k}$ in equation (\ref{eq6}). Accordingly, to ease interpretation, the vectors $\mathbf{V}_{i,j,k}$ are sometimes scaled appropriately, or <i>loaded</i> with scaling information, leading to the term <i>loadings</i>. In this regard, consider the vectors $\widetilde{\mathbf{V}}_{i,j,k} = \mathbf{V}_{i,j,k} \mathbf{L}_{j,k}^{1/2}$. Here, loading is done via $\mathbf{L}_{j,k}^{1/2}$, and we have: $$\mathbf{S}_{X_{i}, X_{s}} = \widetilde{\mathbf{V}}_{i,1,2}^{\top}\widetilde{\mathbf{V}}_{s,1,2} + \ldots + \widetilde{\mathbf{V}}_{i,m-1,m}^{\top}\widetilde{\mathbf{V}}_{s,m-1,m}$$ and $$\widetilde{\mathbf{V}}_{i,j,k}^{\top}\widetilde{\mathbf{V}}_{s,j,k} = \norm{\widetilde{\mathbf{V}}_{i,j,k}} \norm{\widetilde{\mathbf{V}}_{s,j,k}} \cos \widetilde{\theta}_{i,s,j,k}$$ As such, $\widetilde{\theta}_{i,s,j,k}$ more closely exhibits the true angle between $\mathbf{X}_{i}$ and $\mathbf{X}_{s}$ than $\theta_{i,s,j,k}$, and loading plots using $\widetilde{\mathbf{V}}_{i,j,k}$ tend to be more exhibitive of the underlying correlations $\mathbf{S}_{X_{i}, X_{s}}$ than those based on $\mathbf{V}_{i,j,k}$. Of course, one does not have to resort to the use of $\mathbf{L}_{j,k}^{1/2}$ as the loading matrix. 
In principle, one can use $\mathbf{L}_{j,k}^{\alpha}$ for some $0 \leq \alpha \leq 1$, although the underlying interpretation of what such a loading means ought to be understood first.</br></br> Of course, it is not difficult to see that $\widetilde{\mathbf{V}}_{i,j,k} = \mathbf{V}_{i,j,k} \mathbf{L}_{j,k}^{\alpha}$ is in fact the $i^{\text{th}}$ "XY"-pair between $\mathbf{E}_{j}\lambda_j^{\alpha}$ and $\mathbf{E}_{k}\lambda_k^{\alpha}$. In other words, it is the $i^{\text{th}}$ "XY"-pair using the "loaded" $j^{\text{th}}$ and $k^{\text{th}}$ principal directions. Accordingly, the term <i>loading vector</i> is sometimes used to denote a loaded principal direction. In particular, the entire matrix of loading vectors $\widetilde{\mathbf{E}}_X$ can be obtained as follows: $$\widetilde{\mathbf{E}}_X = \mathbf{E}_X \mathbf{D}_X^{\alpha}$$ Figure 4 below demonstrates the impact of using a loading weight. In particular, the vectors in Figure 3 are superimposed on the set of loaded vectors where the loading factor is $\mathbf{D}_{X}^{1/2}$. Clearly, the loaded vectors are much more correlated with the general shape of the data as represented by the ellipse. <!-- :::::::::: FIGURE 4 :::::::::: --><center><a href="https://2.bp.blogspot.com/--jkDij4_4Ik/W8TwFPoDq_I/AAAAAAAAAlU/uBh58i1d_4cV_R20a6b-rzcZNkW7BI0EwCLcBGAs/s1600/pcaload.jpg"><img src="https://2.bp.blogspot.com/--jkDij4_4Ik/W8TwFPoDq_I/AAAAAAAAAlU/uBh58i1d_4cV_R20a6b-rzcZNkW7BI0EwCLcBGAs/s1600/pcaload.jpg" title="" width="640" height="auto" /></a><br /><br /></center><!-- :::::::::: FIGURE 4 :::::::::: --> <h4>Score Plots</h4> A <i>score plot</i> across principal direction pairs $(j,k)$ is essentially a scatter plot of the principal component vector $\mathbf{Q}_{j}$ vs. $\mathbf{Q}_{k}$. In fact, it is the analogous version of the loading plot, but for observations as opposed to variables.
In this regard, whereas the angle between two loading vectors is exhibitive of the underlying correlation between some variables, the distance between observations in a score plot exhibits homogeneity across observations. Accordingly, observations which cluster together tend to move together, and one typically looks to identify important clusters when conducting inference.</br></br> Recall also the expression derived in the last line of (\ref{eq5}), namely, $\mathbf{S}_{X_{i}Q_{j}} = e_{i,j} \lambda_{j} = \left(e_{i,j} \lambda_{j}^{1/2}\right)\lambda_{j}^{1/2}$. Notice that the latter expression states that the covariance between the $i^{\text{th}}$ variable and the $j^{\text{th}}$ score vector is in fact the product of the $i^{\text{th}}$ element of the loaded $j^{\text{th}}$ principal direction (the $j^{\text{th}}$ loading vector) and $\lambda_{j}^{1/2}$. Accordingly, in order to achieve a more natural interpretation, one can proceed in a manner analogous to the creation of loading vectors, and either scale or entirely remove the remaining scaling factor. This leads to the idea of <i>loaded score vectors</i>. In particular, using the context above, if one wishes to interpret the covariance between the $i^{\text{th}}$ variable and the $j^{\text{th}}$ score vector as just a loaded principal direction without the additional factor $\lambda_{j}^{1/2}$, then doing so is as simple as computing $$\mathbf{S}_{X_{i}Q_{j}\lambda_j^{-1/2}} = e_{i,j} \lambda_{j}^{1/2}$$ where we now interpret $Q_{j}\lambda_j^{-1/2}$ as a loaded score vector. Of course, an infinite array of such scaling options is achievable using $Q_{j}\lambda_j^{-\alpha}$, although, as before, their interpretation ought to be understood first.</br></br> <h4>Outlier Detection</h4> An important application of PCA is to <i>outlier detection</i>.
The general principle exploits the first few principal directions to explain the majority of variation in the original system, and uses <i>data reconstruction</i> to generate an approximation of the original system using the first few principal components.</br></br> Formally, if we start from the matrix of all principal components $\mathbf{Q}$, it is trivial to reconstruct the original system $\mathbf{X}$ using the inverse: $$\mathbf{Q}\mathbf{E}_{X}^{\top} = \mathbf{X}\mathbf{E}_{X}\mathbf{E}_{X}^{\top} = \mathbf{X}$$ On the other hand, if we restrict our principal components to the first $r \ll m$ most principal directions, then $\widetilde{\mathbf{Q}}\widetilde{\mathbf{E}}_{X}^{\top} = \widetilde{\mathbf{X}} \approx \mathbf{X}$, where $\widetilde{\mathbf{Q}}$ and $\widetilde{\mathbf{E}}_{X}$ are respectively the matrix $\mathbf{Q}$ and $\mathbf{E}_{X}$ with the last $m - r$ columns removed, and $\approx$ denotes an approximation. Then, the difference $$\mathbf{\xi} = \widetilde{\mathbf{X}} - \mathbf{X}$$ is known as the <i>reconstruction error</i>, and if the first $r$ principal directions explain the original variation well, we can expect $\norm{\mathbf{\xi}}_{D}^{2} \approx \mathbf{0}$ where $\norm{\cdot}_{D}$ denotes some measure of distance.</br></br> We would now like to define a statistic associated with outlier identification, and as in usual regression analysis, the reconstruction error (residuals) plays a key role. In particular, we follow the contributions of Jackson and Mudholkar (1979) and define $$\mathbf{SPE} = \mathbf{\xi} \mathbf{\xi}^{\top}$$ as the <i>squared prediction error</i> most resembling the usual sum of squared residuals. 
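The reconstruction logic and the $\mathbf{SPE}$ diagonal can be sketched in a few lines. The data below are simulated to be essentially $2$-dimensional plus a little noise, so that two principal directions reconstruct the system almost exactly; the dimensions, noise scale, and seed are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, r = 200, 5, 2

# Nearly rank-2 data: two latent factors plus small noise (illustrative).
F = rng.standard_normal((n, r))
X = F @ rng.standard_normal((r, m)) + 0.01 * rng.standard_normal((n, m))
X = X - X.mean(axis=0)

lam, E_X = np.linalg.eigh((X.T @ X) / n)
lam, E_X = lam[::-1], E_X[:, ::-1]           # most principal first

# Full reconstruction Q E' = X is exact; truncation makes it approximate.
E_r = E_X[:, :r]
X_hat = (X @ E_r) @ E_r.T                    # reconstruction from r components

xi = X_hat - X                               # reconstruction error
spe = (xi ** 2).sum(axis=1)                  # SPE_ii: one value per observation
print(spe.max())                             # tiny: two directions suffice here
```

Observations with unusually large `spe` values would be flagged as outliers under the rule that follows.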
Moreover, Jackson and Mudholkar (1979) show that if the observations (row vectors) in $\mathbf{X}$ are independent and identically distributed Gaussian random vectors, $\mathbf{SPE}$ has the following distribution $$\mathbf{SPE} \sim \sum\limits_{j=r+1}^{m}\lambda_{(j)}Z_{j}^{2} \equiv \Psi(r)$$ where $Z_{j}$ are independent standard normal variables, so that each $Z_{j}^{2}$ follows the $\chi^{2}_{1}$ distribution, with $\chi^{2}_{p}$ denoting the $\chi^{2}-$distribution with $p$ degrees of freedom. Noting that the $i^{\text{th}}$ diagonal element of $\mathbf{SPE}$, namely $\mathbf{SPE}_{ii} = \mathbf{C}_{i}^{\top} (\mathbf{SPE}) \mathbf{C}_{i}$, is associated with the $i^{\text{th}}$ observation, we can now derive a rule for outlier detection. In particular, should $\mathbf{SPE}_{ii}$, for any $i$, fall into the critical region defined by the upper $(1 - \alpha)$ percentile of $\Psi(r)$, that observation would be considered an outlier.</br></br> <h3>Closing Remarks</h3> Principal component analysis is an extremely important multivariate statistical technique that is often misunderstood and abused. The hope is that in reading this entry you will have found the intuition one often seeks in complicated subject matters, with just enough mathematical rigour to ease any serious future undertakings. In Part II of this series, we will use EViews to exhibit a PCA case study and demonstrate just how easy this is with a few clicks.</br></br> <!-- If you would like a PDF copy of this post, please download <a href="http://www.eviews.com/blog/Images/dhpanel/dhmcstudy.prg">here</a>.</br></br>--> <hr><h3>References</h3> <table> <tr valign="top"><td align="right" class="bibtexnumber">[<a name="bai-2002">1</a>] </td><td class="bibtexitem">Jushan Bai and Serena Ng. Determining the number of factors in approximate factor models. <em>Econometrica</em>, 70(1):191--221, 2002. </td></tr> <tr valign="top"><td align="right" class="bibtexnumber">[<a name="guttman-1954">2</a>] </td><td class="bibtexitem">Louis Guttman.
Some necessary conditions for common-factor analysis. <em>Psychometrika</em>, 19(2):149--161, 1954. </td></tr> <tr valign="top"><td align="right" class="bibtexnumber">[<a name="jackson-1979">3</a>] </td><td class="bibtexitem">J&nbsp;Edward Jackson and Govind&nbsp;S Mudholkar. Control procedures for residuals associated with principal component analysis. <em>Technometrics</em>, 21(3):341--349, 1979. </td></tr> <tr valign="top"><td align="right" class="bibtexnumber">[<a name="kaiser-1960">4</a>] </td><td class="bibtexitem">Henry&nbsp;F Kaiser. The application of electronic computers to factor analysis. <em>Educational and Psychological Measurement</em>, 20(1):141--151, 1960. </td></tr> <tr valign="top"><td align="right" class="bibtexnumber">[<a name="kaiser-1961">5</a>] </td><td class="bibtexitem">Henry&nbsp;F Kaiser. A note on Guttman's lower bound for the number of common factors. <em>British Journal of Statistical Psychology</em>, 14(1):1--2, 1961. </td></tr></table> </span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com0tag:blogger.com,1999:blog-6883247404678549489.post-76068721939620945812018-09-19T15:30:00.002-07:002018-09-19T15:30:41.836-07:00Dissecting the business cycle and the BBQ add-in<i>Authors and guest blog by Davaajargal Luvsannyam and Khuslen Batmunkh</i><br /><i><br /></i>Dating of the business cycle is crucial for policy makers and businesses. The business cycle is the upward and downward movement of production or business activity. The macroeconomic business cycle in particular, which represents general economic prospects, plays an important role in policy and management decisions. For instance, when the economy is in a downtrend, companies tend to act more conservatively. In contrast, when the economy is in an uptrend, companies tend to act more aggressively in order to enhance their market share.
Keynesian business cycle theory suggests that the business cycle is an important indicator for monetary policy, which is able to stabilize the fluctuations of the economy. Therefore, accurate dating of the business cycle can be fundamental to efficient and practical policy decisions.<br /><br /><a name='more'></a>In the academic literature, the dating process of the business cycle has shifted from a graphical orientation towards quantitative measures extracted from parametric models. For instance, Burns and Mitchell (1946) explained the main concepts of the business cycle and introduced a graphical (classical) model that aims to calculate the peaks and troughs of the cycle, while Cooley and Prescott (1995) calculated the cycle using moments of variables from parametric (detrended) models.<br /><br />Burns and Mitchell define the business cycle as a pattern seen in any series, <i>Y<span style="font-size: xx-small;">t </span></i>, taken&nbsp;to represent aggregate economic activity. In the process of defining a cycle, we usually use the logarithm of the series&nbsp;<i>Y<span style="font-size: xx-small;">t .&nbsp;</span></i>Business cycles are identified as having four distinct phases: trough, expansion, peak, and contraction (Figure 1).<br /><br /><b>Figure 1. Business Cycle</b><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-94DmCoHIAyk/W6JiBuVIB6I/AAAAAAAAAik/MbSe4UfmCR0L9B-5y3TK0HVGFv3BU3YGwCLcBGAs/s1600/Figure%2B1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="281" data-original-width="397" src="https://1.bp.blogspot.com/-94DmCoHIAyk/W6JiBuVIB6I/AAAAAAAAAik/MbSe4UfmCR0L9B-5y3TK0HVGFv3BU3YGwCLcBGAs/s1600/Figure%2B1.png" /></a></div><div class="separator" style="clear: both; text-align: center;"></div><br />These are the characteristics of a cycle. Peak (A) is the turning point when the expansion transitions into the contraction phase.
Trough (C) is the turning point when the contraction transitions into the expansion phase. Duration (AB length) is the number of quarters between peak and trough. Amplitude (BC length) is the height of the difference between peak and trough.<br /><br /><b>Figure 2. Illustration of the Contraction Phase</b><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-5Pr2gVYOMcE/W6JiOaDtGBI/AAAAAAAAAio/MpR2rWcN7fAqwIELDtqn-XKbrH0FmZlagCLcBGAs/s1600/Figure%2B2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="324" data-original-width="462" src="https://4.bp.blogspot.com/-5Pr2gVYOMcE/W6JiOaDtGBI/AAAAAAAAAio/MpR2rWcN7fAqwIELDtqn-XKbrH0FmZlagCLcBGAs/s1600/Figure%2B2.png" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: left;">The EViews add-in “BBQ” implements the methodology outlined in Harding and Pagan (2002). Harding and Pagan (2002) chose three countries, the US, the UK and Australia, and established turning points for each country using the Bry-Boschan algorithm. This algorithm performs the following three steps.</div><div class="separator" style="clear: both; text-align: left;"></div><ol><li>Estimation of the possible turning points, i.e.
the troughs and peaks in a series.</li><li>A procedure for ensuring that the troughs and the peaks alternate.</li><li>A set of rules ensuring that phases and complete cycles meet pre-determined criteria on duration and amplitude after steps 1 and 2.</li></ol><br />We will replicate the results in Table 1 of Harding and Pagan (2002). The example program file (bbq_ex1.prg) generates the results. First we need to open the data file named hpagan.wf1.<br /><br /><span style="font-family: Courier New, Courier, monospace;">wfopen hpagan.wf1</span><br /><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-yzGqyFRT-E4/W6LM2f-sVqI/AAAAAAAAAjE/I45lORKfAS8w-INm8YvnMFQvwE1gfws8QCLcBGAs/s1600/wf1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="459" data-original-width="508" height="360" src="https://4.bp.blogspot.com/-yzGqyFRT-E4/W6LM2f-sVqI/AAAAAAAAAjE/I45lORKfAS8w-INm8YvnMFQvwE1gfws8QCLcBGAs/s400/wf1.png" width="400" /></a></div><div><br /></div><div><br /></div><div><div>The data in hpagan.wf1 are quarterly real GDP for the three countries. The sample is 1947q1 to 1997q1 for the US, 1955q1 to 1997q1 for the UK and 1959q1 to 1997q1 for Australia.</div><div><br /></div><div>Next we take the logarithm of the series us, uk and aust.</div></div><div><br /></div><div><div><span style="font-family: Courier New, Courier, monospace;">series lus=log(us)</span></div><div><span style="font-family: Courier New, Courier, monospace;">series luk=log(uk)</span></div><div><span style="font-family: Courier New, Courier, monospace;">series laust=log(aust)</span></div><div><br /></div><div>Then we apply the bbq add-in to each series.
We can do this either via the command line or the menu-driven interface.</div><div><br /></div><div><span style="font-family: Courier New, Courier, monospace;">lus.bbq(turnphase=2, phase=2, cycle=5, thresh=10.4)</span></div><div><span style="font-family: Courier New, Courier, monospace;">luk.bbq(turnphase=2, phase=2, cycle=4, thresh=10.4)</span></div><div><span style="font-family: Courier New, Courier, monospace;">laust.bbq(turnphase=2, phase=2, cycle=5, thresh=10.4)</span></div></div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-1ZcScYTyjC8/W6LNnp1XiDI/AAAAAAAAAjM/H2KQEmTtvjY-70dhM3mHVfl5vuheKacvwCLcBGAs/s1600/output.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="579" data-original-width="1027" height="360" src="https://3.bp.blogspot.com/-1ZcScYTyjC8/W6LNnp1XiDI/AAAAAAAAAjM/H2KQEmTtvjY-70dhM3mHVfl5vuheKacvwCLcBGAs/s640/output.png" width="640" /></a></div><div><br /></div><div><br /></div><div>By definition, a peak happens at time t if <i>Y<span style="font-size: xx-small;">t-k</span>,…,Y<span style="font-size: xx-small;">t-1</span>&nbsp;</i> &lt; <i>Y<span style="font-size: xx-small;">t</span>&nbsp;</i> &gt;<i> Y<span style="font-size: xx-small;">t+1</span>,…,Y<span style="font-size: xx-small;">t+k</span></i>. <i>k</i> must be set by the user; for example, <i>k</i>=2 for quarterly data, <i>k</i>=5 for monthly data and <i>k</i>=1 for yearly data. <i>k</i> is called the symmetric window parameter (turn phase).</div><div><div><br /></div><div>Other restrictions are often imposed on the phases. A minimum of two quarters for expansions and contractions is often applied, in line with the rules used by the NBER when dating these phases. This is the minimum phase. A complete cycle length (contraction plus expansion duration) of five quarters is also common for quarterly data. This is the minimum cycle.
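The candidate-detection rule above (step 1 of the algorithm) is easy to sketch. The following is only an illustrative Python fragment, not the BBQ add-in's code:

```python
import numpy as np

def turning_points(y, k=2):
    """Candidate peaks and troughs, Bry-Boschan style: a peak at t requires
    y[t] to exceed its k neighbours on each side (trough: mirror image)."""
    peaks, troughs = [], []
    for t in range(k, len(y) - k):
        window = np.concatenate([y[t - k:t], y[t + 1:t + k + 1]])
        if y[t] > window.max():
            peaks.append(t)
        elif y[t] < window.min():
            troughs.append(t)
    return peaks, troughs
```

The add-in additionally enforces the alternation of peaks and troughs and the minimum phase and cycle restrictions described above.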
Finally, it may sometimes be desirable to overrule the minimum phase restriction. For example, if the fall in a series is very large, one might allow the contraction to be quite short. The parameter controlling this is the threshold (thresh).</div><div><br /></div><div>The add-in also produces dummy variables for expansions and contractions (state, state1 and state2).</div><div>Alternatively, you can run the BBQ add-in through the menu-driven interface. To do so, first open the series, e.g. lus. Then go to the proc/add-ins menu and choose the <i>Bry-Broschan-Pagan-Harding BC dating</i> menu item.</div></div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-e1l95Dt0bEs/W6LN_DL_aJI/AAAAAAAAAjU/l2I9n1K1gEY-C1x85k7G02y5Iu6i5eJtwCLcBGAs/s1600/dlg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="260" data-original-width="224" src="https://1.bp.blogspot.com/-e1l95Dt0bEs/W6LN_DL_aJI/AAAAAAAAAjU/l2I9n1K1gEY-C1x85k7G02y5Iu6i5eJtwCLcBGAs/s1600/dlg.png" /></a></div><div><br /></div><div><br /></div><div><div><i><b>References:</b></i></div><div>Bry and Boschan (1971). "<i>Cyclical Analysis of Time Series: Selected Procedures and Computer Programs</i>", NBER, New York.</div><div>Burns, A., Mitchell, W. C. (1946). "<i>Measuring Business Cycles (Vol. 2)</i>." New York, NY: National Bureau of Economic Research.</div><div>Cooley and Prescott (1995) "<i>Economic Growth and Business Cycles</i>", Frontiers of Business Cycle Research, ed. Thomas F.
Cooley, Princeton University Press, 1‐38.</div><div>Harding and Pagan (2002) "<i>Dissecting the cycle: a methodological investigation</i>", Journal of Monetary Economics, Volume 49, Issue 2, 365-381.</div></div><div><br /></div><div><br /></div><div><br /></div>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com7tag:blogger.com,1999:blog-6883247404678549489.post-44592266665213344612018-08-20T12:58:00.004-07:002018-08-20T12:58:41.279-07:00Using Facebook Likes and Google Trends data to forecast tourism<span style="font-family: Verdana, sans-serif;"><span style="font-size: 11pt; line-height: 107%;">This post is guest authored by Ulrich Gunter, Irem Önder, Stefan Gindl, all from MODUL University Vienna, and edited by the EViews team.&nbsp;</span><span style="font-size: x-small;"><span style="line-height: 107%;">&nbsp;</span>(Note: all images on this post are for illustrative purposes only; they are not taken from the published article and do not represent the exact analysis performed for the article).&nbsp;</span></span><br /><span style="font-size: 11pt; line-height: 107%;"><span style="font-family: Verdana, sans-serif;"><br /></span></span><span style="font-family: Verdana, sans-serif;"><span style="font-size: 14.6667px;">A recent article, "<a href="http://journals.sagepub.com/doi/abs/10.1177/1354816618793765?journalCode=teua" target="_blank"><i style="font-weight: bold;">Exploring the predictive ability of LIKES of posts on the Facebook pages of four major city DMOs</i><b style="font-style: italic;">&nbsp;in Austria</b></a>"</span><span style="font-size: 14.6667px;">&nbsp;in the scholarly journal </span><a href="http://journals.sagepub.com/doi/abs/10.1177/1354816618793765?journalCode=teua" style="font-size: 14.6667px;" target="_blank">Tourism Economics </a><span style="font-size: 14.6667px;">investigates the predictive ability of Facebook “likes” and Google Trends data on tourist arrivals in four major Austrian cities.&nbsp; The use of online
“big data” to perform short term forecasts or nowcasts is becoming increasingly important across all branches of economic study, but is particularly powerful in tourism economics.</span></span><br /><a name='more'></a><br /><span style="font-family: Verdana, sans-serif;"><span style="font-size: 14.6667px; line-height: 107%;"></span></span><br /><span style="font-size: 14.6667px;"><span style="font-family: Verdana, sans-serif;">A quick graph of Google Trends data for the Austrian city of Salzburg compared with tourist arrivals to the same city shows an obvious correlation:</span></span><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-hF6ESUbGSbo/W3sBk8DTmOI/AAAAAAAAAhM/OOJUNMGV4VoL2SfvMbR3iSlQzLw2cU-PwCLcBGAs/s1600/Salzburg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="666" data-original-width="910" height="467" src="https://4.bp.blogspot.com/-hF6ESUbGSbo/W3sBk8DTmOI/AAAAAAAAAhM/OOJUNMGV4VoL2SfvMbR3iSlQzLw2cU-PwCLcBGAs/s640/Salzburg.png" width="640" /></a></div><span style="font-family: Calibri, sans-serif;"><span style="font-size: 14.6667px;"><br /></span></span><div><div><span style="font-family: Verdana, sans-serif;">The article used a number of EViews’ automatic and manual forecasting techniques introduced in recent versions to take advantage of this predictive power.</span></div><div><span style="font-family: Verdana, sans-serif;">A brief outline of the steps taken to perform this analysis is as follows:</span></div></div><div class="separator" style="clear: both; text-align: center;"></div><div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><ul><li><span style="font-family: Verdana, sans-serif;">Monthly tourist arrivals for the four cities of Graz, Innsbruck, Salzburg and Vienna are obtained from the TourMIS database.</span></li><li><span 
style="font-family: Verdana, sans-serif;">Daily Facebook likes on each city’s official Facebook pages are obtained using Facebook’s Graph API.</span></li><li><span style="font-family: Verdana, sans-serif;">Monthly Google Trends data for each city is obtained from the Google Trends website.</span></li><li><span style="font-family: Verdana, sans-serif;">Once obtained, the data are imported into EViews, using different pages for the different frequencies.</span></li><li><span style="font-family: Verdana, sans-serif;">Seasonal adjustment, unit root tests (with automatic lag-selection) and frequency conversion of daily data to monthly aggregates are all performed in EViews prior to estimation and forecasting.</span></li></ul></div><div class="separator" style="clear: both; text-align: center;"></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-WBuEqSKQH-4/W3sGcPVcmtI/AAAAAAAAAiE/OmWAEbB7BDQns3IOckMxqdM1HjNBR-3yACLcBGAs/s1600/seasadjust.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="856" data-original-width="908" height="376" src="https://4.bp.blogspot.com/-WBuEqSKQH-4/W3sGcPVcmtI/AAAAAAAAAiE/OmWAEbB7BDQns3IOckMxqdM1HjNBR-3yACLcBGAs/s400/seasadjust.gif" width="400" /></a></div><div class="separator" style="clear: both; text-align: center;"></div><div><br /></div><div><ul><li><span style="font-family: Verdana, sans-serif;">Perform univariate automatic model selection on the arrivals data using automatic ARIMA estimation and automatic ETS smoothing.</span></li></ul><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-z31ORwmS9q0/W3sCm92jisI/AAAAAAAAAhc/qn8gMn1E6c0XNgWxu1sf1N3j0wJHXL5dQCLcBGAs/s1600/autoarima.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="856" data-original-width="908" height="376"
src="https://1.bp.blogspot.com/-z31ORwmS9q0/W3sCm92jisI/AAAAAAAAAhc/qn8gMn1E6c0XNgWxu1sf1N3j0wJHXL5dQCLcBGAs/s400/autoarima.gif" width="400" /></a></div><div><br /></div></div><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-xO6s7uC1-QI/W3sCy-aPCwI/AAAAAAAAAhg/S-7WDdJ1RIQGAJBbbHGPuPpi1mw28UaZgCLcBGAs/s1600/ets.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="856" data-original-width="908" height="376" src="https://4.bp.blogspot.com/-xO6s7uC1-QI/W3sCy-aPCwI/AAAAAAAAAhg/S-7WDdJ1RIQGAJBbbHGPuPpi1mw28UaZgCLcBGAs/s400/ets.gif" width="400" /></a></div><div><br /></div><div><ul><li><span style="font-family: Verdana, sans-serif;">ADL models regressing tourist arrivals against monthly aggregated Facebook likes or Google Trends, or both, are estimated.&nbsp; Lag lengths are automatically selected.</span></li></ul><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-ihl9r1_-Ido/W3sC_dxkGqI/AAAAAAAAAho/BigmTK91zUUmscX1yLXMEj12AEMSVLXOwCLcBGAs/s1600/adl.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="856" data-original-width="908" height="376" src="https://4.bp.blogspot.com/-ihl9r1_-Ido/W3sC_dxkGqI/AAAAAAAAAho/BigmTK91zUUmscX1yLXMEj12AEMSVLXOwCLcBGAs/s400/adl.gif" width="400" /></a></div><div><br /></div></div><div><ul><li><span style="font-family: Verdana, sans-serif;">MIDAS regressions of monthly arrivals against daily Facebook Likes and monthly Google Trends are estimated.</span></li><li><span style="font-family: Verdana, sans-serif;">Using the EViews programming language, all the above estimation techniques are automated and used to perform recursive forecasts with horizons of 1, 2, 3, 6, 12 and 24 months.</span></li><li><span style="font-family: Verdana, sans-serif;">Finally, the EViews forecast evaluation tool is used to figure out the best-performing forecast models 
per city and forecast horizon (in terms of RMSE, MAE, and MAPE). The forecast encompassing test is also utilized.</span></li></ul><div><span style="font-family: Verdana, sans-serif;"><br /></span></div></div><div><span style="font-family: Verdana, sans-serif;">The results from this analysis are mixed: for two of the cities, the univariate automatic forecasting methods perform best.&nbsp; For the third city, the ADL model is best, and for the fourth city, the MIDAS approach is best.</span></div>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com0tag:blogger.com,1999:blog-6883247404678549489.post-45532550686841687642018-05-30T13:15:00.001-07:002018-05-30T13:48:23.292-07:00State Space Models with Fat-Tailed Errors and the sspacetdist add-in<script type="text/x-mathjax-config">MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" } } }); </script> <script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\$','\$']]}}); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"></script> <i>Author and guest post by&nbsp;<a href="https://www.linkedin.com/in/eren-ocakverdi-9b673924" target="_blank">Eren Ocakverdi</a>.</i><br /><i><br /></i><br />Linear State Space Models (LSSM) provide a very useful framework for the analysis of a wide range of time series problems. For instance, linear regression, trend-cycle decomposition, smoothing and ARIMA models can all be handled practically and dynamically within this flexible system.<br />One of the assumptions behind LSSM is that the errors of the measurement/signal equation are normally distributed. In practice, however, there are situations where this may not be the case and errors follow a fat-tailed distribution.
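How fat can the tails be? A quick simulation makes the point: a Student's t with 3 degrees of freedom produces "3-sigma" observations roughly twenty times as often as a Gaussian (an illustrative numpy sketch, not part of the add-in):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# share of draws beyond +/-3: roughly 0.058 for t(3) vs 0.0027 for N(0,1)
t_tail = np.mean(np.abs(rng.standard_t(3, n)) > 3.0)
g_tail = np.mean(np.abs(rng.standard_normal(n)) > 3.0)
print(t_tail, g_tail)
```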
Ignoring this fact may result in wider confidence intervals for the estimated parameters or may cause outliers to bias parameter estimates.<br /><a name='more'></a><br />Treatments for heavy-tailed distributions are covered in detail in Durbin and Koopman (2012), who use mode estimation. Consider the following signal plus noise model:<br />$$y_t = \omega_t + \epsilon_t$$ <br />Here, $\omega_t$ is linear Gaussian, and $\epsilon_t$ follows a Student's t-distribution. The observation variance is then given by:<br />$$A_t = \frac{(v-2)\sigma_\epsilon^2 + \tilde{\epsilon}_t^2}{(v+1)}$$ <br />The Kalman filter and smoother can be applied iteratively to obtain a new smoothed estimate of $\omega_t$. New values of the estimated disturbances $\tilde{\epsilon}_t$ are used to compute new values for $A_t$, and the procedure is repeated until convergence.<br />This iterative procedure is not built into EViews, but there is an add-in, <a href="http://eviews.com/Addins/sspacetdist.aipz">sspacetdist</a>, that implements it. The add-in uses the Mean Absolute Percentage Error (MAPE) as its convergence criterion.<br />As an example, Durbin and Koopman (2012) analyze the logged quarterly demand for gas in the UK from 1960 to 1986 (<a href="http://eviews.com/blog/sspacetdist/gas_data.wf1">gas_data.wf1</a>). They use a structural time series model of the basic form:<br />$$y_t = \mu_t + \gamma_t + \epsilon_t$$ Here, $\mu_t$ is the local linear trend, $\gamma_t$ is the seasonal component and $\epsilon_t$ is the observation disturbance. We can use the SSpace object of EViews to build this framework and then estimate the model via the sspacetdist add-in (<a href="http://eviews.com/blog/sspacetdist/sspacet_example1.prg">sspacet_example1.prg</a>). <br />The example program file will also generate Fig. 14.4 on page 318 of Durbin and Koopman (2012). The upper left and right panels are the estimated seasonal components from the Gaussian and Student’s t models, respectively.
The lower left and right panels are the estimated irregular components of these models, respectively. <br /><div class="separator" style="clear: both; text-align: center;"></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-UvLDfrEDDYM/Ww8Nf35C6dI/AAAAAAAAAgQ/ikG6pzAzLOAjn0yDizKEvlVN8ZQRC1rrACLcBGAs/s1600/pic1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="829" data-original-width="1042" height="508" src="https://2.bp.blogspot.com/-UvLDfrEDDYM/Ww8Nf35C6dI/AAAAAAAAAgQ/ikG6pzAzLOAjn0yDizKEvlVN8ZQRC1rrACLcBGAs/s640/pic1.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"></div><br />Please note that this is an approximating model, but it can still be very useful in practice. As another example, let’s simulate a regression model with two independent variables and t-distributed errors:<br />$$y_t = 0.6 x_{1t} + 0.3 x_{2t} + \epsilon_t\text{, where } \epsilon_t \sim t(v=3)$$ <br />Next we estimate the parameters with both maximum likelihood and this iterative state space scheme (<a href="http://eviews.com/blog/sspacetdist/sspacet_example2.prg">sspacet_example2.prg</a>).<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-K74SXd-8IAU/Ww8M0iMvegI/AAAAAAAAAgE/jdBYOXo-RTkHsdPZyycdJtXYXzvRJxYzgCEwYBhgL/s1600/pic2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="417" data-original-width="449" height="371" src="https://2.bp.blogspot.com/-K74SXd-8IAU/Ww8M0iMvegI/AAAAAAAAAgE/jdBYOXo-RTkHsdPZyycdJtXYXzvRJxYzgCEwYBhgL/s400/pic2.png" width="400" /></a></div><br /><br /><br /><div class="MsoNormal"><span lang="TR" style="font-size: 12.0pt; line-height: 115%;">Maximum likelihood estimation can be specified within a LogL object.
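For readers outside EViews, the same t-regression MLE can be illustrated in numpy. With the degrees of freedom held fixed at $v=3$, the MLE is the fixed point of an EM/IRLS iteration that repeatedly downweights large residuals; this is only a sketch under that assumption, not the add-in's code:

```python
import numpy as np

def t_regression_em(y, X, nu=3.0, iters=100):
    """MLE of a linear regression with Student's t errors (dof nu held fixed),
    via EM: weighted least squares with weights that shrink for large
    standardized residuals."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # OLS starting values
    sigma2 = np.mean((y - X @ beta) ** 2)
    for _ in range(iters):
        e = y - X @ beta
        w = (nu + 1.0) / (nu + e ** 2 / sigma2)       # E-step: expected precisions
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)    # M-step: weighted LS
        sigma2 = np.mean(w * (y - X @ beta) ** 2)
    return beta, sigma2

# simulate y = 0.6*x1 + 0.3*x2 + t(3) noise, as in the post
rng = np.random.default_rng(1)
n = 4000
X = rng.standard_normal((n, 2))
y = X @ np.array([0.6, 0.3]) + rng.standard_t(3, n)
beta, sigma2 = t_regression_em(y, X)
```

In EViews itself this likelihood is written directly into the LogL object, as in the screenshot above.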
The estimated parameters are close to their theoretical (simulated) values, as they all lie within the associated confidence intervals.<o:p></o:p></span></div><div class="MsoNormal"><span lang="TR" style="font-size: 12.0pt; line-height: 115%;"><br /></span></div><div class="MsoNormal"><span lang="TR" style="font-size: 12.0pt; line-height: 115%;">To see how the approximating state space model performs, the parameters are estimated via the add-in:<o:p></o:p></span></div><div class="MsoNormal"><span lang="TR" style="font-size: 12.0pt; line-height: 115%;"><br /></span></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-wfbu1e4hoc4/Ww8M0n5BNkI/AAAAAAAAAgM/9fLY14ObLFEge6DvBSB2iBlkw70obYIiQCEwYBhgL/s1600/pic3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="384" data-original-width="447" height="342" src="https://1.bp.blogspot.com/-wfbu1e4hoc4/Ww8M0n5BNkI/AAAAAAAAAgM/9fLY14ObLFEge6DvBSB2iBlkw70obYIiQCEwYBhgL/s400/pic3.png" width="400" /></a></div><div class="MsoNormal"><span lang="TR" style="font-size: 12.0pt; line-height: 115%;"><br /></span></div><br />Note that the state space model must be estimated in Gaussian form first. The smoothed state values correspond to the coefficients of the independent variables, and they are very close to the ones estimated by maximum likelihood, which is the correct approach for this problem.<br /><br />As for the degrees-of-freedom parameter, a separate distribution fitting exercise on the smoothed disturbances is required.
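What such a fitting exercise involves can be sketched with a crude grid-search MLE for the degrees of freedom and scale of a t distribution. The sketch below runs on a simulated t(3) sample rather than the actual smoothed disturbances, and is purely illustrative:

```python
import math
import numpy as np

def t_loglik(e, nu, scale):
    # Student's t log-likelihood with a scale parameter (location fixed at 0)
    z2 = (e / scale) ** 2
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return len(e) * const - (nu + 1) / 2 * np.sum(np.log1p(z2 / nu))

def fit_t_dof(e):
    # crude 2-D grid over (dof, scale); fine for an illustration
    grid = [(nu, s) for nu in np.arange(2.1, 15.0, 0.1)
            for s in np.arange(0.5, 2.05, 0.05)]
    return max(grid, key=lambda p: t_loglik(e, p[0], p[1]))

rng = np.random.default_rng(2)
nu_hat, scale_hat = fit_t_dof(rng.standard_t(3, 2000))
```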
Again, the two values are very close (both can be rounded to 3.32).<br /><br /><i>Note: The interested reader can estimate these models assuming normally distributed errors and see how the confidence intervals of the parameters change.</i><br /><i><br /></i><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-83G_szhnoS8/Ww8OCanc3dI/AAAAAAAAAgY/frhlDYXP-_selM8JubCw-Ch3pn1rRhtTwCLcBGAs/s1600/pic4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="642" data-original-width="750" height="546" src="https://3.bp.blogspot.com/-83G_szhnoS8/Ww8OCanc3dI/AAAAAAAAAgY/frhlDYXP-_selM8JubCw-Ch3pn1rRhtTwCLcBGAs/s640/pic4.png" width="640" /></a></div><i><br /></i><div><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;"><br /></div><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;"><b style="mso-bidi-font-weight: normal;"><span style="font-size: 12.0pt; line-height: 115%; mso-ansi-language: EN-US;">Reference:</span></b><span style="font-size: 12.0pt; line-height: 115%; mso-ansi-language: EN-US;"> <o:p></o:p></span></div><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;"><span style="font-size: 12.0pt; line-height: 115%; mso-ansi-language: EN-US;">Durbin, J. and Koopman, S. J., (2012).
<i style="mso-bidi-font-style: normal;">Time Series Analysis by State Space Methods</i>, 2<sup>nd</sup> ed., Oxford University Press.<o:p></o:p></span></div></div>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com1tag:blogger.com,1999:blog-6883247404678549489.post-69461458319018851142017-10-17T14:30:00.000-07:002017-10-17T15:27:52.289-07:0010+ New Features Added to EViews 10<span style="font-family: &quot;verdana&quot; , sans-serif;"><b>EViews 10+</b>&nbsp;is a free update to EViews 10, and introduces a number of new features, including:</span><br /><ul><li><span style="font-family: &quot;verdana&quot; , sans-serif;">Chow-Lin, Denton and Litterman frequency conversion with multiple indicator series.</span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;">Model dependency graphs.</span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;">US Bureau of Labor Statistics (BLS) data connectivity.</span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;">Introduction of the X-13 Force option for forcing annual totals.</span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;">Expansion of the EViews 10 snapshot system to program files.</span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;">A new help command.</span></li></ul><span style="font-family: &quot;verdana&quot; , sans-serif;">All current EViews 10 users can receive the following new features. 
To update your copy of EViews 10, simply use the built-in update feature (<i>Help-&gt;EViews Update</i>), or manually <a href="http://www.eviews.com/download/download.shtml" target="_blank">download the latest EViews 10 patch</a>.</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"></span><br /><a name='more'></a><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><br /><div class="MsoNormal"><div class="MsoNormal"><h3><b><span style="font-family: &quot;verdana&quot; , sans-serif;">1) Chow-Lin, Denton and Litterman Frequency Conversion with Multiple Indicators</span></b></h3></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">EViews’ Chow-Lin, Denton and Litterman frequency conversion methods have been expanded to allow multiple indicator series, giving greater flexibility and accuracy when interpolating data to a higher frequency.</span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><br /><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">The purpose of Chow-Lin interpolation is to use regression to combine higher-frequency series with a single lower-frequency series.&nbsp; The result is a new high-frequency series that is related to both.&nbsp; Previously, EViews allowed you to create a new series from a single higher-frequency series and a lower-frequency series; the update now allows you to relate multiple series to a lower-frequency series.&nbsp; This will be useful for people who want to use multiple inputs (for example, they believe that the combination of several series is better at prediction than a single series) in their interpolation.<o:p></o:p></span></div></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><i>See a complete list of&nbsp;<b><a
href="http://www.eviews.com/EViews10/ev10data_n.html" target="_blank">Data Handling</a></b>&nbsp;features added in EViews 10.</i></span></div><br /><h3><b><span style="font-family: &quot;verdana&quot; , sans-serif;">2) Model Dependency Graph</span></b></h3></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">We’ve developed a new way to graphically view the relationship between variables in your model. Colour coding is used to depict the dynamics in the model, and you can zoom and highlight variables for even greater clarity.</span></div><div class="separator" style="clear: both; text-align: center;"><a href="http://www.eviews.com/EViews10/images/DepGraph.gif" target="_blank"><span style="font-family: &quot;verdana&quot; , sans-serif; margin-left: 1em; margin-right: 1em;"></span><img border="0" src="http://www.eviews.com/EViews10/images/DepGraph.gif" data-original-height="912" data-original-width="1132" height="512" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><span style="font-family: &quot;verdana&quot; , sans-serif;"><span style="text-align: center;">Many central banks and large corporations around the world use EViews to build macroeconomic models, and the EViews </span><i style="text-align: center;">model</i><span style="text-align: center;"> object is at the heart of the modelling experience inside EViews.</span></span><br /><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><o:p></o:p></span></div><div class="MsoNormal"></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">Whilst EViews has always provided a powerful interface for creating, editing and solving these models, it can be difficult for the modeller to explain their work to colleagues and clients.
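What a dependency view conveys can be sketched by parsing each equation's right-hand side for the variables it uses. The three-equation model below is hypothetical, and the snippet is plain Python, not EViews' implementation:

```python
import re

# hypothetical model: endogenous variable -> right-hand-side expression
equations = {
    "cons": "c(1) + c(2)*gdp(-1)",
    "inv":  "c(3) + c(4)*(gdp - gdp(-1)) + c(5)*rate",
    "gdp":  "cons + inv + govt",
}

def dependencies(eqs):
    """Map each endogenous variable to the model variables it depends on."""
    names = set(eqs) | {"govt", "rate"}          # endogenous plus exogenous
    deps = {}
    for var, rhs in eqs.items():
        tokens = set(re.findall(r"[a-z_]\w*", rhs))
        deps[var] = sorted(tokens & names)       # drops c(1), lag markers, numbers
    return deps

print(dependencies(equations))
```

The edges of this mapping are exactly what the dependency graph draws.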
The new <b>dependency graph</b> provides a simple visual guide to how the relationships in the model are structured, allowing demonstration of the structure of the model.<o:p></o:p></span><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://www.eviews.com/EViews10/images/DepGraph2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://www.eviews.com/EViews10/images/DepGraph2.png" height="640" width="592" /></a></div><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">We plan on improving and adding to the dependency graph feature over the next few releases.&nbsp; If you have any suggestions or requests for the graph (or any other aspect of EViews!) please <a href="mailto:support@eviews.com" target="_blank">contact us</a>.</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><i>See a complete list of <b><a href="http://www.eviews.com/EViews10/ev10ecdiag_n.html" target="_blank">Testing and Diagnostics</a></b> features added in EViews 10.</i></span></div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="MsoNormal"><h3><b><span style="font-family: &quot;verdana&quot; , sans-serif;">3) Bureau of Labor Statistics (BLS) Data</span></b></h3></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">EViews can now connect to the United States Bureau of Labor Statistics’ API to natively fetch data directly from the BLS into EViews.</span></div><div class="MsoNormal"><div class="separator" style="clear: both; text-align: center;"><span style="font-family: &quot;verdana&quot; , sans-serif; margin-left: 1em; margin-right: 1em;"><a 
href="http://www.eviews.com/EViews10/images/BLS.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://www.eviews.com/EViews10/images/BLS.gif" data-original-height="972" data-original-width="1212" height="513" width="640" /></a></span></div><div class="separator" style="clear: both; text-align: center;"><br /></div><span style="font-family: &quot;verdana&quot; , sans-serif;">The US BLS is an important statistical agency, collecting and producing data on labor economics, including vital macroeconomic statistics such as prices (CPI), employment and unemployment, and salary data both for the United States as a whole and for regional aggregates.</span></div><div class="separator" style="clear: both; text-align: center;"></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><o:p></o:p></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">The BLS data is also available in other database sources that EViews supports, such as the FRED database. However, adding BLS as a direct data source allows for quicker data retrieval.<o:p></o:p></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><i>See a complete list of&nbsp;<b><a href="http://www.eviews.com/EViews10/ev10data_n.html" target="_blank">Data Handling</a></b>&nbsp;features added in EViews 10.</i></span><br /><br /><h3><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><b><span style="font-family: &quot;verdana&quot; , sans-serif;">4) X-13 Force Option</span></b></h3><br /><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">EViews’ implementation of the U.S.
Census Bureau’s X-13 seasonal adjustment package has been extended to give an interface to the Force specification of X-13, which allows you to seasonally adjust the data, forcing the annual totals to remain at the pre-adjusted levels.</span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><o:p></o:p></span></div><div class="MsoNormal"><div class="separator" style="clear: both; text-align: center;"></div><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">Although use of the Force option was possible in previous versions of EViews, EViews 10+ provides a new interface to the option, making its use even easier.</span></div><div class="separator" style="clear: both; text-align: center;"></div><div class="MsoNormal"><div class="separator" style="clear: both; text-align: center;"><span style="font-family: &quot;verdana&quot; , sans-serif; margin-left: 1em; margin-right: 1em;"></span></div></div><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-EszteLOGzgg/WeU6bdQ1UGI/AAAAAAAAAeg/aLxO8Ggpzt4ZgegbwTUFK5qj6NiA8xMZwCEwYBhgL/s1600/X13Force.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="468" data-original-width="613" height="303" src="https://3.bp.blogspot.com/-EszteLOGzgg/WeU6bdQ1UGI/AAAAAAAAAeg/aLxO8Ggpzt4ZgegbwTUFK5qj6NiA8xMZwCEwYBhgL/s400/X13Force.png" width="400" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><span style="font-family: &quot;verdana&quot; , sans-serif;">Many economic time series have seasonal cycles; consumption or expenditures increase and decrease at certain times of the year. 
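The arithmetic behind forcing annual totals can be illustrated with the simplest possible benchmarking rule: rescale each year of the adjusted series so it sums to the unadjusted annual total. X-13's Force option offers more refined variants; the numpy sketch below (with made-up numbers) is illustrative only:

```python
import numpy as np

def force_annual_totals(sa, raw, periods_per_year=4):
    """Pro-rata forcing: scale each year of the adjusted series `sa`
    so its annual sum equals that of the original series `raw`.
    Assumes the sample length is a whole number of years."""
    sa, raw = np.asarray(sa, float), np.asarray(raw, float)
    out = sa.copy()
    for start in range(0, len(sa), periods_per_year):
        block = slice(start, start + periods_per_year)
        out[block] *= raw[block].sum() / sa[block].sum()
    return out

raw = np.array([ 90., 110., 120., 100.,  95., 115., 125., 105.])  # two years, quarterly
sa  = np.array([100., 105., 104., 103., 106., 108., 109., 110.])  # hypothetical adjusted
forced = force_annual_totals(sa, raw)
```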
Most official statistics are seasonally adjusted to remove these cycles, allowing analysis of the underlying trends in the data without the seasonality.</span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><o:p></o:p></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div class="MsoNormal"></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">X-13 has become the de facto standard method of seasonally adjusting monthly and quarterly time series data within the United States and many other countries, and many agencies use the Force option within X-13 as a method of ensuring the adjusted data lines up with the original raw data. The inclusion of this option in the EViews X-13 interface allows easy access to this popular feature.<o:p></o:p></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><i>See a complete list of&nbsp;<b><a href="http://www.eviews.com/EViews10/ev10eccomp_n.html" target="_blank">Computation</a>&nbsp;</b>features<b>&nbsp;</b>added in EViews 10, including other seasonal adjustment routines.</i></span><br /><br /></div><div class="MsoNormal"><h3><span style="font-family: &quot;verdana&quot; , sans-serif;"><b><span style="font-family: &quot;verdana&quot; , sans-serif;">5) Program Snapshots</span></b></span></h3></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">EViews 10 introduced the popular workfile snapshot system, allowing both manual and automatic backup, archiving and management of workfiles. EViews 10+ expands this system to EViews program files. You can manually create a snapshot of your EViews program, or let EViews automatically create backups at specified time intervals.
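Comparing the current program against a snapshot is, at heart, a text-diff problem. Python's standard difflib shows the idea (an illustration with made-up program lines, not EViews' implementation):

```python
import difflib

# two hypothetical versions of a small EViews program
old = ["wfopen hpagan.wf1", "series lus = log(us)", "lus.bbq(phase=2)"]
new = ["wfopen hpagan.wf1", "series lus = log(us)", "series luk = log(uk)",
       "lus.bbq(phase=2, cycle=5)"]

# unified diff: lines prefixed "+" were added, "-" were removed
diff = list(difflib.unified_diff(old, new, "snapshot", "current", lineterm=""))
print("\n".join(diff))
```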
Once snapshots have been made you can compare the current version of your program with its snapshots, quickly viewing the differences between the two, and reverting to a previous state if required.</span><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-O3iiC8T75HY/WeU6JuTsmEI/AAAAAAAAAec/uRFrxOcZC34kyK7qClUgvaJX6rw6ypIqQCLcBGAs/s1600/ProgSnap.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="931" data-original-width="1233" height="481" src="https://2.bp.blogspot.com/-O3iiC8T75HY/WeU6JuTsmEI/AAAAAAAAAec/uRFrxOcZC34kyK7qClUgvaJX6rw6ypIqQCLcBGAs/s640/ProgSnap.png" width="640" /></a></div><br /><div style="text-align: center;"><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><i><span style="font-family: &quot;verdana&quot; , sans-serif;">See a complete list of&nbsp;</span><b style="font-family: verdana, sans-serif;"><a href="http://www.eviews.com/EViews10/ev10general_n.html" target="_blank">EViews Interface</a>&nbsp;</b><span style="font-family: &quot;verdana&quot; , sans-serif;">features</span><b style="font-family: verdana, sans-serif;">&nbsp;</b><span style="font-family: &quot;verdana&quot; , sans-serif;">added in EViews 10.</span></i></div><div class="separator" style="clear: both; text-align: left;"><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><h3 style="clear: both; text-align: left;"><b style="font-family: verdana, sans-serif;">6) Help Command</b></h3><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">A new help command has been implemented which provides a quick way to access the documentation for a specific command.</span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><o:p></o:p></span></div><div class="MsoNormal"><b><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></b></div><div 
class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><i>See a complete list of&nbsp;<b><a href="http://www.eviews.com/EViews10/ev10general_n.html" target="_blank">EViews Interface&nbsp;</a></b>features<b>&nbsp;</b>added in EViews 10.</i></span></div><div class="MsoNormal"><br /></div>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com6tag:blogger.com,1999:blog-6883247404678549489.post-21162952198550133722017-08-08T20:20:00.001-07:002017-08-09T15:18:47.232-07:00Dumitrescu-Hurlin Panel Granger Causality Tests: A Monte Carlo Study<script type="text/x-mathjax-config">MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" }, extensions: ["AMSmath.js","AMSsymbols.js"], }, tex2jax: { inlineMath: [['$','$'], ['$$','$$'], ['\$','\$']] }, Macros: { }, }); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"></script> <span style="font-family: &quot;verdana&quot; , sans-serif;"> With data availability at its historical peak, time series <b>panel</b> econometrics is in the limelight. Unlike traditional panel data, in which each cross section $i = 1, \ldots, N$ is associated with $t=1, \ldots, T < N$ observations, what characterizes time series panel data is that $N$ and $T$ can both be very large. Moreover, the time dimension also gives rise to temporal dynamic information and, with it, the ability to test for serial correlation, unit roots, cointegration, and, in this regard, also <b>Granger causality</b>.<a name='more'></a><br /><br /> Our focus in this post is on Granger causality tests, or rather, on a popular panel version of the test proposed in Dumitrescu and Hurlin (2012) (DH).
Below, we summarize Granger causality testing in the univariate case, follow with a discussion of the panel version of the test, and close with our findings from a large Monte Carlo simulation replicating and extending the work of DH to cases not covered in the original article. In particular, our focus is on studying the impact on size and power when the regression lag order is misspecified relative to the lag order characterizing the true data generating process (DGP).<br /><br /> <h3>Granger Causality Tests</h3> The idea behind Granger causality is simple. Given two temporal events, $x_t$ and $y_t$, we say $x_t$ <b>Granger causes</b> $y_t$ if past information in $x_t$ <b>uniquely</b> contributes to future information in $y_t$. In other words, information in $\left\{ x_{t-1}, x_{t-2}, \ldots \right\}$ has <b>predictive</b> power for $y_t$, and knowing both $\left\{ x_{t-1}, x_{t-2}, \ldots \right\}$ and $\left\{ y_{t-1}, y_{t-2}, \ldots \right\}$ together yields better forecasts of $y_t$ than knowing $\left\{ y_{t-1}, y_{t-2}, \ldots \right\}$ alone.<br /><br /> In the context of classical, non-panel data, testing whether $x_t$ Granger causes $y_t$ reduces to parameter significance on the lagged values of $x_t$ in the regression: \begin{align} y_t = c + \gamma_1 y_{t-1} + \gamma_2 y_{t-2} + \cdots + \gamma_p y_{t-p} + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \cdots + \beta_p x_{t-p} + \epsilon_t \label{eq.1} \end{align} where $\epsilon_t$ satisfies the classical assumptions of being independent and identically distributed, the roots of the characteristic equation $1 - \gamma_1r - \gamma_2r^2 - \ldots - \gamma_p r^p = 0$ lie outside the unit circle (namely, $y_t$ is stationary), $x_t$ is itself stationary, and $p \geq 1$.
In other words, we have the following null and alternative hypothesis setup: \begin{align*} H_0: \quad &\forall k\geq 1, \quad \beta_k = 0; \quad \text{$x_t$ does not Granger cause $y_t$.}\\ H_A: \quad &\exists k\geq 1, \quad \beta_k \neq 0; \quad \text{$x_t$ does Granger cause $y_t$.} \end{align*} Although the traditional Granger causality test is only valid for stationary series, we diverge briefly to caution on cases where $x_t$ and $y_t$ may be non-stationary. In particular, whenever at least one variable in the regression above is <b>not</b> stationary, the traditional approach is no longer valid. In such cases one must resort to the approach of Toda and Yamamoto (1995). In this regard, we also emphasize that unlike non-stationary but non-cointegrated variables, which may or may not exhibit Granger causality, <b>all</b> cointegrated variables necessarily Granger cause each other in at least one direction, and possibly both. Since our friend Dave Giles has exceptional posts on the subjects <a href="http://davegiles.blogspot.no/2011/04/testing-for-granger-causality.html">here</a>, <a href="http://davegiles.blogspot.ca/2011/10/var-or-vecm-when-testing-for-granger.html">here</a>, and <a href="http://davegiles.blogspot.ca/2012/04/surplus-lag-granger-causality-testing.html">here</a>, we will not delve further and urge interested readers to refer to the material in these posts.<br /><br /> <h3>Dumitrescu-Hurlin Test: Panel Granger Causality Test</h3> Recall that time series panel data associates a cross-section $i=1, \ldots, N$ for each time observation $t=1,\ldots T$. 
In this regard, a natural extension of the Granger causality regression (\ref{eq.1}) to cross-sectional information would assume the form: \begin{align} y_{i,t} = c_i + \gamma_{i,1} y_{i,t-1} + \gamma_{i,2} y_{i,t-2} + \cdots + \gamma_{i,p} y_{i,t-p} + \beta_{i,1} x_{i,t-1} + \beta_{i,2} x_{i,t-2} + \cdots + \beta_{i,p} x_{i,t-p} + \epsilon_{i,t} \label{eq.2} \end{align} where now, we require the roots of the characteristic equations $1 - \gamma_{i,1}r_i - \gamma_{i,2}r_i^2 - \ldots - \gamma_{i,p} r_i^p = 0$ to be outside the unit circle for all $i=1,\ldots, N$, in addition to requiring stationarity from $x_{i,t}$ for all $i$. Moreover, we assume $\epsilon_{i,t}$ are independent and normally distributed across both $i$ and $t$; namely, $E(\epsilon_{i,t})=0$, $E(\epsilon_{i,t}^2)=\sigma_i^2$, and $E(\epsilon_{i,t}\epsilon_{j,s}) = 0$ whenever $i\neq j$ or $s\neq t$. In other words, we exclude the possibility of cross-sectional dependence and of serial correlation across $t$. While restrictive, the theory for relaxing these assumptions is still in development, so we restrict ourselves to the specification above.<br /><br /> At this point, it is instructive to reflect on what the presence and absence of Granger causality in panel data actually mean. While the absence of Granger causality is as simple as requiring non-causality across <b>all</b> cross-sections simultaneously, namely: $$H_0: \quad \text{$\forall k\geq 1$ and $\forall i$,} \quad \beta_{i,k} = 0; \quad \text{$x_{i,t}$ does not Granger cause $y_{i,t}$, } \forall i$$ the alternative hypothesis, namely the presence of Granger causality, is more involved.
In particular, are we to assume that the presence of Granger causality implies causality across <b>all</b> cross sections simultaneously, namely, $$H_{A_1}: \quad \text{$\forall k\geq 1$ and $\forall i$,} \quad \beta_{i,k} \neq 0; \quad \text{$x_{i,t}$ does Granger cause $y_{i,t}$, } \forall i$$ or, are we to hypothesize the presence of Granger causality as causality that is present for some proportion of the cross-sectional structure; in other words: \begin{align*} H_{A_2}: &\quad \text{$\forall k\geq 1$ and $\forall i=1, \ldots, N_1$,} \quad \beta_{i,k} = 0; \quad \text{$x_{i,t}$ does not Granger cause $y_{i,t}$, } \forall i \leq N_1\\ &\quad \text{$\forall i=N_1+1, \ldots, N$, $\exists k\geq 1$,} \quad \beta_{i,k} \neq 0; \quad \text{$x_{i,t}$ Granger causes $y_{i,t}$ for $i>N_1$.} \end{align*} where $0\leq N_1/N < 1$. Since $H_{A_1}$ is evidently restrictive, we focus here on $H_{A_2}$. The theory for a panel Granger causality test in which $H_0$ is contrasted with $H_{A_2}$ is the foundation of the popular work of Dumitrescu and Hurlin (2012). In fact, the approach taken follows closely the work of Im, Pesaran, and Shin (2003) for panel unit root tests in heterogeneous panels. Estimation proceeds in three steps: <ol> <li> For each $i$ and $t=1, \ldots, T$, estimate the regression in (\ref{eq.2}) using standard OLS. </li></br> <li> For each $i$, using the estimates in Step 1, conduct a Wald test for the hypothesis $\beta_{i,k}=0$ for all $k=1, \ldots, p$, and save this value as $W_{i,T}$.
</li></br> <li> Using the $N$ statistics $W_{i,T}$ from Step 2, form the aggregate panel version of the statistic as: \begin{align} W_{N,T} = \frac{1}{N}\sum_{i=1}^{N}W_{i,T} \label{eq.3} \end{align} </li></br></ol> It is important to remark here that in Steps 1 and 2, although one may observe $t=1, \ldots, T$ values for $x_{i,t}$ and $y_{i,t}$, due to the autoregressive nature of the regression, the effective sample will always be $t=p+1, \ldots, T$, that is, $T-p$ observations, to account for the fact that one needs $p$ initializing values for each of the variables.</br></br> Given the test statistic (\ref{eq.3}), DH demonstrate its limiting distribution when $T\longrightarrow \infty$ followed by $N\longrightarrow \infty$, denoted as $T,N \longrightarrow \infty$; in addition to the case where $N\longrightarrow \infty$ with $T$ fixed. The results, in which $K$ denotes the number of regression lags (our $p$), are summarized below: \begin{align*} Z_{N,T} &= \sqrt{\frac{N}{2K}} \left(W_{N,T} - K\right) \quad \overset{d}{\underset{T,N \rightarrow \infty}\longrightarrow} \quad N(0,1)\\ \widetilde{Z}_{N} &= \sqrt{\frac{N(T-3K-5)}{2K(T-2K-3)}} \left(\left(\frac{T-3K-3}{T-3K-1}\right)W_{N,T} - K\right) \quad \overset{d}{\underset{N \rightarrow \infty}\longrightarrow} \quad N(0,1) \end{align*} provided $T > 5 + 3K$ as a necessary condition for the validity of the results. The latter ensures that the OLS regression in Step 1 above is valid, by preventing situations in which there are more parameters than observations.</br></br> In either case, the results follow from classical statistical concepts and central limit theorems (CLT). In particular, in the case where $T,N \longrightarrow \infty$, observe that $W_{i,T} \overset{d}{\underset{T \rightarrow \infty}\longrightarrow} \chi^2(K)$ for every $i$. Accordingly, one is left with $N$ independent and identically distributed random variables, each with mean $K$ and variance $2K$. Thus, the classical Lindeberg-Lévy CLT applies, and the first limiting result follows.
For the second case, DH demonstrate that when $T$ is fixed, the $W_{i,T}$ represent $N$ independent random variables, but each has mean $\frac{K(T-3K-1)}{T-3K-3}$ and variance $\frac{2K(T-3K-1)^2(T-2K-3)}{(T-3K-3)^2(T-3K-5)}$, and so they are not identically distributed. In this case, one can invoke the Lyapunov CLT, and the second result follows. Of course, it follows readily that as $T\longrightarrow \infty$, both limiting results coincide. We refer interested readers to the original DH article for details.<br /><br /> EViews has allowed estimation of the Dumitrescu-Hurlin test as a built-in procedure since EViews 8. Dumitrescu and Hurlin have also made available a set of <a href=http://www.runmycode.org/companion/view/42>Matlab routines</a> to perform their test and a <a href=http://www.runmycode.org/companion/view/42>companion website</a>. More recently, a Stata ado file allowing estimation of the test has also been made available. It should be noted that due to slight calculation errors in the original Matlab and Stata code, EViews results did not always match those given by Matlab and Stata. Those mistakes have since been fixed by the respective authors, and now both Matlab and Stata match the results produced in EViews.</br></br> In EViews, the test is virtually instant.
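<br /><br /> As an aside, the two standardized statistics are simple enough to compute by hand once the individual Wald statistics are in hand. The following is a minimal pure-Python sketch of the two formulas above (our own illustration; the function and variable names are ours, not DH's):

```python
from math import sqrt

def dh_standardized(wald_stats, T, K):
    """Given the N individual Wald statistics W_{i,T}, compute the average
    W_bar = W_{N,T} and the two standardized DH statistics: the asymptotic
    Z_{N,T} (valid as T, N -> infinity) and the fixed-T Z-tilde_N.
    K is the number of regression lags."""
    if T <= 5 + 3 * K:
        raise ValueError("the DH results require T > 5 + 3K")
    N = len(wald_stats)
    w_bar = sum(wald_stats) / N
    z_asymptotic = sqrt(N / (2.0 * K)) * (w_bar - K)
    z_fixed_t = sqrt(N * (T - 3 * K - 5) / (2.0 * K * (T - 2 * K - 3))) * (
        (T - 3 * K - 3) / (T - 3 * K - 1.0) * w_bar - K
    )
    return w_bar, z_asymptotic, z_fixed_t
```

A quick sanity check: if every $W_{i,T}$ equals $K$, then $Z_{N,T}$ is exactly zero, whereas $\widetilde{Z}_N$ is zero only when the average Wald statistic equals its fixed-$T$ mean $K(T-3K-1)/(T-3K-3)$.<br /><br />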
Proceeding from an EViews workfile with a panel structure, open two variables, say $x_t$ and $y_t$, as a group, proceed to <b>View/Granger Causality</b>, select Dumitrescu-Hurlin, specify the number of lags to use, namely, set $p$, and hit OK.</br></br> <center><a href="http://www.eviews.com/blog/Images/dhpanel/eviewsdh.png"><img src="http://www.eviews.com/blog/Images/dhpanel/eviewsdh.png" title="" width="640" height="auto" /></a><br /><br /></center> The output will look something like this.</br></br> <center><a href="http://www.eviews.com/blog/Images/dhpanel/eviewsdh2.png"><img src="http://www.eviews.com/blog/Images/dhpanel/eviewsdh2.png" title="" width="640" height="auto" /></a><br /><br /></center> In particular, EViews presents the global panel statistic $W_{N,T}$ as <i>W-Stat</i>, the standardized statistic $\widetilde{Z}_{N}$ as <i>Zbar-Stat</i>, and the corresponding $p$-values based on the $N(0,1)$ limiting distribution presented in the second case earlier. Notice that EViews does not present the asymptotic result $Z_{N,T}$. This is a conscious decision: as we show below, in almost all circumstances of interest, the version in which $T$ remains fixed tends to outperform the one in which $T\longrightarrow \infty$, except for very large $T$.<br /><br /> <h3>Dumitrescu-Hurlin Test: Monte Carlo Study</h3> We close our post with findings from our extensive Monte Carlo study of the Dumitrescu and Hurlin (2012) panel Granger causality test. Although the authors conducted a simulation study of their own, we were disappointed that more emphasis was not placed on the impact of incorrectly specifying the lag order $p$ in the Granger causality regression (\ref{eq.2}).
In this regard, we wrote an EViews <a href="http://www.eviews.com/blog/Images/dhpanel/dhmcstudy.prg">program</a> to study both size and power under the following configurations: <ul> <li> Monte Carlo replications: $5000$ </li></br> <li> Sample sizes considered: $T=11,20,50,100,250$ </li></br> <li> Cross-sections considered: $N=1,5,10,25,50$ </li></br> <li> Regression lags considered: $p=1, \ldots, 7$ </li></br> <li> Hypothesis configurations (includes $H_0$): $N_1/N = 0, 0.25, 0.50, 0.75, 1$ </li></br> <li> Statistics used: $Z_{N,T}$ and $\widetilde{Z}_{N}$ </li></br></ul> The study uses the same Monte Carlo framework proposed in Dumitrescu and Hurlin (2012). In particular, data is generated according to $H_0$ and $H_{A_2}$ for the regression equation (\ref{eq.2}), followed by estimation in which lag specifications may or may not coincide with the lag structure underlying the true DGP. Moreover, whereas each of the configurations above is available from the study, we isolate a few scenarios to illustrate our main findings: <ul> <li> First, both size and power drastically improve with increased sample size $T$, for all possible configurations. This effect is evidently more pronounced using the asymptotic statistic $Z_{N,T}$ since $\widetilde{Z}_{N}$ a priori accounts for the finiteness of $T$. </li></br> <center><a href="http://www.eviews.com/blog/Images/dhpanel/globjvstasymp.png"><img src="http://www.eviews.com/blog/Images/dhpanel/globjvstasymp.png" title="" width="640" height="auto" /></a><br /><br /></center> <center><a href="http://www.eviews.com/blog/Images/dhpanel/globjvstfinite.png"><img src="http://www.eviews.com/blog/Images/dhpanel/globjvstfinite.png" title="" width="640" height="auto" /></a><br /><br /></center> <li> Second, for each lag selection $p$ and cross-section specification $N$ (with the exception of $N=1$), size improves as $N$ decreases, whereas power improves as $N$ increases.
On the other hand, the improvement in power from increasing $N$ can be drastically more pronounced, and more varied, than the corresponding deterioration in size. This effect is much less pronounced for size, and much more pronounced for power, when considering the $\widetilde{Z}_N$ statistic. </li></br> <center><a href="http://www.eviews.com/blog/Images/dhpanel/globjvscasymp.png"><img src="http://www.eviews.com/blog/Images/dhpanel/globjvscasymp.png" title="" width="640" height="auto" /></a><br /><br /></center> <center><a href="http://www.eviews.com/blog/Images/dhpanel/globjvscfinite.png"><img src="http://www.eviews.com/blog/Images/dhpanel/globjvscfinite.png" title="" width="640" height="auto" /></a><br /><br /></center> <li> Lastly, the sensitivity of the test to misspecification of the regression lag length $p$ can be severe! In fact, our results show that size distortion is smallest with $p=1$, regardless of the true underlying DGP. While particularly evident in the case of the $Z_{N,T}$ statistic, the effect is somewhat less pronounced for the $\widetilde{Z}_N$ version of the test. In contrast, the test can be grossly underpowered whenever the regression lag $p$ deviates from the lag structure characterizing the true DGP. In particular, if $k$ is the number of lags in the true DGP and $p$ is the number of regression lags selected, the test is severely underpowered for all $p < k$ and improves as $p$ approaches $k$; if $p > k$, however, the loss of power is not nearly as severe, and is virtually unnoticeable.
</li></br> <center><a href="http://www.eviews.com/blog/Images/dhpanel/globjvskasymp.png"><img src="http://www.eviews.com/blog/Images/dhpanel/globjvskasymp.png" title="" width="640" height="auto" /></a><br /><br /></center> <center><a href="http://www.eviews.com/blog/Images/dhpanel/globjvskfinite.png"><img src="http://www.eviews.com/blog/Images/dhpanel/globjvskfinite.png" title="" width="640" height="auto" /></a><br /><br /></center></ul> The general takeaway is this: the Dumitrescu and Hurlin (2012) test achieves its best size when the regression lag $p$ is smallest (regardless of the underlying true AR structure), whereas it achieves its best power when $p$ matches the true AR structure, and the penalty for underspecifying $p$ can be severe. This trade-off between selecting lower regression lags for size and higher ones for power evidently calls for theoretical or practical guidance on correctly identifying the regression lags to be used in testing. Although Dumitrescu and Hurlin (2012) offer no such suggestion in their paper, it is not difficult to see the potential of model selection criteria to mitigate the issue. Choosing an appropriate model selection method is itself potentially problematic, however, and further simulation work on this question would be welcome. </br></br> If you would like to conduct your own simulations, you can find the entire code (mostly commented) <a href="http://www.eviews.com/blog/Images/dhpanel/dhmcstudy.prg">here</a>.
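</br></br> For readers who want the flavor of a single replication outside EViews, the sketch below simulates one draw of $W_{N,T}$ under $H_0$ in pure Python. It is our own illustration, not a translation of the EViews program: the DGP is a pair of independent AR(1) processes per cross-section (so $x$ has no predictive content for $y$), and $W_{i,T}$ is computed from a common Wald-type comparison of restricted and unrestricted residual sums of squares.

```python
import random

def _ols_rss(X, y):
    """Residual sum of squares from OLS of y on the columns of X,
    via normal equations and Gaussian elimination (pure Python)."""
    n, k = len(y), len(X[0])
    A = [[sum(X[r][a] * X[r][b] for r in range(n)) for b in range(k)] for a in range(k)]
    c = [sum(X[r][a] * y[r] for r in range(n)) for a in range(k)]
    for col in range(k):  # forward elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for cc in range(col, k):
                A[r][cc] -= f * A[col][cc]
            c[r] -= f * c[col]
    beta = [0.0] * k
    for col in range(k - 1, -1, -1):  # back substitution
        beta[col] = (c[col] - sum(A[col][j] * beta[j] for j in range(col + 1, k))) / A[col][col]
    return sum((y[r] - sum(X[r][j] * beta[j] for j in range(k))) ** 2 for r in range(n))

def dh_wbar_under_null(N, T, p, seed=12345):
    """One Monte Carlo draw of W_bar = (1/N) * sum_i W_{i,T} under H0.
    Each cross-section gets independent AR(1) series x and y, so x does
    not Granger cause y; W_{i,T} is a Wald-type statistic comparing the
    restricted (own lags only) and unrestricted fits of regression (eq. 2)."""
    rng = random.Random(seed)
    wald = []
    for _ in range(N):
        x, y = [rng.gauss(0, 1)], [rng.gauss(0, 1)]
        for _ in range(T - 1):
            x.append(0.5 * x[-1] + rng.gauss(0, 1))
            y.append(0.5 * y[-1] + rng.gauss(0, 1))
        rows = range(p, T)  # effective sample: T - p observations
        yy = [y[t] for t in rows]
        Xu = [[1.0] + [y[t - l] for l in range(1, p + 1)]
                    + [x[t - l] for l in range(1, p + 1)] for t in rows]
        Xr = [[1.0] + [y[t - l] for l in range(1, p + 1)] for t in rows]
        rss_u, rss_r = _ols_rss(Xu, yy), _ols_rss(Xr, yy)
        wald.append((T - p) * (rss_r - rss_u) / rss_u)
    return sum(wald) / N
```

Repeating such draws across many replications, and standardizing the averaged Wald statistics as in the formulas earlier, reproduces the logic of Steps 1 through 3; the EViews program linked above does this for every configuration in the study.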
</span>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com7tag:blogger.com,1999:blog-6883247404678549489.post-23220471386074612142017-07-26T15:06:00.003-07:002017-07-27T09:00:45.278-07:00Hamilton’s “Why you should never use the Hodrick-Prescott Filter”<script type="text/x-mathjax-config">MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" }, extensions: ["AMSmath.js","AMSsymbols.js"], }, tex2jax: { inlineMath: [['$','$'], ['\$','\$']] }, Macros: { }, }); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"></script> <br /><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">Professor James D. Hamilton requires no introduction, having been one of the most important researchers in time series econometrics for decades. <o:p></o:p></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">Over the past few years, Hamilton has been working on a <a href="https://www.nber.org/papers/w23429" target="_blank">paper</a> calling on applied economists to abandon the ubiquitous Hodrick-Prescott Filter and replace it with a much simpler method of extracting trend and cycle information from a time series.<o:p></o:p></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">This paper has become popular, and a number of our users have asked how to replicate it in EViews. 
One of our users, Greg Thornton, has written an EViews <a href="http://www.eviews.com/Addins/addins.shtml" target="_blank">add-in</a> (called Hamilton) that performs Hamilton’s method.&nbsp; However, given its relative simplicity, we thought we’d use a blog post to show manual calculation of the method and replicate the results in Hamilton’s paper.<o:p></o:p></span></div><div class="MsoNormal"><br /></div><a name='more'></a><br /><h2><span style="font-family: &quot;verdana&quot; , sans-serif;">The Hodrick-Prescott Filter</span></h2><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><o:p></o:p></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">The HP filter is a mainstay of modern applied macroeconomic analysis. It is used extensively to isolate trend and cycle components from a time series.&nbsp; By isolating and removing the cyclical component, you are able to analyze the long-term effects of or on a variable without worrying about the impact of short term fluctuations. In macroeconomics this is especially useful since many macroeconomic variables suffer from business-cycle fluctuations.<o:p></o:p></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">Mathematically, the HP filter is a two-sided linear filter that computes the smoothed series $s$ of $y$ by minimizing the variance of $y$ around $s$, subject to a penalty that constrains the second difference of $s$. That is, the HP filter chooses $s$ to minimize:<o:p></o:p></span></div><div class="MsoNormal">$$\sum_{t=1}^T\left(y_t - s_t\right)^2 + \lambda \sum_{t=2}^{T-1}\left((s_{t+1} - s_t) - (s_t - s_{t-1})\right)^2$$</div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">The arbitrary smoothing parameter $\lambda$ controls the smoothness of the series $s$. The larger the $\lambda$, the smoother the series. 
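<br /><br /> This minimization is a linear problem: differentiating with respect to $s$ gives the first-order conditions $(I + \lambda D'D)s = y$, where $D$ is the second-difference matrix. A small pure-Python sketch (our own illustration; it uses dense Gaussian elimination and so is only meant for short series, whereas production implementations exploit the banded structure of the system):

```python
def hp_filter(y, lam):
    """Return the HP trend s minimizing
    sum (y_t - s_t)^2 + lam * sum ((s_{t+1}-s_t) - (s_t-s_{t-1}))^2
    by solving the first-order conditions (I + lam * D'D) s = y."""
    T = len(y)
    # build A = I + lam * D'D, with D the (T-2) x T second-difference matrix
    A = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    for t in range(1, T - 1):  # row of D at t: s[t-1] - 2*s[t] + s[t+1]
        stencil = {t - 1: 1.0, t: -2.0, t + 1: 1.0}
        for i, di in stencil.items():
            for j, dj in stencil.items():
                A[i][j] += lam * di * dj
    b = [float(v) for v in y]
    for col in range(T):  # Gaussian elimination with partial pivoting
        piv = max(range(col, T), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, T):
            f = A[r][col] / A[col][col]
            for cc in range(col, T):
                A[r][cc] -= f * A[col][cc]
            b[r] -= f * b[col]
    s = [0.0] * T
    for col in range(T - 1, -1, -1):  # back substitution
        s[col] = (b[col] - sum(A[col][j] * s[j] for j in range(col + 1, T))) / A[col][col]
    return s
```

Two easy sanity checks: with $\lambda=0$ the trend equals the data, and a perfectly linear series is returned unchanged for any $\lambda$, since its second differences are all zero.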
As $\lambda \rightarrow \infty$, $s$ approaches a linear trend.<o:p></o:p></span></div><div class="MsoNormal"><br /></div><h2><span style="font-family: &quot;verdana&quot; , sans-serif;">Hamilton’s Criticisms of the HP Filter</span></h2><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">Hamilton outlines three main criticisms of the HP filter:</span></div><div class="MsoNormal"><br /></div><div class="MsoNormal"></div><ol><li><span style="font-family: &quot;verdana&quot; , sans-serif;"><i>The HP filter produces series with spurious dynamic relations that have no basis in the underlying data-generating process.</i></span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;"><i>Filtered values at the end of the sample are very different from those in the middle, and are also characterized by spurious dynamics.</i></span></li><li><span style="font-family: &quot;verdana&quot; , sans-serif;"><i>A statistical formalization of the problem typically produces values for the smoothing parameter vastly at odds with common practice.</i></span></li></ol><br /><h2><span style="font-family: &quot;verdana&quot; , sans-serif;">Hamilton’s Method</span></h2><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">Hamilton proposes an alternative to the HP Filter that uses simple forecasts of the series to remove the cyclical component.&nbsp; That is, to produce a smoothed estimate of <i>Y </i>at time <i>t</i>, we use the fitted value from a regression of Y on 4 lagged values of Y back-shifted by two years (so 8 observations in quarterly data), and a constant.
Specifically:<o:p></o:p></span></div><div class="MsoNormal">$$\widetilde{y}_t = \alpha_0 + \beta_1 y_{t-8} + \beta_2 y_{t-9} + \beta_3 y_{t-10} + \beta_4 y_{t-11}$$</div><div class="MsoNormal"><br /></div><h2><span style="font-family: &quot;verdana&quot; , sans-serif;">An Example Using EViews</span></h2><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">Professor Hamilton provides some examples using employment data, in the csv file employment.csv.&nbsp; Specifically, the file contains quarterly non-farm payroll numbers, both seasonally adjusted and non-seasonally adjusted, between 1947 and 2016Q2.<o:p></o:p></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">We can open that file in EViews simply by dragging it to EViews.&nbsp; The file doesn’t have a date format that EViews understands, so we will manually restructure the page to quarterly frequency with a start date of 1947Q1.<o:p></o:p></span></div><div class="separator" style="clear: both; text-align: center;"></div><div class="MsoNormal"><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-tVUqgIZY8f4/WXoKmKcwtuI/AAAAAAAAAbs/IOEbgK75NxoKoaOh18Iqi4NTCYc9zQ3IwCLcBGAs/s1600/importing.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="704" data-original-width="924" height="486" src="https://2.bp.blogspot.com/-tVUqgIZY8f4/WXoKmKcwtuI/AAAAAAAAAbs/IOEbgK75NxoKoaOh18Iqi4NTCYc9zQ3IwCLcBGAs/s640/importing.gif" width="640" /></a></div><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">We then give the two time-series names, and create growth rate series.<o:p></o:p></span></div><div class="MsoNormal"><br /></div><div class="separator" style="clear: both; text-align: center;"><span style="font-family: &quot;verdana&quot; ,
sans-serif;"><a href="https://4.bp.blogspot.com/-a9AODTUmzHo/WXkL0f1cqEI/AAAAAAAAAa4/Cy5XsLO1uBIyfkEwELaKjFeCBmHJHIjdgCLcBGAs/s1600/rename.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="238" data-original-width="505" height="187" src="https://4.bp.blogspot.com/-a9AODTUmzHo/WXkL0f1cqEI/AAAAAAAAAa4/Cy5XsLO1uBIyfkEwELaKjFeCBmHJHIjdgCLcBGAs/s400/rename.png" width="400" /></a></span></div><br /><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">Having created the series we’re interested in, we’ll first perform the HP filter on the seasonally adjusted series. We open the series, click on Proc-&gt;Hodrick-Prescott Filter.&nbsp; We enter names for the outputted trend and cycle series, and then click OK.<o:p></o:p></span></div><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-A7ZmqaNxlB8/WXkL4mnYxmI/AAAAAAAAAa8/Y19eObrIktAloy7wprdXasrqBVFUfLlUQCLcBGAs/s1600/hp%2Bfilter.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="704" data-original-width="924" height="484" src="https://3.bp.blogspot.com/-A7ZmqaNxlB8/WXkL4mnYxmI/AAAAAAAAAa8/Y19eObrIktAloy7wprdXasrqBVFUfLlUQCLcBGAs/s640/hp%2Bfilter.gif" width="640" /></a></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div class="MsoNormal"><br /></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">Now we’ll replicate Hamilton’s method.&nbsp; We first need to regress the series against four lags of itself shifted 8 periods back.&nbsp; We do this using the Quick-&gt;Estimate Equation menu, then entering the specification of </span><span style="font-family: &quot;courier new&quot; , &quot;courier&quot; , monospace;">NFP_LOG C NFP_LOG(-8 TO -11)</span><span style="font-family: &quot;verdana&quot; , sans-serif;"><o:p></o:p></span></div><div 
class="MsoNormal"><br /></div><div class="separator" style="clear: both; text-align: center;"><span style="font-family: &quot;verdana&quot; , sans-serif;"></span></div><br /><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-d_cisLHWQg4/WXoL9gMYDhI/AAAAAAAAAb0/ggDFpAVcaDAsYVVWoaISmxzsi8AFd6rHACLcBGAs/s1600/Estimation.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="704" data-original-width="924" height="486" src="https://1.bp.blogspot.com/-d_cisLHWQg4/WXoL9gMYDhI/AAAAAAAAAb0/ggDFpAVcaDAsYVVWoaISmxzsi8AFd6rHACLcBGAs/s640/Estimation.gif" width="640" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"></div><br /><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">We can then view the residuals and fitted values, which correspond to the cyclical and trend components from the View menu.<o:p></o:p></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">If we wanted to save those components, we could use the Proc-&gt;Make Resids and Proc-&gt;Forecast menu items to produce the residuals and fitted (forecasted) values.<o:p></o:p></span></div><div class="MsoNormal"><br /></div><div class="separator" style="clear: both; text-align: center;"><span style="font-family: &quot;verdana&quot; , sans-serif;"></span></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-60Fb8E687PA/WXoMjewjL1I/AAAAAAAAAb4/zyO0E42bTro_jJbJ5YqflJ0B-bVWpuWrACLcBGAs/s1600/resids.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="704" data-original-width="924" height="486" src="https://1.bp.blogspot.com/-60Fb8E687PA/WXoMjewjL1I/AAAAAAAAAb4/zyO0E42bTro_jJbJ5YqflJ0B-bVWpuWrACLcBGAs/s640/resids.gif" width="640" /></a></div><br /><div 
class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;">We’ve written a quick EViews program that automates this process for both the seasonally adjusted and non-seasonally adjusted data, and replicates Figure 5 from Hamilton’s paper.&nbsp; The program produces the following graphs, and the code is below:<o:p></o:p></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="MsoNormal"><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-GTEZYw3DgeY/WXoOHtAwA-I/AAAAAAAAAcM/rXzrGfNUvT0qIpYbqjZOoaa6yNiJZ1HzACLcBGAs/s1600/cycles.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1600" data-original-width="1166" height="640" src="https://2.bp.blogspot.com/-GTEZYw3DgeY/WXoOHtAwA-I/AAAAAAAAAcM/rXzrGfNUvT0qIpYbqjZOoaa6yNiJZ1HzACLcBGAs/s640/cycles.png" width="466" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-3DrlcvKeaAI/WXoOHjs1mLI/AAAAAAAAAcQ/RUcVN2qzW3cY3bJkwoQI-riuorbPM-t_gCLcBGAs/s1600/figure%2B5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="919" data-original-width="1439" height="408" src="https://2.bp.blogspot.com/-3DrlcvKeaAI/WXoOHjs1mLI/AAAAAAAAAcQ/RUcVN2qzW3cY3bJkwoQI-riuorbPM-t_gCLcBGAs/s640/figure%2B5.png" width="640" /></a></div><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div class="MsoNormal"><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; 
mso-layout-grid-align: none; text-autospace: none;"><span style="font-family: &quot;verdana&quot; , sans-serif; font-size: 12pt;"><span style="color: #009600;">'open data</span><br />
wfopen .\employment.csv<br />
<br />
<span style="color: #009600;">'structure the data and rename series</span><br />
pagestruct(freq=q, start=1947)<br />
d series01<br />
rename series02 emp_sa<br />
rename series03 emp_nsa<br />
<br />
<span style="color: #009600;">'calculate log transforms of the series (x100)</span><br />
series nfp_log = 100*log(emp_sa)<br />
series nsa_log = 100*log(emp_nsa)<br />
<br />
<span style="color: #009600;">'hp filter of employment (seasonally adjusted)</span><br />
nfp_log.hpf nfp_hptrend @ nfp_hpcycle<br />
<br />
<span style="color: #009600;">'hp filter of employment (non-seasonally adjusted)</span><br />
nsa_log.hpf nsa_hptrend @ nsa_hpcycle<br />
<br />
<span style="color: #009600;">'regress employment (seasonally adjusted) on a constant and 4 lags of itself, offset by 8 periods (lags 8-11)</span><br />
equation eq1.ls nfp_log c nfp_log(-8 to -11)<br />
<span style="color: #009600;">'store resids as the cycle</span><br />
eq1.makeresid nfp_cycle<br />
<span style="color: #009600;">'store fitted vals as the trend</span><br />
eq1.fit nfp_trend<br />
<br />
<span style="color: #009600;">'regress employment (non-seasonally adjusted) on a constant and 4 lags of itself, offset by 8 periods (lags 8-11)</span><br />
equation eq2.ls nsa_log c nsa_log(-8 to -11)<br />
<span style="color: #009600;">'store resids as the cycle</span><br />
eq2.makeresid nsa_cycle<br />
<span style="color: #009600;">'store fitted vals as the trend</span><br />
eq2.fit nsa_trend<br />
<br />
<span style="color: #009600;">'calculate 8-period differences (the random walk benchmark)</span><br />
series nfp_base = nfp_log-nfp_log(-8)<br />
series nsa_base = nsa_log-nsa_log(-8)<br />
<br />
<span style="color: #009600;">'display graphs of Hamilton's method (replicate Figure 5)</span><br />
freeze(g1) nfp_log.line<br />
g1.addtext(t) Employment (seasonally adjusted)<br />
<span style="color: blue;">call</span> shade(g1)<br />
freeze(g2) nsa_log.line<br />
g2.addtext(t) Employment (not seasonally adjusted)<br />
<span style="color: blue;">call</span> shade(g2)<br />
group nfps nfp_cycle nfp_base<br />
freeze(g3) nfps.line<br />
g3.addtext(t) Cyclical components (SA)<br />
g3.setelem(1) legend(Regression)<br />
g3.setelem(2) legend(Random Walk)<br />
g3.legend columns(1) position(4.67,0)<br />
<span style="color: blue;">call</span> shade(g3)<br />
group nsas nsa_cycle nsa_base<br />
freeze(g4) nsas.line<br />
g4.addtext(t) Cyclical components (NSA)<br />
g4.setelem(1) legend(Regression)<br />
g4.setelem(2) legend(Random Walk)<br />
g4.legend columns(1) position(4.67,0)<br />
<span style="color: blue;">call</span> shade(g4)<br />
graph g5.merge g1 g2 g3 g4<br />
g5.addtext(t) Figure 5 (Hamilton)<br />
show g5<br />
<br />
<span style="color: #009600;">'display graphs of HP filter results compared with Hamilton's</span><br />
group nfp_cycles nfp_cycle nfp_hpcycle<br />
freeze(g6) nfp_cycles.line<br />
g6.addtext(t) Employment (seasonally adjusted) Cycles</span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; 
mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">g6.setelem(1) legend(Hamilton Method)<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">g6.setelem(2) legend(HP Filter)<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-family: &quot;verdana&quot; , sans-serif;"><span style="color: blue; font-size: 12pt;">call</span><span style="font-size: 12pt;"> shade(g6)<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 
409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">group nsa_cycles nsa_cycle nsa_hpcycle<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">freeze(g7) nsa_cycles.line<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">g7.addtext(t) Employment (Non-seasonally adjusted) Cycles<span style="color: blue;"><o:p></o:p></span></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">g7.setelem(1) legend(Hamilton Method)<span style="color: 
blue;"><o:p></o:p></span></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">g7.setelem(2) legend(HP Filter)<span style="color: blue;"><o:p></o:p></span></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-family: &quot;verdana&quot; , sans-serif;"><span style="color: blue; font-size: 12pt;">call</span><span style="font-size: 12pt;"> shade(g7)<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">graph g8.merge g6 g7<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 
110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">show g8<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><br /></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-family: &quot;verdana&quot; , sans-serif;"><span style="color: #009600; font-size: 12pt;">'subroutine to shade graphs using Hamilton's dates (note these dates may differ slightly from the recession shading add-in available in EViews).</span><span style="font-size: 12pt;"><o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-family: &quot;verdana&quot; , 
sans-serif;"><span style="color: #009600; font-size: 12pt;">'Also does some minor formatting.</span><span style="font-size: 12pt;"><o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-family: &quot;verdana&quot; , sans-serif;"><span style="color: blue; font-size: 12pt;">subroutine</span><span style="font-size: 12pt;"> shade(graph g)<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.draw(shade, b) 1948q4 1949q4<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.draw(shade, b) 1953q2 1954q2<o:p></o:p></span></span></div><div class="MsoNormal" 
style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.draw(shade, b) 1957q3 1958q2<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.draw(shade, b) 1960q2 1961q1<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.draw(shade, b) 1969q4 1970q4<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 
204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.draw(shade, b) 1973q4 1975q1<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.draw(shade, b) 1980q1 1980q3<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.draw(shade, b) 1981q3 1982q4<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 
12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.draw(shade, b) 1990q3 1991q1<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.draw(shade, b) 2001q1 2001q4<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.draw(shade, b) 2007q4 2009q2<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.datelabel interval(year, 10, 0)<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; 
margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.axis minor<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.axis(b) ticksout&nbsp; <o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.options -gridl<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 
330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-size: 12pt;"><span style="font-family: &quot;verdana&quot; , sans-serif;">&nbsp;&nbsp;&nbsp;&nbsp; g.options gridnone<o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><span style="font-family: &quot;verdana&quot; , sans-serif;"><span style="color: blue; font-size: 12pt;">endsub</span><span style="font-size: 12pt;"><o:p></o:p></span></span></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><br /></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><br /></div><div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; tab-stops: 0in 15.75pt 31.5pt 47.25pt 63.0pt 78.75pt 94.5pt 110.25pt 1.75in 141.75pt 157.5pt 173.25pt 189.0pt 204.75pt 220.5pt 236.25pt 
3.5in 267.75pt 283.5pt 299.25pt 315.0pt 330.75pt 346.5pt 362.25pt 5.25in 393.75pt 409.5pt 425.25pt 441.0pt 456.75pt 472.5pt 488.25pt; text-autospace: none;"><br /></div><br /><div class="MsoNormal"><br /></div>IHSEViewshttp://www.blogger.com/profile/04703437003033046408noreply@blogger.com16tag:blogger.com,1999:blog-6883247404678549489.post-61902301991546647302017-05-16T14:57:00.001-07:002017-05-18T08:50:07.683-07:00AutoRegressive Distributed Lag (ARDL) Estimation. Part 3 - Practice<script type="text/x-mathjax-config">MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" }, extensions: ["AMSmath.js","AMSsymbols.js"], }, tex2jax: { inlineMath: [['$','$'], ['\$','\$']] }, Macros: { }, }); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"></script> <span style="font-family: &quot;verdana&quot; , sans-serif;">In <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">Part 1</a> and <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl_8.html" target="_blank">Part 2</a> of this series, we discussed the theory behind ARDL and the Bounds Test for cointegration. Here, we demonstrate just how easily everything can be done in EViews 9 or higher.<br /><br /> While our two previous posts in this series have been heavily theoretically motivated, here we present a step by step procedure on how to implement <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">Part 1</a> and <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl_8.html" target="_blank">Part 2</a> in practice.<a name='more'></a><br /><br /> <ol><li> Get a feel for the nature of the data.</li><br /><li> Ensure all variables are integrated of order I$(d)$ with $d &lt; 2$.</li><br /><li> Specify how deterministics enter the ARDL model. 
Choose DGP $i=1,\ldots,5$ from those outlined in <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">Part 1</a> and <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl_8.html" target="_blank">Part 2</a>.</li><br /><li> Determine the appropriate lag structure of the model selected in Step 3.</li><br /><li> Estimate the model in Step 4 using Ordinary Least Squares (OLS).</li><br /><li> Ensure residuals from Step 5 are serially uncorrelated and homoskedastic.</li><br /><li> Perform the Bounds Test.</li><br /><li> Estimate the speed of adjustment, if appropriate.</li><br /></ol> The following flow chart illustrates the procedure.<br /><br /> <center><a href="http://www.eviews.com/blog/Images/ardl/ardl_pt3_pic1.png" ><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_pic1.png" title="" width="640" height="auto" /></a></center><br /><br /> <h3>Working Example</h3>The motivation for this entry is the classical <i>term structure of interest rates</i> (TSIR) literature. In a nutshell, the TSIR postulates that there exists a relationship linking the yields on bonds of different maturities. Formally: $$R(k,t) = \frac{1}{k}\sum_{j=1}^{k}\pmb{\text{E}}_tR(1,t+j-1) + L(k,t)$$ where $\pmb{\text{E}}_t$ is the expectation operator conditional on the information available at time $t$, $R(k,t)$ is the yield to maturity at time $t$ of a $k$-period pure discount bond, and $L(k,t)$ is the premium, typically accounting for risk. To see that cointegration is indeed possible, repeated applications of the identity $R(k,t) = R(k,t-1) + \Delta R(k,t)$, where $\Delta R(k,t) = R(k,t) - R(k,t-1)$, lead to the following expression: $$R(k,t) - R(1,t) = \frac{1}{k}\sum_{i=1}^{k-1}\sum_{j=1}^{i}\pmb{\text{E}}_t \Delta R(1,t+j) + L(k,t)$$ It is now evident that if the $R(k,t)$ are I$(1)$ processes, the $\Delta R(1,t+j)$ must be I$(0)$ processes, and the linear combination $R(k,t) - R(1,t)$ is therefore an I$(0)$ process provided $L(k,t)$ is as well.
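Since the two displayed expressions are linked by a purely algebraic rearrangement, the identity can be checked numerically. The Python sketch below (illustrative numbers; the expectation operator is dropped, i.e. perfect foresight is assumed) confirms that both sides of the spread expression agree:

```python
# Check that R(k,t) - R(1,t) equals (1/k) * sum_{i=1}^{k-1} sum_{j=1}^{i} dR(1,t+j) + L(k,t)
# under perfect foresight. The short-rate path r1 and the premium L are hypothetical.

def long_yield(r1, t, k, L=0.0):
    """R(k,t) = (1/k) * sum_{j=1}^{k} R(1, t+j-1) + L(k,t)."""
    return sum(r1[t + j - 1] for j in range(1, k + 1)) / k + L

def spread_rhs(r1, t, k, L=0.0):
    """(1/k) * sum_{i=1}^{k-1} sum_{j=1}^{i} [R(1,t+j) - R(1,t+j-1)] + L(k,t)."""
    total = sum(r1[t + j] - r1[t + j - 1]
                for i in range(1, k) for j in range(1, i + 1))
    return total / k + L

r1 = [2.0, 2.3, 1.9, 2.6, 2.4, 2.8, 3.1, 2.7]    # hypothetical 1-period yields
lhs = long_yield(r1, t=0, k=5, L=0.15) - r1[0]   # the spread R(5,0) - R(1,0)
print(abs(lhs - spread_rhs(r1, t=0, k=5, L=0.15)) < 1e-12)  # True
```

The check holds for any short-rate path, since the rearrangement is exact term by term.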
In other words, the $k$-period yield to maturity is always cointegrated with the one-period yield to maturity, with cointegrating vector $(1,-1)^\top$. In fact, a little more work shows that the principle holds for the spread between any two arbitrary maturities $k_1$ and $k_2$. That is, \begin{align*} R(k_2,t) - R(k_1,t) &amp;= R(k_2,t) - R(1,t) + R(1,t) - R(k_1,t)\\ &amp;= \frac{1}{k_2}\sum_{i=1}^{k_2-1}\sum_{j=1}^{i}\pmb{\text{E}}_t \Delta R(1,t+j) + L(k_2,t) - \frac{1}{k_1}\sum_{i=1}^{k_1-1}\sum_{j=1}^{i}\pmb{\text{E}}_t \Delta R(1,t+j) - L(k_1,t)\\ &amp;\sim \text{I}(0) \end{align*} Now that we have established a theoretical basis for the exercise, we delve into practice with real data. In fact, we will work with Canadian maturities collected directly from the <a href="http://www5.statcan.gc.ca/cansim/home-accueil?lang=eng">Canadian Socioeconomic Database from Statistics Canada</a>, or <i>CANSIM</i> for short. In particular, we will be looking at cointegrating relationships between two types of marketable debt instruments: the yield on a <i>Treasury Bill</i>, which is a short-term (maturing at 1 month, 3 months, 6 months, and 1 year from date of issue) discounted security, and the yield on <i>Benchmark Bonds</i>, otherwise known as <i>Treasury Notes</i>, which are medium-term (maturing at 2 years, 5 years, 7 years, and 10 years from date of issue) securities with semi-annual interest payouts. The workfile can be found <a href="http://www.eviews.com/blog/Images/ardl/ardl.example.WF1">here</a>.<br /><br /> <h3>Data Summary</h3>The first step in any empirical analysis is an overview of the data itself. In particular, the subsequent analysis makes use of data on Treasury Bill yields maturing in 1, 3, 6, and 12 months, appropriately named <b>TBILL</b>, in addition to data on Benchmark Bond yields (Treasury Notes) maturing in 2, 5, and 10 years, appropriately named <b>BBY</b>.
Consider their graphs below:<br /><br /> <center><a href="http://www.eviews.com/blog/Images/ardl/ardl_pt3_graph1.png" ><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_graph1.png" title="" width="640" height="auto"></a></center><br /><br /> Notice that each graph exhibits a structural change around June 2007, marking the beginning of the US housing crisis. We have indicated its presence using a vertical red line. We will incorporate this information into our analysis by indicating the post-crisis period with the dummy variable <b>dum0708</b>. Namely, the variable assumes a value of 1 in each of the months following June 2007. Moreover, a little background research on the Bank of Canada (BoC) reveals that starting January 2001, the BoC committed to a new set of transparency and inflation-targeting measures to recover from the late-1990s dot-com crash as well as the disinflationary period in the earlier part of that decade. For this reason, to avoid having to analyze too many policy paradigm shifts, we will only focus on data in the period after January 2001. We can achieve all of this with the following commands: <pre>'Set sample from Jan 2001 to end.<br />smpl Jan/2001 @last<br /><br />'Create dummy for post 07/08 crisis<br />series dum0708 = @recode(@dateval("2007/06")&lt;@date,1,0)<br /></pre><br /> <h3>Testing Integration Orders</h3>We begin our analysis by ensuring that no series under consideration is integrated of order 2 or higher. To do this, we run a unit root test on the <b>first difference</b> of each series. In this case, the standard ADF test will suffice. A particularly easy way of doing this is to create a <b>group</b> object containing all variables of interest and then run a unit root test on the group, specifying that the test should be applied to the individual series.
From the group view, proceed to <b>Proc/Unit Root Test...</b> and choose the appropriate options.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_ss1.png" title="" width="auto" height="auto"><br /><br /></center> The following table illustrates the result.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_table1.png" title="" width="auto" height="auto"><br /><br /></center> Notice in the lower table that the column heading <b>Prob.</b> lists the $p$-values associated with each individual series. Since the $p$-value is effectively zero for each of the series under consideration and the null hypothesis is a unit root, we reject the null at all conventional significance levels. In particular, since the test was conducted on first differences, we conclude that there are no unit roots in first differences, and so each of the series must be either I$(0)$ or I$(1)$. We can therefore proceed to the next step. <br /><br /> <h3>Deterministic Specifications</h3>Selecting an appropriate model to fit the data is both art and science. Nevertheless, there are a few guidelines. Any model in which the series are not centered about zero will typically require a constant term, whereas any model in which the series exhibit a trend will generally fit better when a trend term is incorporated. <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">Part 1</a> and <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl_8.html" target="_blank">Part 2</a> of this series discussed the possibility of selecting from five different DGP specifications, termed Case 1 through Case 5. In fact, we will consider several different model specifications with various variable combinations.<br /><br /> <ul><li> <b>Model 1:</b> This model looks for a relationship between the 10 Year Benchmark Bond Yield and the 1 Month T-Bill.
In particular, the model will <b>restrict</b> the constant to enter the cointegrating relationship, corresponding to the DGP and Regression Model specified in <b>Case 2</b> in <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">Part 1</a> and <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl_8.html" target="_blank">Part 2</a>.</li><br /><br /> <li> <b>Model 2:</b> The model under consideration will look for a relationship between the 6, 3, and 1 Month T-Bills. Here, the model will leave the constant <b>unrestricted</b>, corresponding to the DGP and Regression Model specified in <b>Case 3</b> in <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">Part 1</a> and <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl_8.html" target="_blank">Part 2</a>.</li><br /><br /> <li> <b>Model 3:</b> The model under consideration will look for a relationship between the 2 Year Benchmark Bond Yield, and the 1 Year and 1 Month T-Bills. Here, the model will again leave the constant <b>unrestricted</b>, corresponding to the DGP and Regression Model specified in <b>Case 3</b> in <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">Part 1</a> and <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl_8.html" target="_blank">Part 2</a>.</li><br /></ul> We will see how to select these in EViews when we discuss estimation below. <br /><br /> <h3>Specifying ARDL Lag Structure</h3>Selecting an appropriate number of lags for the model under consideration is, again, both science and art. Unless the number of lags is specified by economic theory, the econometrician has several tools at their disposal to select the lag length optimally.
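Before turning to the mechanics, it is worth seeing how quickly an exhaustive lag search grows. The Python sketch below counts the candidate specifications (the helper function is our own illustration, not an EViews routine):

```python
def n_ardl_models(p, q, k):
    """Number of lag combinations searched in automatic ARDL selection:
    p choices (1..p) for the dependent variable's lag order, and q+1
    choices (0..q) for each of the k dynamic regressors."""
    return p * (q + 1) ** k

# With EViews-style defaults p = q = 4:
one_regressor = n_ardl_models(4, 4, 1)   # 4 * 5   = 20 candidate models
two_regressors = n_ardl_models(4, 4, 2)  # 4 * 5^2 = 100 candidate models
```

Each additional regressor multiplies the search space by another factor of $q+1$, which is why automatic selection relies on an information criterion rather than manual inspection.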
One possibility is to select the maximal number of lags for the dependent variable, say $p$, and the maximal number of lags for each of the regressor variables, say $q$, and then run a barrage of regressions with all the different possible combinations of lags that can be formed using this specification. In particular, if there are $k$ regressors, the maximum number of combinations of the set of numbers $\{1, \ldots, p\}$ and $k$ additional sets of numbers $\{0,\ldots, q\}$ is $p\times (q + 1)^k$. For instance, with EViews default values $p = q = 4$ and $k = 2$ regressors, the total number of models under consideration would be $4 \times 5^2 = 100$. The optimal combination is then set as that which minimizes some information criterion, say Akaike (AIC), Schwarz (BIC), Hannan-Quinn (HQ), or even the adjusted $R^2$. EViews offers the user an option on how to select from among these, and we will discuss this when we explore estimation next.<br /><br /> <h3>Estimation, Residual Diagnostics, Bounds Test, and Speed of Adjustment</h3>ARDL models are typically estimated using standard least squares techniques. In EViews, this implies that one can estimate ARDL models manually using an equation object with the Least Squares estimation method, or resort to the built-in equation object specialized for ARDL model estimation. We will use the latter. Open the equation dialog by selecting <b>Quick/Estimate Equation</b> or by selecting <b>Object/New Object/Equation</b> and then selecting <b>ARDL</b> from the <b>Method</b> dropdown menu. Proceed by specifying each of the following:<br /><br /> <ul><li> List the relevant dynamic variables in the <b>Dynamic Specification</b> field. This is a space delimited list where the dependent variable is followed by the regressors which will form the long-run equation. Do <b>NOT</b> list variables which are not part of the long-run equation, but part of the estimated model.
Those variables will be specified in the <b>Fixed Regressors</b> field below.</li><br /> <li> Specify whether <b>Automatic</b> or <b>Fixed</b> lag selection will be used. Note that even if <b>Automatic</b> lag selection is preferred, maximum lag-orders need to be specified for the dependent variable as well as the regressors. If you wish to specify how automatic selection is computed, please click on the <b>Options</b> tab and select the preferred information criterion under the <b>Model selection criteria</b> dropdown menu. Finally, note that in EViews 9, if <b>Fixed</b> lag selection is preferred, all regressors will have the same number of lags. EViews 10 will allow the user to fix lags specific to each regressor under consideration.</li><br /> <li> In the <b>Fixed Regressors</b> field, specify all variables <b>other than</b> the constant and trend, which will enter the model for estimation, but will not be a part of the long-run relationship. This list can include variables such as dummies or other exogenous variables.</li><br /> <li> In the <b>Fixed Regressors</b> field, specify how deterministic specifications enter the long-run relationship. This is a dropdown menu which corresponds to the 5 different DGP cases mentioned earlier, and explored in <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">Part 1</a> and <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl_8.html" target="_blank">Part 2</a> of this series. In particular, the <b>Trend Specification</b> dropdown menu offers the following options: <ul><li> <b>None:</b> This corresponds to <b>Case 1</b> -- the no constant and trend case.</li><br /><li> <b>Rest. constant:</b> This corresponds to <b>Case 2</b> -- the restricted constant and no trend case.</li><br /><li> <b>Unrest. constant:</b> This corresponds to <b>Case 3</b> -- the unrestricted constant and no trend case.</li><br /><li> <b>Rest. 
linear trend:</b> This corresponds to <b>Case 4</b> -- the restricted linear trend and unrestricted constant case.</li><br /><li> <b>Unrest. constant and trend:</b> This corresponds to <b>Case 5</b> -- the unrestricted constant and unrestricted linear trend case. Note that this case will be available starting with EViews version 10.</li><br /> </ul></li></ul> We now demonstrate the above for each of the three models specified earlier. In all models we will use automatic lag selection and a dummy for the post-07/08 housing crisis period.<br /><br /> <h4>Model 1: No Cointegrating Relationship</h4>In this model, the dependent variable is the 10 Year Benchmark Bond Yield, while the dynamic regressor is the 1 Month T-Bill. Moreover, the DGP under consideration is a restricted constant, or Case 2, and we include the variable <b>dum0708</b> as our non-dynamic regressor. The estimation dialog is filled out as follows.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_ss2.png" title="" width="auto" height="auto"><br /><br /></center> We have the following output.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_table2.png" title="" width="auto" height="auto"><br /><br /></center><br /><br /> To verify whether the residuals from the model are serially uncorrelated, in the estimation view, proceed to <b>View/Residual Diagnostics/Serial Correlation LM Test...</b>, and select the number of lags. In our case, we chose 2. Here's the output.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_table3.png" title="" width="auto" height="auto"><br /><br /></center> Since the null hypothesis is that the residuals are <b>serially uncorrelated</b>, the $F$-statistic $p$-value of 0.7475 indicates that we will <b>fail</b> to reject this null.
We therefore conclude that the residuals are serially uncorrelated.<br /><br /> Similarly, testing for residual homoskedasticity, in the estimation view, proceed to <b>View/Residual Diagnostics/Heteroskedasticity Tests...</b>, and select a type of test. In our case, we chose Breusch-Pagan-Godfrey. Here's the output.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_table4.png" title="" width="auto" height="auto"><br /><br /></center> Since the null hypothesis is that the residuals are <b>homoskedastic</b>, the $F$-statistic $p$-value of 0.1198 indicates that we will <b>fail</b> to reject this null even at a significance level of 10%. We therefore conclude that the residuals are homoskedastic at 10% significance.<br /><br /> To test for the presence of cointegration, in the estimation view, proceed to <b>View/Coefficient Diagnostics/Long Run Form and Bounds Test</b>. Below the table of coefficient estimates, we have two additional tables presenting the error correction $EC$ term and the $F$-Bounds test. The output is below.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_table5.png" title="" width="auto" height="auto"><br /><br /></center> The $F$-statistic value 2.279536 is evidently below the I$(0)$ critical value bound. Our analysis in <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl_8.html" target="_blank">Part 2</a> of this series indicates that we <b>fail</b> to reject the null hypothesis that there is no equilibrating relationship.<br /><br /> In fact, we can visualize the fit of the long-run equation and the dependent variable by extracting the $EC$ term and subtracting it from the dependent variable. This can be done as follows. In the estimation view, proceed to <b>Proc/Make Cointegrating Relationship</b> and save the series under a name, say <b>cointno</b>. Since the cointegrating relationship is the $EC$ term, we would like to extract just the long-run relationship.
To do this, simply subtract the series <b>cointno</b> from the dependent variable. In other words, make a new series $\text{LRno} = \text{BBY10Y} - \text{cointno}$. Finally, form a group with the variables <b>BBY10Y</b> and <b>LRno</b>, and plot. We have the following output.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_graph2.png" title="" width="640" height="auto"><br /><br /></center> Clearly, there is no use in performing a regression to study the speed of adjustment.<br /><br /> <h4>Model 2: Usual Cointegrating Relationship</h4>In this model, the dependent variable is the 6 Month T-Bill, while the dynamic regressors are the 3 and 1 Month T-Bills. Moreover, the DGP under consideration specifies an unrestricted constant, or Case 3, and we include the variable <b>dum0708</b> as our non-dynamic regressor. To avoid repetition, we will not present the output, but skip immediately to verifying whether the residuals from the model are serially uncorrelated and homoskedastic. We have the following outputs.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_table6.png" title="" width="auto" height="auto"><br /><br /></center> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_table7.png" title="" width="auto" height="auto"><br /><br /></center> Given the $p$-values, we will <b>reject</b> the null hypothesis in both tests. Clearly, we have a problem with both serial correlation and heteroskedasticity. To solve the first problem, we will increase the number of lags for both the dependent variable and the regressors. To solve the second problem, we will use a <b>HAC</b> covariance matrix adjustment, which will correct the value of any test statistics that are computed in estimation. This can be done by going to the <b>Options</b> tab and adjusting the <b>Coefficient Covariance matrix</b> to <b>HAC (Newey-West)</b>, and setting the details in the <b>HAC Options</b>.
Remember that while serial correlation can lead to biased results, heteroskedasticity simply leads to inefficient estimation. Thus, removing serial correlation is of primary importance. We do both these tasks next.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_ss3.png" title="" width="auto" height="auto"><br /><br /></center> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_ss4.png" title="" width="auto" height="auto"><br /><br /></center> We test again for the presence of serial correlation.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_table8.png" title="" width="auto" height="auto"><br /><br /></center> The $F$-statistic $p$-value of 0.3676 indicates that we no longer have a problem with serial correlation.<br /><br /> To test for the presence of cointegration, we proceed again to the <b>Long Run Form and Bounds Test</b> view. We have the following output.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_table9.png" title="" width="auto" height="auto"><br /><br /></center> The $F$-statistic value 9.660725 is evidently greater than the I$(1)$ critical value bound. Our analysis in <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl_8.html" target="_blank">Part 2</a> of this series indicates that we <b>reject</b> the null hypothesis that there is no equilibrating relationship. Moreover, since we have rejected the null and since we have <b>not</b> included a constant or trend in the cointegrating relationship, our exposition in <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl_8.html" target="_blank">Part 2</a> of this series indicates that we can use the $t$-Bounds Test critical values to determine which alternative emerges. In this particular case, the absolute value of the $t$-statistic is $|-5.043782| = 5.043782$, and it is greater than the absolute value of either the I$(0)$ or I$(1)$ $t$-bound. 
Recall that this indicates that we should <b>reject</b> the $t$-Bounds test null hypothesis, and conclude that the cointegrating relationship is either of the usual kind, or is valid but degenerate. Nevertheless, a look at the fit between the dependent variable and the equilibrating equation should lead us to believe that the relationship is indeed valid. The graph is presented below.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_graph3.png" title="" width="640" height="auto"><br /><br /></center> In this particular case, it makes sense to study the speed of adjustment equation. To view this, from the estimation output, proceed to <b>View/Coefficient Diagnostics/Long Run Form and Bounds Test</b>. We have the following output.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_table10.png" title="" width="auto" height="auto"><br /><br /></center> As expected, the $EC$ term, here represented as <b>CointEq(-1)</b>, is negative with an associated coefficient estimate of $-0.544693$. This implies that about 54.47% of any movements into disequilibrium are corrected for within one period. Moreover, given the very large $t$-statistic, namely $-5.413840$, we can also conclude that the coefficient is highly significant. See <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl_8.html" target="_blank">Part 2</a> of this series for further details.<br /><br /> <h4>Model 3: Nonsensical Cointegrating Relationship</h4>In this model, the dependent variable is the 2 Year Benchmark Bond Yield, while the dynamic regressors are the 1 Year and 1 Month T-Bills. Moreover, the DGP under consideration specifies an unrestricted constant, or Case 3, and we include the variable <b>dum0708</b> as our non-dynamic regressor. 
To avoid repetition, we will only present tables where necessary to derive inference.<br /><br /> As usual, we first verify whether the residuals from the model are serially uncorrelated and homoskedastic. We have the following outputs.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_table11.png" title="" width="auto" height="auto"><br /><br /></center> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_table12.png" title="" width="auto" height="auto"><br /><br /></center> Here it is evident that we do not have a problem with serial correlation, but our residuals are heteroskedastic. As in the previous case, we reestimate using a <b>HAC</b>-corrected covariance matrix, and then proceed to the <b>Long Run Form and Bounds Test</b> view. We have the following output.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_table13.png" title="" width="auto" height="auto"><br /><br /></center> The $F$-statistic value 5.322963 is large enough to reject the null hypothesis at the 5% significance level, but not necessarily at the 1% level. Furthermore, since we have <b>not</b> included a constant or trend in the cointegrating relationship, we can make use of the $t$-Bounds Test critical values to determine which alternative hypothesis emerges. Here, the absolute value of the $t$-statistic is $|-1.774930| = 1.774930$, which is less than the absolute value of either the I$(0)$ or I$(1)$ $t$-bound. Accordingly, we <b>fail</b> to reject the $t$-Bounds test null hypothesis and conclude that the cointegrating relationship is in fact nonsensical.
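The $F$-Bounds decision rule applied across the three models can be written out mechanically. The sketch below is our own illustration; the bound values must be taken from the critical value tables that EViews reports, and the 5% bounds used here are placeholders rather than the exact table entries:

```python
def f_bounds_decision(f_stat, i0_bound, i1_bound):
    """Classify an F-Bounds test statistic against the I(0)/I(1) critical
    bounds: below the I(0) bound -> fail to reject; above the I(1) bound
    -> reject; in between -> inconclusive."""
    if f_stat < i0_bound:
        return "fail to reject null: no levels relationship"
    if f_stat > i1_bound:
        return "reject null: levels relationship exists"
    return "inconclusive"

# Model 1's F-statistic fell below the I(0) bound, while Model 2's
# exceeded the I(1) bound (placeholder bounds shown):
model1 = f_bounds_decision(2.279536, 4.94, 5.73)
model2 = f_bounds_decision(9.660725, 3.79, 4.85)
```

When the statistic lands between the bounds, the test is inconclusive and further analysis, such as the $t$-Bounds test used above, is required.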
The following is a graph of the fit between the dependent variable and the equilibrating equation.<br /><br /> <center><img src="http://www.eviews.com/blog/Images/ardl/ardl_pt3_graph4.png" title="" width="640" height="auto"><br /><br /></center> <h3>EViews Program and Files</h3>We close this series with the EViews program script that will automate most of the output we have provided above. To use the script, you will need the EViews workfile: <a href="http://www.eviews.com/blog/Images/ardl/ardl.example.WF1">ARDL.EXAMPLE.WF1</a><br /><br /> <pre><br />'---------<br />'Preliminaries<br />'---------<br /><br />'Open Workfile<br />'wfopen(type=txt) http://www5.statcan.gc.ca/cansim/results/cansim-1760043-eng-2216375457885538514.csv colhead=2 namepos=last names=(date, 'bby2y,bby5y,bby10y,tbill1m,tbill3m,tbill6m,tbill1y) skip=3<br />'pagecontract if @trend<244<br />'pagestruct @date(date)<br /><br />wfuse pathto...ardl.example.WF1<br /><br />'Set sample from Jan 2001 to end.<br />smpl Jan/2001 @last<br /><br />'Create dummy for post 07/08 crisis<br />series dum0708 = @recode(@dateval("2007/06")<@date,1,0)<br /><br />'Create Group of all Variables<br />group termstructure tbill1m tbill3m tbill6m tbill1y bby2y bby5y bby10y<br /><br />'Graph all series<br />termstructure.line(m) across(@SERIES,iscale, iscalex, nodispname, label=auto, bincount=5)<br /><br />'Do UR test on each series<br />termstructure.uroot(dif=1, adf, lagmethod=sic)<br /><br />'---------<br />'No Relationship<br />'---------<br /><br />'ARDL: 10y Bond Yields and 1 Month Tbills.<br />equation ardlno.ardl(trend=const) bby10y tbill1m @ dum0708<br /><br />'Run Residual Serial Correlation Test<br />ardlno.auto<br /><br />'Run Residual Heteroskedasticity Test<br />ardlno.hettest @regs<br /><br />'Make EC equation.<br />ardlno.makecoint cointno<br /><br />'Plot Dep. 
Var and LR Equation<br />group groupno bby10y (bby10y - cointno)<br />freeze(mode=overwrite, graphno) groupno.line<br />graphno.axis(l) format(suffix="%")<br />graphno.setelem(1) legend(BBY10Y: 10 Year Canadian Benchmark Bond Yields)<br />graphno.setelem(2) legend(Long run relationship (BBY10Y - COINTNO))<br />show graphno<br /><br />'---------<br />'Non Degenerate Relationship<br />'---------<br /><br />'ARDL term structure of Bond Yields. (Non-Degenerate)<br />equation ardlnondeg.ardl(deplags=6, reglags=6, trend=uconst, cov=hac, covlag=a, covinfosel=aic) tbill6m tbill3m tbill1m @ dum0708<br /><br />'Run Residual Serial Correlation Test<br />ardlnondeg.auto<br /><br />'Run Residual Heteroskedasticity Test<br />ardlnondeg.hettest @regs<br /><br />'Make EC equation.<br />ardlnondeg.makecoint cointnondeg<br /><br />'Plot Dep. Var and LR Equation<br />group groupnondeg tbill6m (tbill6m - cointnondeg)<br />groupnondeg.line<br /><br />freeze(mode=overwrite, graphnondeg) groupnondeg.line<br />graphnondeg.axis(l) format(suffix="%")<br />graphnondeg.setelem(1) legend(TBILL6M: 6 Month Canadian T-Bill Yields)<br />graphnondeg.setelem(2) legend(Long run relationship (TBILL6M - COINTNONDEG))<br />show graphnondeg<br /><br />'---------<br />'Degenerate Relationship<br />'---------<br /><br />'ARDL term structure of Bond Yields. (Degenerate)<br />equation ardldeg.ardl(trend=uconst, cov=hac, covlag=a, covinfosel=aic) bby2y tbill1y tbill1m @ dum0708<br /><br />'Run Residual Serial Correlation Test<br />ardldeg.auto<br /><br />'Run Residual Heteroskedasticity Test<br />ardldeg.hettest @regs<br /><br />'Make EC equation.<br />ardldeg.makecoint cointdeg<br /><br />'Plot Dep. 
Var and LR Equation<br />group groupdeg bby2y (bby2y - cointdeg)<br />freeze(mode=overwrite, graphdeg) groupdeg.line<br />graphdeg.axis(l) format(suffix="%")<br />graphdeg.setelem(1) legend(BBY2Y: 2 Year Canadian Benchmark Bond Yields)<br />graphdeg.setelem(2) legend(Long run relationship (BBY2Y - COINTDEG))<br />show graphdeg<br /></pre></span><h2>AutoRegressive Distributed Lag (ARDL) Estimation. Part 2 - Inference</h2><i>Posted by IHS EViews on 2017-05-08</i><br /><br /><script type="text/x-mathjax-config">MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" }, extensions: ["AMSmath.js","AMSsymbols.js"], }, tex2jax: { inlineMath: [['$','$'], ['\$','\$']] }, Macros: { }, }); </script> <script async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS_CHTML" type="text/javascript"></script> <span style="font-family: &quot;verdana&quot; , sans-serif;">This is the second part of our AutoRegressive Distributed Lag (ARDL) post. For Part 1, please go <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">here</a>, and for Part 3, please visit <a href="http://blog.eviews.com/2017/05/autoregressive-distributed-lag-ardl.html" target="_blank">here</a>. <br /><br />In this post we outline the correct theoretical underpinning of the inference behind the Bounds test for cointegration in an ARDL model.
Whilst the discussion is by its nature quite technical, it is important that practitioners of the Bounds test have a grasp of the background behind its inferences.</span><br /><br /><a name='more'></a><br /><h3><span style="font-family: &quot;verdana&quot; , sans-serif;">Overview</span></h3><br /><span style="font-family: &quot;verdana&quot; , sans-serif;">While the ARDL approach to cointegration is typically considered synonymous with the Pesaran, Shin, and Smith (2001) <i>Bounds</i> test for cointegration, in this post we emphasize that correct inference is in fact&nbsp;rooted in cointegration theory. In <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">Part 1</a> of this series, we mentioned that the ARDL framework is a one-to-one reparameterization of the conditional error correction model (ECM) representation of the underlying vector auto-regression (VAR).</span><br /><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /></span><span style="font-family: &quot;verdana&quot; , sans-serif;">Recall that a VAR is a natural extension of the univariate autoregressive model to multivariate series, and is often interpreted as an autoregressive system-of-equations regression model with multiple endogenous variables. As such, it lends itself to the analysis of simultaneous interactions between variables -- namely, their short-run dynamics, but more importantly, their long-run (equilibrating) or cointegrating behaviour. In this regard, the vector error correction model (VECM), which is a reparameterization of the VAR to isolate the equilibrating relationships, if they exist, is of central importance. Nevertheless, like the VAR, the VECM models <b>simultaneous</b> interactions among several endogenous variables. 
However, applications in Economics typically ask: <br /><br /> <i>How does <b>one</b> variable in the VAR behave <b>conditional</b> on all the others, which are themselves endogenously determined, and is there any cointegrating relationship among them</i>? <br /><br /> In other words, we hope to derive a conditional ECM (CECM), which formalizes an ECM for some variable conditional on all the others, but at the same time, isolates the cointegrating relationship among them. In this regard, we will demonstrate that the ARDL model is in fact a special case of the CECM. However, recall from <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">Part 1</a> of this series that one of the major advantages of the ARDL model is its ability to estimate the long-run or cointegrating relationship. What we expound on here is that this estimate may not always be defined or sensible, and even if it is, it may be degenerate; that is, seemingly stable in the short run, but dissipating in the long run. It is here where the Bounds test comes into the limelight: it is a way of statistically detecting the presence of cointegration. The advantage of the procedure is that it uses the CECM (ARDL) as a platform. Thus, in estimating the CECM (ARDL), one can simultaneously test for cointegration and estimate the equilibrating relationship. Lastly, if cointegration does exist, one can estimate and conduct inference on the speed of convergence to equilibrium.
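As a toy illustration of this last point (our own sketch, not EViews output): if the estimated error correction coefficient is $-a$ with $0 < a < 1$, then a fraction $a$ of any disequilibrium is corrected each period, so a one-off shock decays geometrically:

```python
def disequilibrium_path(gap0, a, periods):
    """Disequilibrium gap remaining each period when a fraction `a` of the
    previous period's gap is corrected (EC coefficient = -a), so the gap
    follows gap_t = (1 - a) * gap_{t-1}."""
    path = [gap0]
    for _ in range(periods):
        path.append(path[-1] * (1.0 - a))
    return path

# With a = 0.5447 (the speed-of-adjustment estimate from Part 3 of this
# series), less than 5% of an initial unit shock remains after 4 periods.
path = disequilibrium_path(1.0, 0.5447, 4)
```

The closer $a$ is to 1, the faster the system returns to equilibrium; $a$ near 0 implies a valid but very slow-adjusting relationship.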
The following flow-chart summarizes the steps:<br /><br /> </span><br /><div class="separator" style="clear: both; text-align: center;"><span style="font-family: &quot;verdana&quot; , sans-serif;"><a href="https://3.bp.blogspot.com/-96CIOY1Ykww/WREAuVVaCRI/AAAAAAAAAT0/F6WJHQnn2gYaaYl5li38BeTFh9bWo4mkQCLcB/s1600/flowchart1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-96CIOY1Ykww/WREAuVVaCRI/AAAAAAAAAT0/F6WJHQnn2gYaaYl5li38BeTFh9bWo4mkQCLcB/s640/flowchart1.png" width="480" /></a></span></div><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /><br /> </span><br /><h3><span style="font-family: &quot;verdana&quot; , sans-serif;">Vector Auto-regression (VAR) and the Vector Error Correction Model (VECM)</span></h3><span style="font-family: &quot;verdana&quot; , sans-serif;">Introduced to econometrics by Sims (1980), we formalize below a VAR model with $p$ lags, namely VAR$(p)$, augmented with the usual deterministic dynamics (intercept and trend). \begin{align} \pmb{\Phi}(L)(\pmb{z}_t - \pmb{\mu} - \pmb{\gamma}t) &amp;= \pmb{\epsilon}_t \notag \\ \pmb{\Phi}(L)\pmb{z}_t &amp;= \pmb{\Phi}(L)\pmb{\mu} + \pmb{\Phi}(L)\pmb{\gamma}t + \pmb{\epsilon}_t \label{eq.ardl.11} \end{align} where $\pmb{z}_t$ is a $(k+1)$-vector $(y_t,x_{1,t},\ldots, x_{k,t})^\top = (y_t,\pmb{x}^\top_t)^\top$ with $\pmb{x}_t = (x_{1,t},\ldots, x_{k,t})^\top$, $\pmb{\mu}$ and $\pmb{\gamma}$ are respectively the $(k+1)$-vectors of intercept and trend coefficients, $\pmb{\Phi}(L) = \pmb{I}_{k+1} - \sum_{i=1}^{p}\pmb{\Phi}_iL^i$ is the $(k+1)$ square matrix lag polynomial, and $\pmb{I}_{k+1}$ is the identity matrix of dimension $(k+1)$, and $\pmb{\epsilon}_t = (\epsilon_{yt}, \pmb{\epsilon}_{xt}^\top)$ is the vector of innovations. 
We complete the setup with the following assumptions:<br /><br /> </span><br /><h4><span style="font-family: &quot;verdana&quot; , sans-serif;">Assumption 1:</span></h4><span style="font-family: &quot;verdana&quot; , sans-serif;"><b>Individual Variables can be I$(0)$ or I$(1)$</b>: The roots of $\det\left(\pmb{\Phi}(z)\right) = \det\left(\pmb{I}_{k+1} - \sum_{i=1}^{p}\pmb{\Phi}_iz^i\right) = 0$ satisfy either $|z|&gt;1$ or $z=1$. <br /><br /> </span><br /><h4><span style="font-family: &quot;verdana&quot; , sans-serif;">Assumption 2:</span></h4><span style="font-family: &quot;verdana&quot; , sans-serif;"><b>Variables are Correlated</b>: The $(k+1)$-vector error process $\pmb{\epsilon}_t \sim N(\pmb{0}, \pmb{\Omega})$ with $\pmb{\Omega}$ positive definite. <br /><br /> Notice that Assumption 1 is the multivariate analogue of assumptions typically made for univariate AR$(p)$ processes. The assumption simply restricts $\pmb{z}_t$ to have at <b>most</b> one unit root in each of the series, and prevents the occurrence of seasonal and explosive roots. This allows $\pmb{z}_t$ to contain any combination of purely I$(1)$, purely I$(0)$, or mutually <i>cointegrated</i> variables. On the other hand, Assumption 2 restricts the errors to zero mean Gaussian processes with a covariance matrix $\pmb{\Omega}$ that allows variables in $\pmb{z}_t$ to be arbitrarily correlated. Under these assumptions, the VAR is in <i>reduced</i> form. This means that not only are all variables treated as endogenous, but any contemporaneous effects are exhibited through <i>contemporaneous correlations</i> in $\pmb{\Omega}$.
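In the scalar AR case, Assumption 1 is easy to check directly. The following Python sketch (our own helper, for intuition only) finds the roots of $\Phi(z) = 1 - \phi_1 z - \phi_2 z^2$ and tests whether each satisfies $|z| > 1$ or $z = 1$:

```python
import cmath

def ar2_poly_roots(phi1, phi2):
    """Roots z of Phi(z) = 1 - phi1*z - phi2*z**2 = 0, i.e. of the
    quadratic phi2*z**2 + phi1*z - 1 = 0 (phi1 != 0 when phi2 == 0)."""
    if phi2 == 0:
        return [1.0 / phi1]
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    return [(-phi1 + disc) / (2 * phi2), (-phi1 - disc) / (2 * phi2)]

def satisfies_assumption1(roots, tol=1e-9):
    """Every root must lie outside the unit circle or equal 1 exactly."""
    return all(abs(r) > 1 + tol or abs(r - 1) < tol for r in roots)

# A random walk (phi1 = 1) has the single root z = 1: I(1), allowed.
# A stationary AR(1) (phi1 = 0.5) has root z = 2: I(0), allowed.
# An explosive AR(1) (phi1 = 1.2) has a root inside the unit circle: ruled out.
```

In the multivariate setting the same idea applies to $\det(\pmb{\Phi}(z)) = 0$, though the roots are then computed from the VAR companion matrix rather than a scalar polynomial.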
While useful in its own right, a far more revelatory representation exists in the form of a vector error correction model (VECM).<br /><br /> Relying on the Beveridge-Nelson (BN) decomposition and some clever rearrangement, it is readily shown that the VECM representation of the VAR in (\ref{eq.ardl.11}) is: \begin{align} \Delta\pmb{z}_t &amp;= \left(\pmb{\Phi}(1)\pmb{\mu} + \left(\sum_{i=1}^{p}i\pmb{\Phi}_i\right)\pmb{\gamma}\right) + \pmb{\Phi}(1)\pmb{\gamma}t - \pmb{\Phi}(1)\pmb{z}_{t-1} + \widetilde{\pmb{\Phi}}^{\star}(L)\Delta\pmb{z}_t + \pmb{\epsilon}_t \notag \\ &amp;= \pmb{a}_0 + \pmb{a}_1t - \pmb{\Phi}(1)\pmb{z}_{t-1} + \widetilde{\pmb{\Phi}}^{\star}(L)\Delta\pmb{z}_t + \pmb{\epsilon}_t \label{eq.ardl.12} \end{align} such that \begin{align} \pmb{a}_0 = \pmb{\Phi}(1)\pmb{\mu} + \left(\sum_{i=1}^{p}i\pmb{\Phi}_i\right)\pmb{\gamma} \quad \text{and} \quad \pmb{a}_1 = \pmb{\Phi}(1)\pmb{\gamma} \label{eq.ardl.13} \end{align} In fact, several important remarks emerge from this construction. </span><br /><ul><span style="font-family: &quot;verdana&quot; , sans-serif;"><li> <b>The Cointegrating Matrix is $\pmb{\Phi}(1)$</b>: If the original VAR variables in (\ref{eq.ardl.11}), namely $\pmb{z}_t$, are I$(1)$, all variables in the VECM are I$(0)$, except possibly for $\pmb{z}_{t-1}$. Since orders of integration must balance, $\pmb{\Phi}(1)\pmb{z}_{t-1}$ must be I$(0)$. Since a set of I$(1)$ variables is said to be cointegrated if there exists a linear combination of said variables which is I$(0)$, it is clear that $\pmb{\Phi}(1)\pmb{z}_t$ is the matrix of cointegrating relationships and $\pmb{\Phi}(1)$ is the cointegrating matrix. In Economics, the concept is often referred to as a <i>long-run relationship</i>, motivating the example that while prices -- which are frequently I$(1)$ variables -- can drift apart in the short-run, economic forces will eventually force them to equilibrium.</li><br /><li> <b>No Cointegration when $\pmb{\Phi}(1) = \pmb{0}$. 
Every variable in $\pmb{z}_t$ is I$(1)$</b>: Recall that the rank of a matrix is the number of its linearly independent columns (or rows). The concept is frequently used in ordinary least squares (OLS) regression, and is typically exemplified using the dummy variable trap. In this regard, since $\pmb{\Phi}(1)$ is a $(k+1)$-square matrix, assume $\DeclareMathOperator{\rank}{\textbf{rk}}\rank\left(\pmb{\Phi}(1)\right) = r_z$, where $0 \leq r_z \leq (k+1)$, and $\rank(\cdot)$ denotes the rank operator. In other words, among the $(k+1)$ columns in $\pmb{\Phi}(1)$, only $r_z$ are linearly independent, and the ones which are not, are linear combinations of those $r_z$. Moreover, $r_z = 0$ if and only if $\pmb{\Phi}(1) = \pmb{0}_{(1+k)^2}$, where $\pmb{0}_{(1+k)^2}$ denotes the $(1+k)$-square matrix of zeros. When this is the case, the VECM reduces to: $$\Delta\pmb{z}_t = \pmb{a}_0 + \pmb{a}_1t + \widetilde{\pmb{\Phi}}^{\star}(L)\Delta\pmb{z}_t + \pmb{\epsilon}_t$$ Since all variables on the right-hand side (RHS) are I$(0)$, it follows that $\Delta\pmb{z}_t \sim \text{I}(0)$, and therefore $\pmb{z}_t \sim \text{I}(1)$. In other words, when $r_z = 0$, every variable in $\pmb{z}_t$ is I$(1)$, and since $\pmb{\Phi}(1) = \pmb{0}_{(1+k)^2}$, there are <b>no</b> cointegrating relationships.</li><br /><li> <b>No Cointegration when $\pmb{\Phi}(1)$ has full rank. Every variable in $\pmb{z}_t$ is I$(0)$</b>: When $r_z = (k+1)$, $\pmb{\Phi}(1)$ has full column rank (i.e. all columns (rows) are linearly independent). In this particular case, $\DeclareMathOperator{\spann}{\textbf{sp}} \pmb{\Phi}(1)\pmb{z}_{t-1} = \spann{\left(\pmb{z}_{t-1}\right)}$, where $\spann(\cdot)$ denotes the span -- the space of <i>all</i> unique linear combinations of $\pmb{z}_t$. This implies $\Delta \pmb{z}_t$ can be uniquely written as a linear combination of all variables in $\pmb{z}_t$, namely $\pmb{\Phi}(1)\pmb{z}_{t-1}$, plus the remaining deterministic and stationary ones. 
Since $\Delta \pmb{z}_t \sim \text{I}(0)$, this is only sensible when every variable in $\pmb{z}_t \sim \text{I}(0)$, and cointegration is not possible.</li><br /><li> <b>VECM Estimates Speed of Convergence to Equilibrium</b>: A classical result in linear algebra is that for any $m\times m$ matrix $\pmb{M}$ with rank $r$, there exist $m \times r$ matrices $\pmb{A}$ and $\pmb{B}$ such that $\pmb{M} = \pmb{AB}^\top$, where $\pmb{B}$ consists of the $r$ linearly independent columns of $\pmb{M}$. Thus, we can always write $\pmb{\Phi}(1) = \pmb{AB}^\top$, where $m=(1+k)$. More importantly, it implies that $\pmb{A}$ measures the rate of convergence to equilibrium. To see this, recall that if $\pmb{z}_t$ is cointegrated, then $\pmb{\Phi}(1)\pmb{z}_{t-1} \sim \text{I}(0)$. We can therefore factorize the cointegrated relationships as $\pmb{\Phi}(1)\pmb{z}_{t-1} = \pmb{A}\pmb{B}^\top \pmb{z}_{t-1} = \pmb{A}\pmb{\zeta}_{t-1}$ where $\pmb{\zeta}_{t-1}$ is a mean zero I$(0)$ process. This is because the cointegrating relationships are now captured by $\pmb{B}^\top \pmb{z}_{t-1}$. Observe further that when the system is in actual equilibrium, $\pmb{B}^\top \pmb{z}_{t-1} = \pmb{0}_{1+k}$, where $\pmb{0}_{1+k}$ is $(1+k)$-vector of zeros. This is because equilibrium requires not only stability, which follows from the stationarity of $\pmb{B}^\top \pmb{z}_{t-1}$, but also constancy, which manifests only when accumulated short-run dynamics $\widetilde{\pmb{\Phi}}^{\star}(L)\Delta\pmb{z}_t$, and the shocks to $\pmb{B}^\top \pmb{z}_{t-1}$, namely, $\pmb{\zeta}_{t-1}$, are zero as well. Accordingly, if the system was in equilibrium in the previous period, any <b>current</b> deviations from this state, namely $\Delta \pmb{z}_t$, must arise from systematic shocks $\pmb{\epsilon}_t$, where we assume $\pmb{a}_0 = \pmb{a}_1 = \pmb{0}_{1+k}$ for simplicity. Alternatively, when the system is in disequilibrium, $\pmb{B}^\top \pmb{z}_{t-1} = \pmb{\zeta}_{t-1} \neq \pmb{0}_{1+k}$. 
Thus, when $\pmb{B}^\top \pmb{z}_{t-1} &lt; \pmb{0}_{1+k}$ $(\pmb{B}^\top \pmb{z}_{t-1} &gt; \pmb{0}_{1+k})$, the impact on $\Delta\pmb{z}_t$ is of magnitude $\pmb{A}$ and positive (negative), since $\pmb{\Phi}(1)$ enters the VECM with a negative sign. In other words, $\Delta\pmb{z}_t$ adjusts toward equilibrium in the <b>opposite</b> direction to disequilibrium by a proportion equal to $\pmb{A}$.</li><br /><li> <b>Cointegrating Relationships Include Constants and Trends</b>: We have outlined this in <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">Part 1</a> of this series. The restrictions in (\ref{eq.ardl.13}) indicate that $\pmb{a}_0$ and $\pmb{a}_1$ are linear functions of $\pmb{\Phi}(1)$. As such, they span the $r_z$ linearly independent columns of the cointegrating matrix $\pmb{\Phi}(1)$, and by extension, the cointegrating equation. This distinguishes the 5 data generating processes (DGPs) considered in Pesaran, Shin, and Smith (2001) and outlined in <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">Part 1</a> of this series.</li><br /> <ul><li> <b>Case I:</b> $\pmb{\mu} = \pmb{\gamma} = \pmb{0}$ which implies $\pmb{a}_0 = \pmb{a}_1 = 0$. Accordingly, the VECM (\ref{eq.ardl.12}) reduces to: $$\Delta \pmb{z}_t = -\pmb{\Phi}(1)\pmb{z}_{t-1} + \widetilde{\pmb{\Phi}}^{\star}(L)\Delta\pmb{z}_t + \pmb{\epsilon}_t$$</li><li> <b>Case II:</b> $\pmb{\mu} \neq \pmb{0}$, $\pmb{\gamma} = \pmb{0}$, and the restriction in (\ref{eq.ardl.13}) is imposed. This implies that $\pmb{a}_0 = \pmb{\Phi}(1)\pmb{\mu}$ and $\pmb{a}_1 = 0$. Accordingly, the VECM is just: $$\Delta \pmb{z}_t = -\pmb{\Phi}(1)\left(\pmb{z}_{t-1} - \pmb{\mu}\right) + \widetilde{\pmb{\Phi}}^{\star}(L)\Delta\pmb{z}_t + \pmb{\epsilon}_t$$</li><li> <b>Case III:</b> $\pmb{\mu} \neq \pmb{0}$, $\pmb{\gamma} = \pmb{0}$, and the restrictions in (\ref{eq.ardl.13}) <b>not</b> imposed. 
This implies that $\pmb{a}_0 \neq 0$, $\pmb{a}_1 = 0$, while the VECM becomes: $$\Delta \pmb{z}_t = \pmb{a}_0 -\pmb{\Phi}(1)\pmb{z}_{t-1} + \widetilde{\pmb{\Phi}}^{\star}(L)\Delta\pmb{z}_t + \pmb{\epsilon}_t$$</li><li> <b>Case IV:</b> $\pmb{\mu},\pmb{\gamma} \neq \pmb{0}$, and the restrictions in (\ref{eq.ardl.13}) are imposed only on $\pmb{a}_1$. This implies that $\pmb{a}_0 \neq 0$ and $\pmb{a}_1 = \pmb{\Phi}(1)\pmb{\gamma}$. The VECM is now: $$\Delta \pmb{z}_t = \pmb{a}_0 -\pmb{\Phi}(1)\left(\pmb{z}_{t-1} - \pmb{\gamma}t\right) + \widetilde{\pmb{\Phi}}^{\star}(L)\Delta\pmb{z}_t + \pmb{\epsilon}_t$$</li><li> <b>Case V:</b> $\pmb{\mu},\pmb{\gamma} \neq \pmb{0}$, and the restrictions in (\ref{eq.ardl.13}) are <b>not</b> imposed. This implies that $\pmb{a}_0,\pmb{a}_1 \neq 0$ and the VECM is represented in (\ref{eq.ardl.12}).</li><br /> </ul></span></ul><span style="font-family: &quot;verdana&quot; , sans-serif;">Remember that the VECM is a reparameterization of a VAR. Accordingly, the VECM quantifies adjustments to equilibrium for all variables simultaneously. Nevertheless, economists, and other practitioners, are generally only interested in one particular variable as it relates to all others. For instance, in the present context, one could be interested in studying adjustments to equilibrium of $y_t$ in response to (conditioning on) the equilibrating paths of the remaining variables $\pmb{x}_t$. Moreover, the objective is only meaningful if, after conditioning on $\pmb{x}_t$, any implications on $y_t$ that would have emerged from the original VAR model, remain unchanged under the conditional one. The concept has a very important name in cointegration theory and is known as <i>exogeneity</i>; see Engle, Hendry, and Richard (1983) for a technical exposition. 
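The rank-based remarks above can be made concrete numerically. The following sketch (Python rather than EViews, using a purely hypothetical bivariate VAR(1) whose coefficients are invented for illustration) computes $\pmb{\Phi}(1)$, its rank, and a rank factorization $\pmb{\Phi}(1) = \pmb{A}\pmb{B}^\top$ via the SVD:

```python
import numpy as np

# Hypothetical bivariate VAR(1): z_t = Phi1 @ z_{t-1} + eps_t, so that
# Phi(L) = I - Phi1*L and the cointegrating matrix is Phi(1) = I - Phi1.
Phi1 = np.array([[0.5, 0.5],
                 [0.3, 0.7]])
Phi_1 = np.eye(2) - Phi1

# The rank of Phi(1) gives the number of cointegrating relationships.
r = np.linalg.matrix_rank(Phi_1)
print("rank of Phi(1):", r)

# Rank factorization Phi(1) = A @ Bt via the SVD: A is (2 x r), Bt is (r x 2).
U, s, Vt = np.linalg.svd(Phi_1)
A = U[:, :r] * s[:r]   # adjustment (loading) matrix
Bt = Vt[:r, :]         # cointegrating vector(s), up to scale
assert np.allclose(A @ Bt, Phi_1)
```

With these coefficients the rank is one, so the system admits a single cointegrating vector; the SVD is merely one convenient way of constructing the (non-unique) factors $\pmb{A}$ and $\pmb{B}^\top$.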
A natural way of ensuring this property is to restrict the total number of cointegrating relationships between $y_t$ and $\pmb{x}_t$ to be <b>one, and exactly one</b>, irrespective of any cointegrating paths among the $\pmb{x}_t$ themselves. Should this be the case, $\pmb{x}_t$ are said to be <i>weakly exogenous</i> for any parameters in the equation for $y_t$.<br /><br /> Accordingly, deriving a model for $y_t$ conditional on $\pmb{x}_t$ requires: </span><br /><ul><span style="font-family: &quot;verdana&quot; , sans-serif;"><li> Deriving an ECM for $y_t$, explicitly conditioning on <b>all</b> effects originating from $\pmb{x}_t$. Such a model must include not only explicit effects of $\pmb{x}_t$ on $y_t$ stemming from the VAR matrix polynomial $\pmb{\Phi}(L)$, but also <b>any and all</b> contemporaneous relationships between $y_t$ and $\pmb{x}_t$ implicit within the covariance matrix $\pmb{\Omega}$ of the error vector $\pmb{\epsilon}_t$.</li><br /><li> Ensuring that $\pmb{x}_t$ are weakly exogenous. </li></span></ul><span style="font-family: &quot;verdana&quot; , sans-serif;">We turn to both these tasks next. <br /><br /> </span><br /><h3><span style="font-family: &quot;verdana&quot; , sans-serif;">Conditional ECM (CECM)</span></h3><span style="font-family: &quot;verdana&quot; , sans-serif;">To derive the conditional model, we first identify the <i>conditional</i> and <i>marginal</i> variables -- namely $y_t$ and $\pmb{x}_t$, respectively. Next, the DGP of $y_t$ is conditioned on the DGPs of the marginal variables $\pmb{x}_t$. Since any explicit relationships between $y_t$ and $\pmb{x}_t$ are clearly accounted for through $\pmb{\Phi}(L)$, any remaining conditioning proceeds on the covariance matrix $\pmb{\Omega}$.
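Numerically, conditioning on $\pmb{\Omega}$ amounts to projecting the $y$-shock onto the $x$-shocks. A minimal sketch (Python, with a hypothetical $3\times 3$ covariance matrix chosen purely for illustration) shows that the residual innovation is uncorrelated with the $x$-shocks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical error covariance: one y-shock followed by two x-shocks.
Omega = np.array([[1.0, 0.6, 0.3],
                  [0.6, 2.0, 0.4],
                  [0.3, 0.4, 1.5]])
eps = rng.multivariate_normal(np.zeros(3), Omega, size=100_000)
eps_y, eps_x = eps[:, 0], eps[:, 1:]

# Conditioning on Omega: project the y-shock onto the x-shocks.
w = np.linalg.solve(Omega[1:, 1:], Omega[1:, 0])  # Omega_xx^{-1} omega_xy
u_y = eps_y - eps_x @ w                           # conditional innovation

# Sample covariance of u_y with each x-shock is (up to noise) zero.
print(np.cov(u_y, eps_x[:, 0])[0, 1])
```

The residual `u_y` and the projection weights `w` correspond to the objects $u_{yt}$ and $\pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}$ that appear in the conditional decomposition derived next.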
Naturally, making these relationships explicit requires a solution where the VAR is driven by a vector of innovations $\pmb{u}_t = \left(u_{yt},\pmb{\epsilon}^\top_{xt}\right)^\top$, where $\pmb{u}_t \sim N(\pmb{0},\pmb{\Sigma})$, and $\pmb{\Sigma}$ is diagonal. In other words, by virtue of Gaussianity, innovations are independent across $y_t$ and $\pmb{x}_t$. Notice that the cointegrating structure of $\pmb{x}_t$ remains unchanged here. Since to each VAR we associate a bijection into its VECM form, all operations can proceed directly on the VECM. In this regard, express (\ref{eq.ardl.12}) as follows: \begin{align} \begin{bmatrix} \Delta y_t\\ \Delta \pmb{x}_t \end{bmatrix} &amp;= \begin{bmatrix} a_{y0}\\ \pmb{a}_{x0} \end{bmatrix} + \begin{bmatrix} a_{y1}\\ \pmb{a}_{x1} \end{bmatrix}t - \begin{bmatrix} \phi_{yy}(1) &amp; \pmb{\phi}_{yx}(1)\\ \pmb{\phi}_{xy}(1) &amp; \pmb{\Phi}_{xx}(1) \end{bmatrix} \begin{bmatrix} y_{t-1}\\ \pmb{x}_{t-1} \end{bmatrix} + \begin{bmatrix} \widetilde{\phi}^\star_{yy}(L) &amp; \widetilde{\pmb{\phi}}^\star_{yx}(L)\\ \widetilde{\pmb{\phi}}^\star_{xy}(L) &amp; \widetilde{\pmb{\Phi}}^\star_{xx}(L) \end{bmatrix} \begin{bmatrix} \Delta y_t\\ \Delta \pmb{x}_t \end{bmatrix} + \begin{bmatrix} \epsilon_{yt}\\ \pmb{\epsilon}_{xt} \end{bmatrix} \label{eq.ardl.14} \end{align} where $\pmb{a}_i = (a_{yi},\pmb{a}^\top_{xi})^\top$ for $i=0,1$, $\widetilde{\pmb{\Phi}}^\star(L) = \left(\widetilde{\pmb{\phi}}^{\star\top}_{y}(L), \widetilde{\pmb{\phi}}^{\star\top}_{x}(L)\right)^\top$, and $\pmb{\Phi}(1)$ assumes the form: \begin{align*} \pmb{\Phi}(1) = \begin{bmatrix} \phi_{yy}(1) &amp; \pmb{\phi}_{yx}(1)\\ \pmb{\phi}_{xy}(1) &amp; \pmb{\Phi}_{xx}(1) \end{bmatrix} \end{align*} Moreover, express the covariance matrix $\pmb{\Omega}$ as follows: \begin{align*} E\left( \begin{bmatrix} \epsilon_{yt}\\ \pmb{\epsilon}_{xt} \end{bmatrix} \begin{bmatrix} \epsilon_{yt} &amp; \pmb{\epsilon}^\top_{xt} \end{bmatrix} \right) = \begin{bmatrix} \omega_{yy} &amp; 
\pmb{\omega}_{yx}\\ \pmb{\omega}_{xy} &amp; \pmb{\Omega}_{xx} \end{bmatrix} = \pmb{\Omega} \end{align*} It is not difficult to demonstrate that $$\epsilon_{yt} = \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\epsilon}_{xt} + u_{yt}$$ where $u_{yt} \sim N\left(0,\omega_{yy} - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\omega}_{xy}\right)$ is independent of $\pmb{\epsilon}_{xt}$. Moreover, it can then be shown that \begin{align} \Delta \pmb{z}_t &amp;=(\pmb{I}_{k+1} - \pmb{\Psi})\left(\pmb{a}_{0} + \pmb{a}_{1}t - \pmb{\Phi}(1)\pmb{z}_{t-1} + \widetilde{\pmb{\Phi}}^\star(L) \Delta\pmb{z}_{t}\right) + \pmb{\Psi}\Delta\pmb{z}_t + \pmb{u}_{t} \label{eq.ardl.16} \end{align} where $\pmb{\alpha}_i = (\pmb{I}_{k+1} - \pmb{\Psi})\pmb{a}_i$ for $i=0,1$, $\pmb{u}_t = \left(u_{yt}, \pmb{\epsilon}^\top_{xt}\right)^\top$, and $\pmb{\Psi}$ is the matrix: \begin{align*} \pmb{\Psi} = \begin{bmatrix} 0 &amp; \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\\ \pmb{0}_k &amp; \pmb{0}_{k \times k} \end{bmatrix} \end{align*} Making equation (\ref{eq.ardl.16}) explicit, we arrive at: \begin{align} \begin{bmatrix} \Delta y_t\\ \Delta \pmb{x}_t \end{bmatrix} &amp;= \begin{bmatrix} \alpha_{y0} \\ \pmb{\alpha}_{x0} \end{bmatrix} + \begin{bmatrix} \alpha_{y1} \\ \pmb{\alpha}_{x1} \end{bmatrix}t - \begin{bmatrix} \phi_{yy}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\phi}_{xy}(1) &amp; \pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1)\\ \pmb{\phi}_{xy}(1) &amp; \pmb{\Phi}_{xx}(1) \end{bmatrix} \begin{bmatrix} y_{t-1}\\ \pmb{x}_{t-1} \end{bmatrix}\notag\\ &amp;+ \left((\pmb{I}_{k+1} - \pmb{\Psi})\widetilde{\pmb{\Phi}}^\star(L) + \pmb{\Psi}\right)\Delta\pmb{z}_t + \begin{bmatrix} u_{yt}\\ \pmb{\epsilon}_{xt} \end{bmatrix}\label{eq.ardl.17} \end{align} It now follows that the CECM is given by the equation: \begin{align} \Delta y_t &amp;=\alpha_{y0} + \alpha_{y1}t - \left(\phi_{yy}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\phi}_{xy}(1)\right)y_{t-1} -
\left(\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1)\right)\pmb{x}_{t-1}\notag\\ &amp;+ \pmb{e}_1^\top\left((\pmb{I}_{k+1} - \pmb{\Psi})\widetilde{\pmb{\Phi}}^\star(L) + \pmb{\Psi}\right)\Delta\pmb{z}_t + u_{yt} \label{eq.ardl.18} \end{align} the cointegrating relationship between $y_t$ and $\pmb{x}_t$, if it exists, is of the form: $$\left(\phi_{yy}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\phi}_{xy}(1)\right)y_{t-1} - \left(\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1)\right)\pmb{x}_{t-1}$$ and the marginal ECM is summarized as: $$\Delta \pmb{x}_t = \pmb{\alpha}_{x0} + \pmb{\alpha}_{x1}t - \pmb{\phi}_{xy}(1)y_{t-1} - \pmb{\Phi}_{xx}(1)\pmb{x}_{t-1} + \pmb{e}_2^\top\left((\pmb{I}_{k+1} - \pmb{\Psi})\widetilde{\pmb{\Phi}}^\star(L) + \pmb{\Psi}\right)\Delta\pmb{z}_t + \pmb{\epsilon}_{xt}$$ where $\pmb{e}_1 = \left(1,\pmb{0}_k^\top\right)^\top$ and $\pmb{e}_2 = \left(\pmb{0}_k, \pmb{I}_k\right)^\top$.<br /><br /> It is also clear that the new cointegrating matrix is specified by $(\pmb{I}_{k+1} - \pmb{\Psi})\pmb{\Phi}(1)$. Furthermore, notice that while the system-wide shocks are independent across variables, by virtue of $\pmb{\phi}_{xy}(1)y_{t-1}$, there is a feedback channel from $y_{t-1}$ into $\Delta \pmb{x}_t$. Thus, while $u_{yt}$ drives $y_t$ directly, it also indirectly drives $\pmb{x}_t$. In this regard, conducting inference on the CECM in isolation from the marginal ECM will lead to incorrect conclusions; see Ericsson (1992) for an excellent overview. A natural resolution, therefore, requires $\pmb{\phi}_{xy}(1) = \pmb{0}_k$.
This is a critical assumption, and one we impose now.<br /><br /> </span><br /><h4><span style="font-family: &quot;verdana&quot; , sans-serif;">Assumption 3:</span></h4><span style="font-family: &quot;verdana&quot; , sans-serif;"><b>No feedback from $y_t$ into $\pmb{x}_t$</b>: The $k$-vector $\pmb{\phi}_{xy}(1) = \pmb{0}_k$.<br /><br /><br /> Under Assumption 3, if a cointegrating relationship between $y_t$ and $\pmb{x}_t$ exists, it can only enter through the CECM equation. Since $y_t$ is a scalar, the cointegrating relationship, should it exist, is the only one under consideration, while the cointegrating matrix reduces to: \begin{align} (\pmb{I}_{k+1} - \pmb{\Psi})\pmb{\Phi}(1) &amp;= \begin{bmatrix} \phi_{yy}(1) &amp; \pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1)\\ \pmb{0}_k &amp; \pmb{\Phi}_{xx}(1) \end{bmatrix} \label{eq.ardl.19} \end{align} while the cointegrating relationship between $y_t$ and $\pmb{x}_t$, if it exists, becomes: $$\phi_{yy}(1)y_{t-1} - \left(\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1)\right)\pmb{x}_{t-1}$$ </span><br /><h4><span style="font-family: &quot;verdana&quot; , sans-serif;">Relationship to ARDL</span></h4><span style="font-family: &quot;verdana&quot; , sans-serif;">While the CECM in (\ref{eq.ardl.18}) derives from a VAR structure, the observant reader will recognize that it is in effect an ARDL model. In fact, as argued in Boswijk (2004), CECMs are special cases of their <i>structural</i> ECM counterparts; as such, an ARDL model can be thought of as a special case of a structural ECM. Thus, when one speaks of ARDL models in the context of cointegration, what is actually being referred to is the CECM. The relationship is made clearer by referring back to the VAR in (\ref{eq.ardl.11}).
In this regard, let the lag polynomial matrix $\pmb{\eta}(L)$ satisfy $\pmb{\eta}(L)\pmb{\Phi}(L) = \pmb{\Phi}(L)\pmb{\eta}(L) = (1-L)\pmb{I}_{k+1}$, and consider the following derivations: \begin{align*} \Delta(\pmb{z}_t - \pmb{\mu} - \pmb{\gamma}t) = \pmb{\eta}(L)\pmb{\Phi}(L)(\pmb{z}_t - \pmb{\mu} - \pmb{\gamma}t) &amp;=\pmb{\eta}(L)\pmb{\epsilon}_t\\ &amp;=\pmb{\eta}(1)\pmb{\epsilon}_t + \widetilde{\pmb{\eta}}(L)\Delta\pmb{\epsilon}_t \end{align*} where the second line above follows from the BN decomposition of $\pmb{\eta}(L)$. Next, assuming without loss of generality that $\pmb{z}_0 = \pmb{\epsilon}_0 = \pmb{0}_{k+1}$, we can sum both sides of the equation above to derive: $$(\pmb{z}_t - \pmb{\mu} - \pmb{\gamma}t) = \pmb{\eta}(1)\sum_{i=0}^{t}\pmb{\epsilon}_i + \widetilde{\pmb{\eta}}(L)\pmb{\epsilon}_t$$ where the term $\sum_{i=0}^{t}\pmb{\epsilon}_i$ converges weakly to a Brownian motion after appropriate scaling. On the other hand, recall that the CECM cointegrating matrix can be expressed as $(\pmb{I}_{k+1} - \pmb{\Psi})\pmb{\Phi}(1)$. Thus, multiplying the expression above by this cointegrating matrix, we derive: \begin{align*} (\pmb{I}_{k+1} - \pmb{\Psi})\pmb{\Phi}(1)(\pmb{z}_t - \pmb{\mu} - \pmb{\gamma}t) &amp;= (\pmb{I}_{k+1} - \pmb{\Psi})\pmb{\Phi}(1)\pmb{\eta}(1)\sum_{i=0}^{t}\pmb{\epsilon}_i + (\pmb{I}_{k+1} - \pmb{\Psi})\pmb{\Phi}(1)\widetilde{\pmb{\eta}}(L)\pmb{\epsilon}_t\\ &amp;= (\pmb{I}_{k+1} - \pmb{\Psi})\pmb{\Phi}(1)\widetilde{\pmb{\eta}}(L)\pmb{\epsilon}_t \end{align*} where we have used the fact that $(\pmb{I}_{k+1} - \pmb{\Psi})\pmb{\Phi}(1)\pmb{\eta}(1) = (\pmb{I}_{k+1} - \pmb{\Psi})(1-1)\pmb{I}_{k+1} = \pmb{0}$. Assumptions 1 through 3 now guarantee that, if a cointegrating relationship exists, it must be of the form $(\pmb{I}_{k+1} - \pmb{\Psi})\pmb{\Phi}(1)(\pmb{z}_t - \pmb{\mu} - \pmb{\gamma}t)$.
In fact, a slightly more expressive relation emerges by rewriting the CECM as: \begin{align*} \Delta y_t &amp;= -\phi_{yy}(1)\left(y_{t-1} - \frac{\alpha_{y0}}{\phi_{yy}(1)} - \frac{\alpha_{y1}}{\phi_{yy}(1)}t + \left(\frac{\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1)}{\phi_{yy}(1)}\right)\pmb{x}_{t-1}\right)\\ &amp;+ \pmb{e}_1^\top\left((\pmb{I}_{k+1} - \pmb{\Psi})\widetilde{\pmb{\Phi}}^\star(L) + \pmb{\Psi}\right)\Delta\pmb{z}_t + u_{yt} \end{align*} Since the long-run equation is known to be stationary, it now readily follows that the equilibrating (cointegrating) relationship between $y_t$ and $\pmb{x}_t$ satisfies: \begin{align} y_{t} = \frac{\alpha_{y0}}{\phi_{yy}(1)} + \frac{\alpha_{y1}}{\phi_{yy}(1)}t - \left(\frac{\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1)}{\phi_{yy}(1)}\right)\pmb{x}_{t} + v_t\label{eq.ardl.20} \end{align} However, observe that the expression $(\pmb{I}_{k+1} - \pmb{\Psi})\pmb{\Phi}(1)(\pmb{z}_t - \pmb{\mu} - \pmb{\gamma}t)$ is precisely the RHS of (\ref{eq.ardl.20}), whereas $(\pmb{I}_{k+1} - \pmb{\Psi})\pmb{\Phi}(1)\widetilde{\pmb{\eta}}(L)\pmb{\epsilon}_t = v_t$. Moreover, observe that equation (\ref{eq.ardl.20}) is precisely the long-run equation one derives from the ARDL models in Pesaran and Shin (1998). More importantly, the equation is easily estimated by running OLS on the CECM (\ref{eq.ardl.18}), and deriving the long-run equation post estimation. We've outlined the procedure in <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">Part 1</a> of this series.
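As a numerical illustration of this logic, the sketch below (Python, with a simulated DGP whose long-run slope is fixed at 2; all numbers are invented for illustration, and this is not EViews output) estimates the CECM by OLS and recovers the long-run coefficient as the negative ratio of the level-term coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated DGP: x_t is a random walk and y_t = 2*x_t + u_t, so y and x
# cointegrate with long-run slope 2 (a purely hypothetical example).
x = np.cumsum(rng.normal(size=n))
y = 2.0 * x + rng.normal(size=n)

dy = np.diff(y)
dx = np.diff(x)
# CECM regressors: constant, y_{t-1}, x_{t-1}, and the contemporaneous dx_t.
X = np.column_stack([np.ones(n - 1), y[:-1], x[:-1], dx])
b = np.linalg.lstsq(X, dy, rcond=None)[0]

# Long-run slope recovered from the level terms, as in equation (20):
theta_hat = -b[2] / b[1]
print("estimated long-run slope:", theta_hat)  # close to 2
```

Because $x_t$ is I$(1)$, the recovered long-run slope is superconsistent and lands very close to the true value even in moderate samples.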
<br /><br /> </span><br /><h3><span style="font-family: &quot;verdana&quot; , sans-serif;">Inference</span></h3><span style="font-family: &quot;verdana&quot; , sans-serif;">We also pause here to impose a fourth assumption which governs the cointegrating properties of the marginal vectors $\pmb{x}_t$, irrespective of a potential cointegrating relationship with $y_t$ in the CECM. In particular: <br /><br /> </span><br /><h4><span style="font-family: &quot;verdana&quot; , sans-serif;">Assumption 4:</span></h4><span style="font-family: &quot;verdana&quot; , sans-serif;"><b>Marginal variables may be mutually cointegrated</b>: The matrix $\pmb{\Phi}_{xx}(1)$ has rank $0\leq r_{x} \leq k$. <br /><br /> The importance of Assumption 4 lies in the flexibility of allowing $\pmb{x}_t$ to be I$(0)$ when $r_x = k$, I$(1)$ when $r_x = 0$, or mutually cointegrated whenever $0 &lt; r_x &lt; k$. Again, recall that the assumption is made without regard as to whether $y_t$ and $\pmb{x}_t$ are themselves cointegrated. Accordingly, we must allow for the possibility of the system cointegrating matrix $(\pmb{I}_{k+1} - \pmb{\Psi})\pmb{\Phi}(1)$ to have rank $r_x$ at the very minimum.
To ensure this, we note the following result from Abadir and Magnus (2005): \begin{align*} \rank\left((\pmb{I}_{k+1} - \pmb{\Psi})\pmb{\Phi}(1)\right) &amp;= \rank\left( \begin{bmatrix} \phi_{yy}(1) &amp; \pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1)\\ \pmb{0}_k &amp; \pmb{\Phi}_{xx}(1) \end{bmatrix} \right)\\ &amp;= \begin{cases} r_x \quad &amp;\text{if} \quad \phi_{yy}(1) = 0 \quad \text{and} \quad \pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1) = \pmb{0}_k^\top\\ 1 + r_x \quad &amp;\text{otherwise} \end{cases} \end{align*} In other words: <br /><br /> <i>While $\pmb{x}_t$ may or may not be cointegrated among themselves, there is <b>no</b> cointegrating relationship between $y_t$ and $\pmb{x}_t$ if and only if $\phi_{yy}(1) = 0$ and $\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1) = \pmb{0}_k^\top$.</i><br /><br /> However, if this is indeed the case, the CECM reduces to: \begin{align*} \Delta y_t &amp;= \alpha_{y0} + \alpha_{y1}t + \pmb{e}_1^\top\left((\pmb{I}_{k+1} - \pmb{\Psi})\widetilde{\pmb{\Phi}}^\star(L) + \pmb{\Psi}\right)\Delta\pmb{z}_t + u_{yt} \end{align*} Since $\Delta y_t$ is evidently a stationary process, and in the above formulation a function of stationary processes, it stands to reason that $y_t$ itself must be I$(1)$ -- in other words, while $y_t$ and $\pmb{x}_t$ could in principle have cointegrated, no cointegrating relationship exists, regardless of the cointegrating rank $r_x$ among $\pmb{x}_t$.<br /><br /> Thus, the null hypothesis that <b>no</b> cointegrating relationship between $y_t$ and $\pmb{x}_t$ exists, is: $$H_{0,F}: \quad \phi_{yy}(1) = 0 \quad \text{and} \quad \pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1) = \pmb{0}_k^\top$$ </span><br /><h4><span style="font-family: &quot;verdana&quot; , sans-serif;">Analysis of the Null Hypotheses</span></h4><span style="font-family: &quot;verdana&quot; , sans-serif;">The test
for $H_{0,F}$ proceeds by estimating the CECM coefficients using OLS and computing the usual $F$-statistic, $\tau_F$, associated with $H_{0,F}$ for the five cases governed by the deterministic assumptions in (\ref{eq.ardl.13}). Again, we've discussed the specifics in <a href="http://blog.eviews.com/2017/04/autoregressive-distributed-lag-ardl.html" target="_blank">Part 1</a> of this series. Next, $\tau_F$ is compared to two sets of critical values: the lower bound $\xi_{L,F}$ associated with the case $\pmb{x}_t \sim \text{I}(0)$, or $r_x = k$, and the upper bound $\xi_{U,F}$, associated with the case $\pmb{x}_t \sim \text{I}(1)$, or $r_x = 0$, where $\xi_{L,F} &lt; \xi_{U,F}$; hence the name, <i>bounds test</i>. Moreover, from Pesaran, Shin, and Smith (2001), critical values for $H_{0,F}$ derive from non-standard limiting distributions. Accordingly, it bears reminding that such tests reject $H_{0,F}$ whenever $\tau_F$ is <b>greater</b> than some critical value. In this regard we have three outcomes: </span><br /><ul><span style="font-family: &quot;verdana&quot; , sans-serif;"><li> $\tau_F &lt; \xi_{L,F} &lt; \xi_{U,F}$: Here we <b>fail</b> to reject $H_{0,F}$ when $\pmb{x}_t$ is either I$(0)$ or I$(1)$. We are therefore assured that <b>no</b> cointegrating relationship between $y_t$ and $\pmb{x}_t$ exists.</li><br /><li> $\xi_{L,F} &lt; \tau_F &lt; \xi_{U,F}$: Here, $\xi_{L,F} &lt; \tau_F$. Accordingly, we <b>reject</b> $H_{0,F}$ when $\pmb{x}_t \sim \text{I}(0)$. Nevertheless, since $\tau_F &lt; \xi_{U,F}$, we <b>fail</b> to reject $H_{0,F}$ when $\pmb{x}_t \sim \text{I}(1)$. This indicates that cointegrating relationships between $y_t$ and $\pmb{x}_t$ may or may not exist for cases where $0 &lt; r_x &lt; k$.
Accordingly, we cannot make any specific conclusions unless we know the rank of the system-wide cointegrating matrix (\ref{eq.ardl.19}).</li><br /><li> $\xi_{L,F} &lt; \xi_{U,F} &lt; \tau_F$: Here we <b>reject</b> $H_{0,F}$ when $\pmb{x}_t$ is either I$(0)$ or I$(1)$. Since $r_x = 0$ in this case, we know $\pmb{\Phi}_{xx}(1) = 0$. Moreover, since the maximal rank of the cointegrating matrix (\ref{eq.ardl.19}) is $r_z = 1 + r_x$, from the Abadir and Magnus (2005) result above, the remaining unity rank can arise from one of three possibilities: <ul><li> $\phi_{yy}(1) = 0$ and $\pmb{\phi}_{yx}(1) \neq \pmb{0}_k^\top$ in which case the equilibrating relationship between $y_t$ and $\pmb{x}_t$ is entirely nonsensical. In fact, looking at (\ref{eq.ardl.20}), it is undefined.</li><br /><li> $\phi_{yy}(1) \neq 0$ and $\pmb{\phi}_{yx}(1) = \pmb{0}_k^\top$, in which case the equilibrating relationship is defined but degenerate.</li><br /><li> $\phi_{yy}(1) \neq 0$ and $\pmb{\phi}_{yx}(1) \neq \pmb{0}_k^\top$ in which case the equilibrating relationship is well defined.</li><br /> </ul>This suggests an additional test for $\phi_{yy}(1) = 0$ to exclude the first possibility above.
We discuss this in greater detail in the analysis of the alternative hypothesis below.</li><br /></span></ul><span style="font-family: &quot;verdana&quot; , sans-serif;"></span><br /><h4><span style="font-family: &quot;verdana&quot; , sans-serif;">Analysis of the Alternative Hypotheses</span></h4><span style="font-family: &quot;verdana&quot; , sans-serif;">Given the discussion above, if an equilibrating relationship between $y_t$ and $\pmb{x}_t$ exists, it must reside in $H_{A,F}$, where: $$H_{A,F}: \quad \phi_{yy}(1) \neq 0 \quad \text{or} \quad \pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1) \neq \pmb{0}_k^\top \quad \text{or both.}$$ In fact, $H_{A,F}$ consists of three alternative specifications, as we will show below, and only one results in a <i>non-degenerate</i> relationship between $y_t$ and $\pmb{x}_t$. In this regard, a non-degenerate relationship must guarantee the existence and validity of the equilibrating equation in (\ref{eq.ardl.20}). In other words, it must ensure $\phi_{yy}(1) \neq 0$, otherwise the relationship is undefined, and $\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1) \neq 0$, otherwise the relationship between $y_t$ and $\pmb{x}_t$ in the CECM is through $\Delta\pmb{x}_t$, and hence degenerate. We analyze the implication of these conclusions below. 
</span><br /><ul><span style="font-family: &quot;verdana&quot; , sans-serif;"><li> $H_{A_1,F}: \quad \phi_{yy}(1) = 0$ and $\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1) \neq \pmb{0}_k^\top$.<br /><br /> Here, the result from Abadir and Magnus (2005) assures us that the cointegrating matrix (\ref{eq.ardl.19}) has rank $r_z = 1 + r_x$, and the CECM reduces to: $$\Delta y_t = \alpha_{y0} + \alpha_{y1}t - \left(\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1)\right)\pmb{x}_{t-1} + \pmb{e}_1^\top\left((\pmb{I}_{k+1} - \pmb{\Psi})\widetilde{\pmb{\Phi}}^\star(L) + \pmb{\Psi}\right)\Delta\pmb{z}_t + u_{yt}$$ The cointegrating relationship (\ref{eq.ardl.20}) is here undefined since $\phi_{yy} = 0$. Moreover, since $\pmb{\Phi}_{xx}(1)$ is the only cointegrating matrix for $\pmb{x}_t$, it holds that $\pmb{\Phi}_{xx}(1)\pmb{x}_{t-1} \sim \text{I}(0)$, and therefore all RHS variables are I$(0)$ except possibly $\pmb{\phi}_{yx}(1)\pmb{x}_{t-1}$. However, since $\phi_{yy} = 0$, $\pmb{\phi}_{yx}(1)$ is <b>not</b> a cointegrating matrix for $\pmb{x}_t$ and therefore $\pmb{\phi}_{yx}(1)\pmb{x}_{t-1}$ may be I$(0)$ or I$(1)$. Either way, $y_t \sim \text{I}(1)$ regardless of the cointegrating rank $r_x$.</li><br /><li> $H_{A_2,F}: \quad \phi_{yy}(1) \neq 0$ and $\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1) = \pmb{0}_k^\top$<br /><br /> In this case, the CECM assumes the form: $$\Delta y_t = \alpha_{y0} + \alpha_{y1}t - \phi_{yy}(1)y_{t-1} + \left(\widetilde{\phi}^\star_{yy}(L) -\pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\widetilde{\pmb{\phi}}^\star_{xy}(L)\right)\Delta y_t +\pmb{e}_1^\top\left((\pmb{I}_{k+1} - \pmb{\Psi})\widetilde{\pmb{\Phi}}^\star(L) + \pmb{\Psi}\right)\Delta\pmb{z}_t + u_{yt}$$ In fact, the equation is a special case of the Augmented Dickey-Fuller (ADF) regression. 
By Assumption 1, when $\pmb{\Phi}(1) = 0$, and therefore $\phi_{yy}(1) = 0$, the vector $\pmb{z}_t$, and therefore $y_t$, has a unit root. Under the alternative, however, $\phi_{yy}(1) \neq 0$ and we know that either $y_t \sim \text{I}(0)$ whenever $\alpha_{y1} = 0$, or $y_t$ is trend stationary should $\alpha_{y1} \neq 0$. Again, this holds regardless of the cointegrating rank $r_x$. Moreover, the result from Abadir and Magnus (2005) ensures that the cointegrating matrix (\ref{eq.ardl.19}) has rank $r_z = 1 + r_x$. It is important to note here that while a cointegrating relationship between $y_t$ and $\pmb{x}_t$ is not possible, there exists a relationship between $y_t$ and $\pmb{x}_t$ originating from the short-run dynamics manifesting through $\Delta \pmb{x}_t$. Since this is <b>not</b> an equilibrating relationship originating from $\pmb{x}_{t-1}$, the relationship is degenerate in equilibrium.</li><br /><li> $H_{A_3,F}: \quad \phi_{yy}(1) \neq 0$ and $\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1) \neq \pmb{0}_k^\top$<br /><br /> Here, the result from Abadir and Magnus (2005) guarantees that $r_z = 1 + r_x$.
Moreover, Abadir and Magnus (2005) assures us there exist $((k+1) \times r_z)$-matrices $\pmb{A}$ and $\pmb{B}$ such that one can write the cointegrating matrix in rank factorization as follows: \begin{align*} \begin{bmatrix} \phi_{yy}(1) &amp; \pmb{\phi}_{yx}(1)\\ \pmb{0}_k &amp; \pmb{\Phi}_{xx}(1) \end{bmatrix} &amp;= \begin{bmatrix} A_{yy}\\ \pmb{0}_k \end{bmatrix} \begin{bmatrix} B_{yy} &amp; \pmb{B}^\top_{yx} \end{bmatrix} + \begin{bmatrix} \pmb{A}_{yx}\\ \pmb{A}_{xx} \end{bmatrix} \begin{bmatrix} \pmb{0}_k &amp; \pmb{B}^\top_{xx} \end{bmatrix}\\ &amp;= \begin{bmatrix} A_{yy}B_{yy} &amp; A_{yy}\pmb{B}^\top_{yx} + \pmb{A}_{yx}\pmb{B}^\top_{xx}\\ \pmb{0}_k &amp; \pmb{A}_{xx}\pmb{B}^\top_{xx} \end{bmatrix} \end{align*} Thus, $\pmb{\phi}_{yx}(1) = A_{yy}\pmb{B}^\top_{yx} + \pmb{A}_{yx}\pmb{B}^\top_{xx}$, where $\pmb{B}_{xx}^\top$ comprises the cointegrating matrix underlying $\pmb{\Phi}_{xx}(1) = \pmb{A}_{xx}\pmb{B}^\top_{xx}$ of $\pmb{x}_t$, irrespective of $y_t$. Accordingly, any equilibrating link between $y_t$ and $\pmb{x}_t$ is due to the cointegrating matrix $\pmb{B}^\top_{yx}$. Hence, we have two possibilities.</li><br /> <ul><li> $\rank(\pmb{B}^\top_{yx},\pmb{B}^\top_{xx}) = r_x$. In this case, the cointegrating vector $\pmb{B}^\top_{yx}$ is subsumed by $\pmb{B}^\top_{xx}$ since $\rank(\pmb{\Phi}_{xx}(1)) = \rank(\pmb{B}_{xx}) = r_x$. Thus, the equilibrating relationship between $y_t$ and $\pmb{x}_t$ is <b>not</b> due to traditional cointegration, but is valid nonetheless. Here, $y_t \sim \text{I}(0)$ since $\phi_{yy}(1) \neq 0$.</li><br /><li> $\rank(\pmb{B}^\top_{yx},\pmb{B}^\top_{xx}) = 1 + r_x$. In this case, the cointegrating vector $\pmb{B}^\top_{yx}$ is <b>not</b> redundant, and drives the cointegrating link between $y_t$ and $\pmb{x}_t$. The equilibrating relationship is now of the traditional cointegration type, and therefore $y_t \sim \text{I}(1)$.</li><br /> </ul>In either case, it is readily shown that the relationships which emerge are non-degenerate.
</span></ul><span style="font-family: &quot;verdana&quot; , sans-serif;">We can summarize the insight above as follows: $$\begin{array}{l|c|l|c} &amp; \text{Specification} &amp; \text{Conclusion} &amp; \text{Integration Order} \\ \hline H_{0,F} &amp; \phi_{yy}(1) = 0 \text{ and } \pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1) = \pmb{0}_k^\top &amp; \text{No equilibrating relationship.} &amp; y_t \sim I(1)\\ &amp;&amp;&amp;\\ H_{A_1,F} &amp; \phi_{yy}(1) = 0 \text{ and } \pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1) \neq \pmb{0}_k^\top &amp; \text{Equilibrating relationship} &amp; y_t \sim I(1)\\ &amp; &amp; \text{is nonsensical.} &amp;\\ &amp;&amp;&amp;\\ H_{A_2,F} &amp; \phi_{yy}(1) \neq 0 \text{ and } \pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1) = \pmb{0}_k^\top &amp; \text{Equilibrating relationship} &amp; y_t \sim I(0) \text{ or TS}\\ &amp; &amp; \text{is degenerate.} &amp;\\ &amp;&amp;&amp;\\ H_{A_3,F} &amp; \phi_{yy}(1) \neq 0 \text{ and } \pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1) \neq \pmb{0}_k^\top &amp; \text{Equilibrating relationship} &amp; y_t \sim I(0) \text{ or } I(1)\\ &amp; &amp; \text{is non-degenerate.} &amp; \end{array}$$ An important observation emerges. Notice that if we reject the null hypothesis, it is unclear which of the three alternative hypotheses manifests. Accordingly, rejecting $H_{0,F}$ does <b>not</b> guarantee that a non-degenerate relationship exists, or even a degenerate one! To identify the alternative (at least partially), one requires an additional test for $H_{0,t}: \phi_{yy}(1) = 0$, although in contrast to the test for $H_{0,F}$, testing $H_{0,t}$ is only sensible for cases I, III, and V of the deterministic restrictions in (\ref{eq.ardl.13}). While the usual $t$-statistic, $\tau_t$, will suffice, like $\tau_F$, its distribution is non-standard. 
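The three-way decision logic of the $F$-bounds comparison above can be sketched as a small helper. The critical values used in the example call are placeholders chosen only for illustration; in practice the actual bounds for the relevant case and number of regressors come from Pesaran, Shin, and Smith (2001):

```python
def bounds_decision(tau_F, xi_L, xi_U):
    """Three-way outcome of the F-bounds comparison.

    xi_L: lower-bound critical value (x_t ~ I(0), r_x = k)
    xi_U: upper-bound critical value (x_t ~ I(1), r_x = 0)
    """
    if tau_F < xi_L:
        return "fail to reject H0: no levels relationship under I(0) or I(1)"
    if tau_F > xi_U:
        return "reject H0: follow up with the t-test on phi_yy(1) = 0"
    return "inconclusive: depends on the unknown cointegrating rank r_x"

# Hypothetical statistic against placeholder 5% bounds:
print(bounds_decision(6.2, xi_L=3.8, xi_U=4.9))
```

Note that even the "reject" branch does not, by itself, establish a non-degenerate relationship, which is precisely why the follow-up $t$-test on $\phi_{yy}(1)$ is needed.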
In this regard, analogous to the limiting distributions of $\tau_F$, Pesaran, Shin, and Smith (2001) also provide sets of critical values $|\xi_{L,t}| &lt; |\xi_{U,t}|$ for $\tau_t$, where $\xi_{L,t}$ and $\xi_{U,t}$ are derived respectively for $\pmb{x}_t \sim \text{I}(0)$ and $\pmb{x}_t \sim \text{I}(1)$. Since $\tau_t$ has a non-standard distribution, a rejection of $H_{0,t}$ requires $\tau_t$ to be <b>greater</b> than the appropriate critical value, or <b>less</b>&nbsp;than the negative of said critical value, since the test has a two-sided alternative. In other words, one rejects the null hypothesis whenever the absolute value of $\tau_t$ exceeds the absolute value of the appropriate critical value. There are therefore three possibilities to consider: </span><br /><ul><span style="font-family: &quot;verdana&quot; , sans-serif;"><li> $|\tau_t| &lt; |\xi_{L,t}| &lt; |\xi_{U,t}|$: As before, $\pmb{x}_t$ is either I$(0)$ or I$(1)$. Moreover, since $|\tau_t| &lt; |\xi_{L,t}|$, we&nbsp;<b>fail</b>&nbsp;to reject $H_{0,t}$. Since we have already rejected $H_{0,F}$, this implies $\phi_{yy}(1) = 0$ and $\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1) \neq \pmb{0}_k^\top$. We therefore conclude that $H_{A,F}$ manifests as $H_{A_1,F}$ and a nonsensical equilibrating relationship between $y_t$ and $\pmb{x}_t$ emerges.</li><br /><li> $|\xi_{L,t}| &lt; |\tau_t| &lt; |\xi_{U,t}|$: Here we <b>reject</b> $H_{0,t}$ when $\pmb{x}_t \sim \text{I}(0)$ but <b>fail</b> to do so for the case where $\pmb{x}_t \sim \text{I}(1)$ and $0 &lt; r_x &lt; k$. Thus, examples may emerge where the $\pmb{x}_t$ are mutually cointegrated for which we may or may not reject $H_{0,t}$.
Unless we know the rank of the cointegrating matrix (\ref{eq.ardl.19}), little more can be inferred.</li><br /><li> $|\xi_{L,t}| &lt; |\xi_{U,t}| &lt; |\tau_t|$: In this case, we&nbsp;<b>reject</b>&nbsp;$H_{0,t}$ when $\pmb{x}_t$ is either I$(0)$ or I$(1)$, implying $\phi_{yy}(1) \neq 0$. Accordingly, unless we know that $\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1) = \pmb{0}_{k}^\top$, we must conclude that $H_{A,F}$ manifests either as $H_{A_2,F}$ or $H_{A_3,F}$. In either case, an equilibrating relationship emerges, albeit a degenerate one in the case of $H_{A_2,F}$.</li><br /></span></ul><span style="font-family: &quot;verdana&quot; , sans-serif;">The process is visualized below:<br /><br /> </span><br /><div class="separator" style="clear: both; text-align: center;"><span style="font-family: &quot;verdana&quot; , sans-serif;"><a href="https://2.bp.blogspot.com/-NdmNlHHQc64/WREA6axaarI/AAAAAAAAAT4/2MEE3ArBBa8RrBb_5by7gGuLfvAbIIeDACLcB/s1600/flowchar2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-NdmNlHHQc64/WREA6axaarI/AAAAAAAAAT4/2MEE3ArBBa8RrBb_5by7gGuLfvAbIIeDACLcB/s640/flowchar2.png" width="480" /></a></span></div><span style="font-family: &quot;verdana&quot; , sans-serif;"><br /><br /> <h4>Adjustment to Equilibrium Regression</h4>We close with a discussion on estimating the adjustment to equilibrium. Recall that in the VECM (\ref{eq.ardl.12}), $\pmb{\Phi}(1)$ not only governs the cointegrating properties among $\pmb{z}_t$, but also satisfies $\pmb{\Phi}(1)\pmb{z}_{t-1} = \pmb{A}\pmb{B}^\top\pmb{z}_{t-1}$, where $\pmb{A}$ is a measure of adjustment to equilibrium. To estimate this adjustment, one first estimates the CECM (ARDL) (\ref{eq.ardl.18}) using OLS, then proceeds to compute an estimate of the long-run equation (\ref{eq.ardl.20}) post-estimation. Let $EC_t$ denote the non-stochastic part of this equation, a variable that is typically known as the <i>error-correction</i> (EC) term.
In other words: $$EC_t = y_t - \frac{\alpha_{y0}}{\phi_{yy}(1)} - \left(\frac{\alpha_{y1}}{\phi_{yy}(1)}\right)t + \left(\frac{\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1)}{\phi_{yy}(1)}\right)\pmb{x}_{t}$$ Next, one substitutes $EC_t$ back into the CECM in place of the theoretical long-run equation to derive: \begin{align} \Delta y_t = -\phi_{yy}(1)EC_t +\pmb{e}_1^\top\left((\pmb{I}_{k+1} - \pmb{\Psi})\widetilde{\pmb{\Phi}}^\star(L) + \pmb{\Psi}\right)\Delta\pmb{z}_t + u_{yt} \label{eq.ardl.22} \end{align} Finally, one estimates the equation above using OLS again to derive an estimate of $\phi_{yy}(1)$, the parameter governing the <i>speed</i> of adjustment to equilibrium, which is analogous to the matrix $\pmb{A}$ in the original VECM. However, since one is only reparameterizing the CECM, whatever estimate is obtained for $\phi_{yy}(1)$ in the equation above is in fact identical to the one obtained from the ARDL estimates used to construct $EC_t$ in the first place. Thus, if one is only interested in obtaining an estimate of the speed of adjustment to equilibrium, the regression above is redundant. Nevertheless, if one wishes to conduct inference on the parameter, such as a significance test, it is important to realize that such inference cannot rely on the standard $t$ distribution and its $p$-values.
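The redundancy claim above is easy to verify numerically. The sketch below is a simplified bivariate illustration in plain NumPy (with a <i>lagged</i> EC term rather than the exact CECM of this post, and with simulated data, so all names and values are assumptions): it estimates a conditional ECM in levels, builds the EC term from those estimates, and re-runs the error-correction regression. The speed-of-adjustment estimate is identical, because the second regression is an exact linear reparameterization of the first.

```python
import numpy as np

# Simulate a bivariate system in which y_t adjusts toward x_t at rate 0.3.
rng = np.random.default_rng(0)
T = 500
x = np.cumsum(rng.normal(size=T))            # I(1) regressor
y = np.zeros(T)
for t in range(1, T):
    y[t] = y[t - 1] - 0.3 * (y[t - 1] - x[t - 1]) + rng.normal()

dy, dx = np.diff(y), np.diff(x)
yl, xl = y[:-1], x[:-1]                      # lagged levels

# (1) Conditional ECM in levels: dy_t on [1, y_{t-1}, x_{t-1}, dx_t].
X1 = np.column_stack([np.ones(T - 1), yl, xl, dx])
b1 = np.linalg.lstsq(X1, dy, rcond=None)[0]
phi_yy1 = -b1[1]                             # estimate of phi_yy(1)

# (2) Build EC_{t-1} from the estimated long-run coefficient and re-run
#     the error-correction regression: dy_t on [1, EC_{t-1}, dx_t].
ec = yl - (b1[2] / phi_yy1) * xl
X2 = np.column_stack([np.ones(T - 1), ec, dx])
b2 = np.linalg.lstsq(X2, dy, rcond=None)[0]

# The adjustment estimate is numerically identical in both regressions.
assert np.allclose(-b2[1], phi_yy1)
```

The identity holds because the fitted values of regression (1) lie in the column space of regression (2)'s regressors, so OLS reproduces the same coefficients; the second regression is redundant for point estimation, but, as noted above, inference on the EC coefficient still requires non-standard critical values.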
To see this, observe that: $$\Delta EC_t = \Delta y_t - \frac{\alpha_{y1}}{\phi_{yy}(1)} + \left(\frac{\pmb{\phi}_{yx}(1) - \pmb{\omega}_{yx}\pmb{\Omega}^{-1}_{xx}\pmb{\Phi}_{xx}(1)}{\phi_{yy}(1)}\right)\Delta\pmb{x}_{t}$$ Next, substitute $EC_t$ and $\Delta EC_t$ into (\ref{eq.ardl.22}) and note that it can be shown that: $$\Delta EC_t = c_0 -c_1EC_{t-1} + c_2(L)\Delta EC_t + \pmb{c}_3(L)\Delta\pmb{x}_t + u_{yt}$$ where $c_0 = \frac{\alpha_{y0}}{\phi_{yy}(1)}$, $c_2(L)$ and $\pmb{c}_3(L)$ are lag polynomials (with $c_2(L)$ involving only lagged differences) derived from the coefficients of the system, and evidently, $c_1 = \phi_{yy}(1)$. The equation is clearly a variant of the famous ADF regression, for which the OLS estimate of $c_1$ is in fact an estimate of $\phi_{yy}(1)$. Nevertheless, while the $t$-statistic for the estimate of $c_1$ is easily derived, it has a non-standard limiting distribution since the regression is of the ADF variety. Accordingly, testing the null hypothesis $H_0: \phi_{yy}(1) = 0$ requires critical values and $p$-values that are in accordance with the appropriate BM distributions.<br /><br /> Please stay tuned for our final blog entry in this series, which will focus on implementing ARDL and the Bounds Test in EViews.<br /><br /> <h3>References:</h3><a href="https://www.blogger.com/null" name="abadir-2005"></a> Abadir, K.M. and Magnus, J.R. (2005). Matrix Algebra. <em>Cambridge University Press</em>. <br /> <a href="https://www.blogger.com/null" name="boswijk-1994"></a> Boswijk, H. P. (1994). Testing for an unstable root in conditional and structural error correction models. <em>Journal of Econometrics</em>, 63(1):37-60. <br /> <a href="https://www.blogger.com/null" name="casella-2002"></a> Casella, G. and Berger, R.L. (2002). Statistical Inference. <em>Duxbury</em>, Pacific Grove, CA. <br /> <a href="https://www.blogger.com/null" name="engle-1983"></a> Engle, R.F., Hendry, D.F., and Richard, J. (1983).
Exogeneity. <em>Econometrica: Journal of the Econometric Society</em>, 277-304. <br /> <a href="https://www.blogger.com/null" name="ericsson-1992"></a> Ericsson, N.R. (1992). Cointegration, exogeneity, and policy analysis: An overview. <em>Journal of Policy Modeling</em>, 13(3):251-280. <br /> <a href="https://www.blogger.com/null" name="pesaran-1998"></a> Pesaran, M.&nbsp;H. and Shin, Y. (1998). An autoregressive distributed-lag modelling approach to cointegration analysis. <em>Econometric Society Monographs</em>, 31:371-413. <br /> <a href="https://www.blogger.com/null" name="pesaran-2001"></a> Pesaran, M.&nbsp;H., Shin, Y., and Smith, R.&nbsp;J. (2001). Bounds testing approaches to the analysis of level relationships. <em>Journal of Applied Econometrics</em>, 16(3):289-326. <br /> <a href="https://www.blogger.com/null" name="sims-1980"></a> Sims, C.A. (1980). Macroeconomics and reality. <em>Econometrica: Journal of the Econometric Society</em>, 1-48. <br /> </span>