Monday, December 21, 2020

Using Indicator Saturation to Detect Outliers and Structural Shifts

One of the potential pitfalls when working with time series datasets is that the data may have temporary or permanent changes to its levels. These changes could be single time-period outliers, or a fundamental structural shift.

EViews 12 introduces a new technique to detect and model these outliers and structural changes: indicator saturation. To highlight this addition to the recently released EViews 12, we thought we'd give a demonstration.

Indicator Saturation

Identifying changes in data is essential if we are to properly estimate models based on those data. One way to detect changes is to add to your regression a dummy (indicator) variable at each observation where a change might occur, and then decide whether each included indicator is a valid regressor. Such variables could include:
• Impulse Indicators (IIS): a dummy variable equal to zero everywhere other than a single value of one at period $t$. This indicator can be used to model single observation outliers, and is equivalent to the @isperiod EViews function used at the date corresponding to $t$.
• Step Indicators (SIS): a step function variable equal to zero until $t$ and one thereafter. This indicator can be used to model a shift in the intercept of an equation, and is equivalent to the @after EViews function used at the date corresponding to $t$.
• Trend Indicators (TIS): a trend-break variable that is equal to zero until period $t$ and then follows a trend afterward. This indicator can be used to model a change in the trend of an equation (or the introduction of a trend term if one didn’t previously exist), and is equivalent to the @trendbr EViews function used at the date corresponding to $t$.
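As a concrete illustration, the three indicator types can be generated as plain sequences. This is a minimal Python sketch, not EViews code. The helper names are hypothetical and observations are indexed from 0. The step indicator is assumed to switch on at $t$ itself, matching how we understand @after. The trend indicator is assumed to be 0 at $t$ and to count 1, 2, 3, ... afterward. EViews' own @trendbr offset may differ.

```python
from typing import List


def impulse_indicator(n_obs: int, t: int) -> List[int]:
    """IIS: 1 at observation t, 0 everywhere else (a single-period outlier dummy)."""
    return [1 if s == t else 0 for s in range(n_obs)]


def step_indicator(n_obs: int, t: int) -> List[int]:
    """SIS: 0 before observation t, 1 from t onward (an intercept shift).

    Assumption: the step is 'on' at t itself.
    """
    return [1 if s >= t else 0 for s in range(n_obs)]


def trend_indicator(n_obs: int, t: int) -> List[int]:
    """TIS: 0 up to and including t, then 1, 2, 3, ... (a trend break).

    Assumption: the new trend counts from zero at t; EViews' offset may differ.
    """
    return [max(0, s - t) for s in range(n_obs)]


if __name__ == "__main__":
    n = 6
    print(impulse_indicator(n, 2))  # [0, 0, 1, 0, 0, 0]
    print(step_indicator(n, 2))     # [0, 0, 1, 1, 1, 1]
    print(trend_indicator(n, 2))    # [0, 0, 0, 1, 2, 3]
```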

The problem with the approach of including these variables in a traditional regression setting is that unless you know the specific dates where changes occur, you can quickly run into a situation where you have more variables than observations (since you’ll be adding at least one indicator variable for each observation in your estimation sample!).
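To put numbers on this, suppose there are $T$ observations and $k$ ordinary regressors. Saturating the model with one impulse and one step indicator per observation gives a column count of

$$ \underbrace{k}_{\text{regressors}} + \underbrace{T}_{\text{impulses}} + \underbrace{T}_{\text{steps}} = k + 2T > T $$

Even impulse saturation alone gives $k + T > T$ columns. The full regressor matrix therefore cannot have full column rank, and the model cannot be estimated by ordinary least squares in a single pass. A step at the very first observation simply duplicates the constant, so the usable count is slightly smaller, but the conclusion is the same.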

Fortunately, recent advancements in variable selection techniques have meant that we can now perform variable selection on models with many more variables than observations, and so can saturate our regression with complex combinations of indicator variables and let the variable selection technique choose which are the most appropriate indicators to use.

AutoSearch/GETS

One of the new technologies introduced in EViews 12 is the AutoSearch/GETS algorithm for variable selection.

AutoSearch/GETS is a method of variable selection that follows the steps suggested by the AutoSEARCH algorithm of Escribano and Sucarrat (2011), which in turn builds upon the work of Hoover and Perez (1999), and is similar to the technology behind the Autometrics™ module in PcGive™.

Mechanically the algorithm is similar to a backwards uni-directional stepwise method:
1. The model with all search variables (termed the general unrestricted model, GUM) is estimated, and checked with a set of diagnostic tests.
2. A number of search paths are defined, one for each insignificant search variable in the GUM.
3. For each path, the insignificant variable defined in step 2 is removed, and then a series of further removal steps is taken. Each step removes the most insignificant remaining variable and checks whether the current model still passes the diagnostic tests. If the diagnostic tests fail after a variable is removed, that variable is placed back into the model and prevented from being removed again along this path. Variable removal finishes once there are no more insignificant variables, or once no variable can be removed without failing the diagnostic tests.
4. Once all paths have been calculated, the final models produced by the paths are compared using an information criterion, and the best model is selected.
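To make the path logic concrete, here is a minimal Python sketch of steps 2 to 4. It is not EViews' implementation. Estimation is abstracted into three assumed callables:

* `pvalues(model)`: for example, p-values from OLS t-tests.
* `diagnostics_ok(model)`: the battery of diagnostic tests.
* `criterion(model)`: an information criterion, where lower is better.

Variables listed in `fixed` (such as the constant) are never removed. `alpha` stands in for the significance level used to call a variable insignificant. Abandoning a path whose very first removal fails diagnostics is our assumption. Handling a GUM that itself fails diagnostics is left out.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional

Model = FrozenSet[str]


def _search_path(gum: Model, first_drop: str, fixed: Model,
                 pvalues: Callable[[Model], Dict[str, float]],
                 diagnostics_ok: Callable[[Model], bool],
                 alpha: float) -> Optional[Model]:
    """One search path: drop `first_drop`, then repeatedly drop the least significant variable."""
    model = gum - {first_drop}
    if not diagnostics_ok(model):
        return None  # assumption: a path whose first removal fails diagnostics is abandoned
    protected = set(fixed)  # variables that may not be removed along this path
    while True:
        pv = pvalues(model)
        candidates = [v for v in model if v not in protected and pv[v] > alpha]
        if not candidates:
            return model  # no insignificant variables left to remove
        worst = max(candidates, key=lambda v: (pv[v], v))  # ties broken by name
        trial = model - {worst}
        if diagnostics_ok(trial):
            model = trial
        else:
            protected.add(worst)  # put it back and never try to drop it again on this path


def gets_search(gum: Iterable[str], fixed: Iterable[str],
                pvalues: Callable[[Model], Dict[str, float]],
                diagnostics_ok: Callable[[Model], bool],
                criterion: Callable[[Model], float],
                alpha: float = 0.05) -> Model:
    """Multi-path backward elimination; terminal models are compared by `criterion`."""
    gum, fixed = frozenset(gum), frozenset(fixed)
    pv = pvalues(gum)
    # Step 2: one path per insignificant search variable in the GUM.
    starts = sorted(v for v in gum - fixed if pv[v] > alpha)
    finals = {_search_path(gum, v, fixed, pvalues, diagnostics_ok, alpha) for v in starts}
    finals.discard(None)
    if not finals:
        return gum
    # Step 4: best criterion wins; sorted names make ties deterministic.
    return min(finals, key=lambda m: (criterion(m), sorted(m)))


if __name__ == "__main__":
    # Toy p-values that ignore which model is fitted; a real version would re-estimate.
    p = {"c": 0.0, "x": 0.001, "d1": 0.50, "d2": 0.30, "d3": 0.002}
    result = gets_search(p.keys(), fixed={"c"},
                         pvalues=lambda m: {v: p[v] for v in m},
                         diagnostics_ok=lambda m: True,
                         criterion=len)  # parameter count as a stand-in criterion
    print(sorted(result))  # ['c', 'd3', 'x']
```

Keeping estimation outside the search function separates the path bookkeeping (which variables are protected, when a path stops) from the statistics, so the same loop can be driven by any estimator and test battery.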

One of the advantages of AutoSearch/GETS is that the set of candidate variables can be split into subsets, with the search performed on each subset in turn. The variables selected from each subset are then combined into a final set, which is searched once more. This allows you to test more candidate variables than you have observations without creating singularities (as long as enough candidate variables are rejected along the way), which makes it well suited to indicator saturation studies.
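The split-and-combine idea can be sketched in a few lines of Python. In this hypothetical helper, `block_search`, the `search` argument stands for any selection routine (for example a multi-path GETS search) that takes a set of variables and returns the subset it keeps. The `base` variables are included in every search. Splitting the candidates into fixed-size consecutive blocks is our assumption; EViews' actual partitioning scheme is not described here.

```python
from typing import Callable, FrozenSet, Sequence

Search = Callable[[FrozenSet[str]], FrozenSet[str]]


def block_search(base: Sequence[str], candidates: Sequence[str],
                 block_size: int, search: Search) -> FrozenSet[str]:
    """Search candidate variables block by block, then search the union of survivors."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    base_set = frozenset(base)
    survivors = set()
    for start in range(0, len(candidates), block_size):
        block = frozenset(candidates[start:start + block_size])
        kept = search(base_set | block)   # each search sees only base + one block
        survivors |= kept - base_set      # remember which candidates survived
    # Final pass over everything that survived its own block.
    return search(base_set | frozenset(survivors))


if __name__ == "__main__":
    # Hypothetical candidates: 20 impulse and 20 step indicators.
    cands = [f"i{t}" for t in range(20)] + [f"s{t}" for t in range(20)]
    # Toy search that "knows" which variables matter, for illustration only.
    keep = {"c", "x", "i5", "s12"}
    print(sorted(block_search(["c", "x"], cands, 10, lambda m: frozenset(m & keep))))
```

If `block_size` is chosen so that `len(base) + block_size` stays below the number of observations, every individual search is estimable even though the full candidate list is not.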

An Application with Consumption and Income

To demonstrate this feature, we will estimate a simple personal consumption equation, using the log-difference of personal consumption as the dependent variable, regressed on a constant and log-differenced disposable income. This estimation is purely a demonstration of the saturation features in EViews 12 and should not be taken as worthy macroeconomic research!
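Written out, the equation is shown below. Here $C_t$ is personal consumption (CONS), $Y_t$ is disposable income (INCOME), and $\Delta \log x_t = \log x_t - \log x_{t-1}$ is the log-difference that DLOG computes:

$$ \Delta \log C_t = \beta_0 + \beta_1 \, \Delta \log Y_t + \varepsilon_t $$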

Both data series were downloaded directly from FRED, the Federal Reserve Bank of St. Louis database, and contain monthly observations from 2002 through April 2020:

Figure 1: FRED

We begin by estimating a simple equation without any indicators included, using the following steps:
1. Select Quick/Estimate Equation to bring up the equation estimation dialog.
2. Enter our dependent variable DLOG(CONS) followed by a constant and our regressor DLOG(INCOME).
3. Click OK.

Figure 2a: Simple Estimation Dialog
Figure 2b: Simple Estimation Output

Note that the coefficient on log-differenced income is negative and statistically significant. Also note that the R-squared is 35%.
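For readers who want to see what this estimation computes, here is a minimal stdlib Python sketch of the same kind of regression: ordinary least squares of one log-differenced series on a constant and another, reporting the intercept, slope and R-squared. The helper names `dlog` and `ols_with_constant` are hypothetical (the first is named after DLOG). The FRED data are not included, so this does not reproduce the numbers above.

```python
import math
from typing import List, Sequence, Tuple


def dlog(series: Sequence[float]) -> List[float]:
    """Log-difference log(x_t) - log(x_{t-1}); the first observation is lost."""
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]


def ols_with_constant(y: Sequence[float], x: Sequence[float]) -> Tuple[float, float, float]:
    """Regress y on a constant and x by OLS; return (intercept, slope, R-squared)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sst = sum((yi - mean_y) ** 2 for yi in y)          # total sum of squares
    r_squared = 1.0 - sum(e * e for e in resid) / sst  # 1 - SSR / SST
    return intercept, slope, r_squared


if __name__ == "__main__":
    income = [100.0, 101.0, 100.5, 102.0, 103.5]  # made-up levels for illustration
    cons = [80.0, 80.6, 80.5, 81.2, 82.0]
    print(ols_with_constant(dlog(cons), dlog(income)))
```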

If we click on the Resids button we can view a graph of the equation residuals.

Figure 3: Estimation Residuals

A quick eyeball test suggests that something happened towards the end of 2004, again in mid-2008, and then in 2013. And, obviously, there was a huge shift at the start of the Covid-19 crisis in March/April 2020.
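The eyeball test can be roughly mimicked in code. The sketch below is not how EViews detects indicators. It is a hypothetical helper, `flag_large_residuals`, that flags residuals more than a chosen number of standard deviations from their mean. The default of 2.5 is an arbitrary assumption.

```python
import statistics
from typing import List, Sequence


def flag_large_residuals(resid: Sequence[float], threshold: float = 2.5) -> List[int]:
    """Return indices of residuals more than `threshold` standard deviations from their mean."""
    mean = statistics.fmean(resid)
    sd = statistics.stdev(resid)  # sample standard deviation; needs at least 2 values
    return [i for i, e in enumerate(resid) if abs(e - mean) > threshold * sd]


if __name__ == "__main__":
    resid = [0.1, -0.1] * 10 + [5.0]  # made-up residuals with one spike at the end
    print(flag_large_residuals(resid))
```

Two weaknesses of this rule help motivate indicator saturation. First, a very large outlier inflates the standard deviation and can hide smaller ones. Second, a level shift produces a run of moderate residuals rather than a single spike, so it may not be flagged at all.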

Now we’ll estimate a new equation in which we instruct EViews to detect both impulse (outlier) and step-shift (change in intercept) indicators, using the following steps:
1. Select Quick/Estimate Equation to bring up the equation estimation dialog.
2. Enter our dependent variable DLOG(CONS) followed by a constant and our regressor DLOG(INCOME).
3. Switch to the Options tab and select Auto-detection under Outliers/indicator saturation.
4. Press the Options button and select both Impulse and Step-shift indicators.
5. Change the Terminal condition p-value to 0.01 (which will allow for more indicators entering the equation).
6. Click OK twice.

Figure 4a: Impulse Estimation
Figure 4b: Impulse Estimation Output

You can see that five indicators have been added to the equation: three single-observation (impulse) indicators (2018M12, 2020M03, 2020M04) and two step-shift indicators (2008M05, 2013M01).
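Written out with the detected dates, the saturated equation takes the form below. Here $\mathbb{1}\{\cdot\}$ is an indicator function. The impulse terms correspond to @isperiod dummies and the step terms to @after dummies, taken here to switch on at the break date itself:

$$ \Delta \log C_t = \beta_0 + \beta_1 \, \Delta \log Y_t + \sum_{\tau \in \mathcal{I}} \gamma_\tau \, \mathbb{1}\{t = \tau\} + \sum_{s \in \mathcal{S}} \delta_s \, \mathbb{1}\{t \ge s\} + \varepsilon_t $$

$$ \mathcal{I} = \{2018\text{M}12,\ 2020\text{M}03,\ 2020\text{M}04\}, \qquad \mathcal{S} = \{2008\text{M}05,\ 2013\text{M}01\} $$

The estimated $\gamma_\tau$ and $\delta_s$ appear in the estimation output; they are left symbolic here.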

The impact of these indicators on the log-differenced income coefficient is dramatic, as is their impact on the resulting R-squared.

Viewing the residual graph shows that the large outliers have been removed. It also shows that the locations of the detected indicators, marked by vertical lines, correspond to the outliers we eyeballed in the original equation.

Figure 5: Impulse Residuals