EViews 12 introduces a new technique to detect and model outliers and structural changes through indicator saturation. Following the recent release of EViews 12, we thought we'd give another demonstration.
Indicator Saturation
Identifying changes in data is essential if we are to properly estimate models based upon these data. One way to detect changes is to include dummy or indicator variables in your regression for the observations where a change might occur, and then decide whether each included indicator is a valid regressor. Such variables could include:
- Impulse Indicators (IIS): a dummy variable equal to zero everywhere other than a single value of one at period $ t $. This indicator can be used to model single observation outliers, and is equivalent to the @isperiod EViews function used at the date corresponding to $ t $.
- Step Indicators (SIS): a step function variable equal to zero until $ t $ and one thereafter. This indicator can be used to model a shift in the intercept of an equation, and is equivalent to the @after EViews function used at the date corresponding to $ t $.
- Trend Indicators (TIS): a trend-break variable that is equal to zero until period $ t $ and then follows a trend afterward. This indicator can be used to model a change in the trend of an equation (or the introduction of a trend term if one didn’t previously exist), and is equivalent to the @trendbr EViews function used at the date corresponding to $ t $.
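As a rough illustration of the three indicator types, here is a minimal Python sketch (not EViews code). The function names are made up for this example, and the exact boundary conventions (whether the step starts at $ t $ itself, and where the trend begins counting) are assumptions that may differ from what @after and @trendbr do.

```python
def impulse(T, t):
    """IIS: one at observation t, zero elsewhere."""
    return [1 if s == t else 0 for s in range(T)]

def step(T, t):
    """SIS: zero before t, one from t onward (inclusive start is an assumption)."""
    return [1 if s >= t else 0 for s in range(T)]

def trend_break(T, t):
    """TIS: zero up to t, then 1, 2, 3, ... (start convention is an assumption)."""
    return [max(0, s - t) for s in range(T)]

T = 6  # a tiny sample of six observations, indexed 0..5
print(impulse(T, 2))      # [0, 0, 1, 0, 0, 0]
print(step(T, 2))         # [0, 0, 1, 1, 1, 1]
print(trend_break(T, 2))  # [0, 0, 0, 1, 2, 3]

# Saturating with one impulse and one step per observation already gives
# twice as many candidate indicators as there are observations.
candidates = [impulse(T, t) for t in range(T)] + [step(T, t) for t in range(T)]
print(len(candidates), "candidate indicators for", T, "observations")
```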
The problem with the approach of including these variables in a traditional regression setting is that unless you know the specific dates where changes occur, you can quickly run into a situation where you have more variables than observations (since you’ll be adding at least one indicator variable for each observation in your estimation sample!).
Fortunately, recent advancements in variable selection techniques have meant that we can now perform variable selection on models with many more variables than observations, and so can saturate our regression with complex combinations of indicator variables and let the variable selection technique choose which are the most appropriate indicators to use.
AutoSearch/GETS
One of the new technologies introduced in EViews 12 is the AutoSearch/GETS algorithm for variable selection. AutoSearch/GETS is a method of variable selection that follows the steps suggested by the AutoSEARCH algorithm of Escribano and Sucarrat (2011), which in turn builds upon the work in Hoover and Perez (1999), and is similar to the technology behind the Autometrics™ module in PcGive™.
Mechanically the algorithm is similar to a backwards uni-directional stepwise method:
- The model with all search variables (termed the general unrestricted model, GUM) is estimated, and checked with a set of diagnostic tests.
- A number of search paths are defined, one for each insignificant search variable in the GUM.
- For each path, the insignificant variable that defines the path is removed and then a series of further variable removal steps is taken, each time removing the most insignificant variable, and each time checking whether the current model passes the set of diagnostic tests. If the diagnostic tests fail after the removal of a variable, that variable is placed back into the model and prevented from being removed again along this path. Variable removal finishes once there are no more insignificant variables, or it is impossible to remove a variable without failing the diagnostic tests.
- Once all paths have been calculated, the final models produced by the paths are compared using an information criterion, and the best model is selected.
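The steps above can be sketched in plain Python as a simplified multi-path backward search. This is an illustration only, not the EViews implementation: the diagnostic tests are omitted entirely, significance is judged by an assumed |t| threshold of 2.58, terminal models are compared by BIC (an assumed choice of criterion), and the names `ols` and `gets` as well as the simulated data are hypothetical.

```python
import math
import random

def ols(y, cols):
    """OLS by normal equations; returns (coefficients, t-statistics, RSS)."""
    n, k = len(y), len(cols)
    # Augmented matrix [X'X | I], inverted by Gauss-Jordan elimination.
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
         + [float(i == j) for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    inv = [row[k:] for row in A]
    xty = [sum(a * b for a, b in zip(col, y)) for col in cols]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((y[t] - sum(b * col[t] for b, col in zip(beta, cols))) ** 2
              for t in range(n))
    s2 = rss / (n - k)
    tstats = [beta[i] / math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return beta, tstats, rss

def gets(y, fixed, candidates, t_crit=2.58):
    """fixed: always-kept regressors; candidates: searched regressors (name -> column)."""
    def fit(names):
        cols = list(fixed.values()) + [candidates[v] for v in names]
        _, t, rss = ols(y, cols)
        n, k = len(y), len(cols)
        bic = n * math.log(rss / n) + k * math.log(n)
        return dict(zip(names, t[len(fixed):])), bic

    gum_t, _ = fit(list(candidates))              # step 1: estimate the GUM
    starts = [v for v in candidates if abs(gum_t[v]) < t_crit]
    terminals = set()
    if not starts:                                # nothing to remove
        terminals.add(frozenset(candidates))
    for first in starts:                          # step 2: one path per start
        keep = [v for v in candidates if v != first]
        while keep:                               # step 3: drop least significant
            t, _ = fit(keep)
            worst = min(keep, key=lambda v: abs(t[v]))
            if abs(t[worst]) >= t_crit:
                break
            keep.remove(worst)
        terminals.add(frozenset(keep))
    # Step 4: pick the terminal model with the lowest BIC.
    return min(terminals, key=lambda m: fit(sorted(m))[1])

# Simulated data with a single large outlier at observation 10.
random.seed(0)
T = 40
x = [random.gauss(0, 1) for _ in range(T)]
y = [1 + 0.5 * x[t] + random.gauss(0, 0.1) + (5 if t == 10 else 0) for t in range(T)]
fixed = {"const": [1.0] * T, "x": x}
candidates = {f"I{t}": [float(s == t) for s in range(T)] for t in (5, 10, 15, 20, 25, 30)}
selected = gets(y, fixed, candidates)
print(sorted(selected))
```

In this toy setup the impulse at observation 10 is retained, since its t-statistic is far above the threshold along every path; whether any of the other impulses survive depends on the simulated noise.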
One of the advantages of AutoSearch/GETS is that the set of candidate variables can be split into blocks, with the search performed on each block one at a time, and the selected variables from each block then combined into a final set to be searched. This allows you to test more candidate variables than you have observations without creating singularities (as long as enough candidate variables are rejected), which makes it well suited to indicator saturation studies.
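The block idea can be sketched as follows. This is a hedged illustration rather than the EViews procedure: `block_search` and `toy_select` are hypothetical names, `select` stands in for a single run of a variable selection search, and the candidate names and the two "relevant" indicators are arbitrary placeholders.

```python
def block_search(candidates, select, block_size):
    """Search candidates block by block, then re-search the union of survivors.

    `select` is a hypothetical callable that takes a list of candidate names
    and returns the subset it retains (for example, one GETS-style search).
    """
    survivors = []
    for start in range(0, len(candidates), block_size):
        block = candidates[start:start + block_size]
        survivors.extend(select(block))
    # Final pass over the combined survivors. This is only feasible when
    # enough candidates were rejected in the block searches.
    return select(survivors)

# Toy selector: pretend only these two indicators are ever significant.
truly_relevant = {"IIS_17", "SIS_120"}
toy_select = lambda names: [v for v in names if v in truly_relevant]

# 400 candidate indicators, searched 50 at a time.
names = [f"IIS_{t}" for t in range(200)] + [f"SIS_{t}" for t in range(200)]
selected = block_search(names, toy_select, block_size=50)
print(len(names), selected)  # 400 ['IIS_17', 'SIS_120']
```

Each block here contains far fewer variables than the full candidate set, so no single regression ever needs more regressors than observations.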
An Application with Consumption and Income
To demonstrate this feature, we will estimate a simple personal consumption equation, regressing the log-difference of personal consumption on a constant and log-differenced disposable income. This estimation is purely for demonstration of the saturation features in EViews 12, and should not be taken as worthy macroeconomic research!

Both data series were downloaded directly from the Federal Reserve Bank of St. Louis database, FRED, and contain monthly observations between 2002 and April 2020.
We first estimate the equation without any indicators, with the following steps:
- Click Quick/Estimate Equation to bring up the equation estimation dialog.
- Enter our dependent variable DLOG(CONS) followed by a constant and our regressor DLOG(INCOME).
- Click OK.
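Written out, the specification entered in these steps corresponds to the following regression, where $ \beta_0 $, $ \beta_1 $ and $ \varepsilon_t $ are notation introduced here for the constant, the income coefficient and the error term:

$$ \Delta \log(\text{CONS}_t) = \beta_0 + \beta_1 \, \Delta \log(\text{INCOME}_t) + \varepsilon_t $$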
If we click on the Resids button we can view a graph of the equation residuals.
Now we’ll estimate a new equation where we will instruct EViews to detect both impulse (outlier) and step-shift (change in intercept) indicators, with the following steps:
- Click Quick/Estimate Equation to bring up the equation estimation dialog.
- Enter our dependent variable DLOG(CONS) followed by a constant and our regressor DLOG(INCOME).
- Switch to the Options Tab and select Auto-detection under Outliers/indicator saturation.
- Press the Options button and select both Impulse and Step-shift indicators.
- Change the Terminal condition p-value to 0.01 (which will allow for more indicators entering the equation).
- Click OK twice.
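One way to write the model that results from this search is shown below. The notation is introduced here for illustration: $ \mathcal{I} $ and $ \mathcal{S} $ denote the sets of dates at which impulse and step-shift indicators are retained, $ \delta_j $ and $ \gamma_j $ are their coefficients, and $ 1\{\cdot\} $ equals one when its condition holds and zero otherwise. Treating the step as switching on at its own date is an assumption about the convention.

$$ \Delta \log(\text{CONS}_t) = \beta_0 + \beta_1 \, \Delta \log(\text{INCOME}_t) + \sum_{j \in \mathcal{I}} \delta_j \, 1\{t = j\} + \sum_{j \in \mathcal{S}} \gamma_j \, 1\{t \ge j\} + \varepsilon_t $$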
The impact of these variables on the log-differenced income coefficient is dramatic, as is their impact on the resulting R-squared.
Viewing the residual graph shows that the large outliers have been removed, and that the locations of the detected indicators, shown by the vertical lines, correspond to the outliers we eyeballed in the original equation.