The title of this blog piece is taken verbatim from the Bellego and Pape (2019) paper suggested by Professor David E. Giles in his October reading list. (Editor's note: Professor Giles has recently announced the end of his blog - it is a fantastic resource and will be missed!) The topic will be immediately familiar to practitioners who occasionally encounter the difficulty in applied work. In this regard, it is reassuring that the frustration is being addressed and that there is indeed an ongoing quest for a silver bullet.
Introduction
Consider the following data generating process where the dependent variable may contain zeros: $$ \log(y_i) = \alpha + x_i^\prime \beta + \epsilon_i \quad \text{with} \quad E(\epsilon_i)=0 $$ The most common remedy to the log-of-zero problem among practitioners is to add a common (observation-independent) positive constant to the dependent variable. In other words, to work with the model: $$ \log(y_i + \Delta) = \alpha + x_i^\prime \beta + \omega_i $$ where $ \Delta $ is the corrective constant. In the aforementioned paper, the authors use Monte Carlo simulations to demonstrate that the bias induced by this correction is not necessarily negligible, even for small values of $ \Delta $, and may in fact be substantial.
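The effect is easy to reproduce outside of EViews. Below is a minimal Monte Carlo sketch in Python illustrating how the OLS slope bias moves with $ \Delta $. The data generating process (a lognormal outcome with roughly 30% of observations zeroed out at random) is my own illustrative assumption, not the paper's exact simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1_000, 200
alpha, beta = 0.5, 1.0

def mean_bias(delta):
    """Average OLS bias of the slope in log(y + delta) = a + b*x + error."""
    biases = []
    for _ in range(reps):
        x = rng.normal(size=n)
        # Assumed DGP: lognormal outcome, zeroed out at random (about 30% zeros).
        y = np.exp(alpha + beta * x + rng.normal(size=n)) * (rng.random(n) > 0.3)
        X = np.column_stack([np.ones(n), x])
        b_hat = np.linalg.lstsq(X, np.log(y + delta), rcond=None)[0][1]
        biases.append(b_hat - beta)
    return float(np.mean(biases))

for delta in (0.01, 0.1, 1.0):
    print(f"delta = {delta:>4}: mean slope bias = {mean_bias(delta):+.3f}")
```

The point to notice is that the bias does not vanish as $ \Delta $ shrinks, and changing $ \Delta $ changes the estimate, which is exactly the arbitrariness the authors object to.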
Figure 1: Estimation bias as a function of $ \Delta $
The authors argue that a satisfactory correction should meet the following criteria:
- Does not generate computational bias through arbitrary normalization.
- Does not generate correlation between the error term and the regressors.
- Does not require the deletion of observations.
- Does not require the estimation of a supplementary parameter.
- Does not require the addition of a discretionary constant.
A Novel Approach
Bellego and Pape (2019) suggest that instead of adding a common positive constant $ \Delta $, one ought to add an optimal, observation-dependent positive value $ \Delta_{i} $. The novel strategy results in the following model, which is estimated via GMM: $$ \log(y_i + \Delta_{i}) = \alpha + x_i^\prime \beta + \eta_{i} $$ where $ \Delta_i = \exp(x_i^\prime \beta) $ and $ \eta_i = \log(1 + \exp(\alpha + \epsilon_i)) $. Since the details can be found in the original paper, here I'd like to replicate the simulation exercise with which the authors illustrate their method, and compare it against other approaches. (The tables below can be replicated in EViews by running the program file loglinear.prg.)
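For readers who want to experiment outside of EViews, here is a simplified stand-in for the estimation in Python: a nonlinear least-squares fit of $ \log(y_i + \exp(x_i^\prime \beta)) = c + x_i^\prime \beta $, where the intercept $ c $ absorbs the mean of $ \eta_i $ (the paper recovers $ \alpha $ in a separate step, cf. Figure 5 below). This is not the paper's exact GMM moment set, and the zero-inflated lognormal DGP is again my own assumption.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 5_000
alpha, beta = 0.5, 1.0
x = rng.normal(size=n)
# Assumed DGP: lognormal outcome with zeros injected at random.
y = np.exp(alpha + beta * x + rng.normal(size=n)) * (rng.random(n) > 0.3)

def ssr(theta):
    """Sum of squared residuals of log(y + exp(b*x)) = c + b*x + u,
    i.e. the observation-dependent shift Delta_i = exp(x_i' beta)."""
    c, b = theta
    u = np.log(y + np.exp(b * x)) - c - b * x
    return u @ u

est = minimize(ssr, x0=np.zeros(2), method="BFGS")
c_hat, beta_hat = est.x
print(f"c = {c_hat:.3f}, beta = {beta_hat:.3f}")  # beta should be near 1.0
```

Note that the transformation is well defined at $ y_i = 0 $, since $ \exp(x_i^\prime \beta) > 0 $: no observations are dropped and no discretionary constant is introduced.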
Figure 2: Output of OLS estimation (with $ \Delta = 1 $)
Figure 4: Output of Poisson Pseudo-Maximum Likelihood (PPML) estimation
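PPML sidesteps the log transformation entirely by modelling the conditional mean $ E(y_i \mid x_i) = \exp(x_i^\prime \beta) $, which is well defined at $ y_i = 0 $. Outside of EViews, the PPML fit can be approximated with a Poisson GLM and robust standard errors, here sketched in Python via statsmodels (the same assumed DGP as above):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000
alpha, beta = 0.5, 1.0
x = rng.normal(size=n)
# Same assumed zero-inflated lognormal DGP as in the earlier sketches.
y = np.exp(alpha + beta * x + rng.normal(size=n)) * (rng.random(n) > 0.3)

X = sm.add_constant(x)
# Poisson GLM with robust (sandwich) standard errors: PPML handles y = 0
# directly, since only the conditional mean is assumed, not log-normality.
ppml = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC1")
print(ppml.params)  # slope should be near beta = 1.0; intercept absorbs
                    # the zero probability and the mean of exp(epsilon)
```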
Figure 4: Output of proposed solution (GMM estimation)
Figure 5: OLS estimation of the $ \alpha $ parameter: $ \log(\exp(\eta_i)-1)=\alpha+\epsilon_i $
Figure 6: OLS estimation
Figure 7: PPML estimation
Figure 8: GMM estimation