11 Dynamic Panel Data

11.1 When Is It a Problem

Model setup: \[y_{i,t} = \gamma y_{i, t-1} + X_{i,t}' \beta + C_i + \epsilon_{i,t}\]

Pooled OLS is biased and inconsistent, because the lagged dependent variable is correlated with the composite error $C_i + \epsilon_{i,t}$:

\[Cov( y_{i,t-1}, (C_i + \epsilon_{i,t})) \approx \frac{\sigma_c^2}{(1-\gamma)}\]
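To see where this expression comes from, here is a short sketch under simplifying assumptions that are not part of the setup above: drop $X_{i,t}$, take $|\gamma| < 1$, suppose the process has run long enough to be stationary, and let $C_i$ be uncorrelated with every $\epsilon_{i,s}$. Substituting backwards gives

\[ y_{i,t-1} = \frac{C_i}{1-\gamma} + \sum_{j=0}^{\infty} \gamma^j \epsilon_{i,t-1-j} \]

and, since $\epsilon_{i,t}$ is uncorrelated with everything dated $t-1$ or earlier,

\[ Cov( y_{i,t-1}, (C_i + \epsilon_{i,t})) = \frac{Var(C_i)}{1-\gamma} + 0 = \frac{\sigma_c^2}{1-\gamma} \]

The covariance is nonzero whenever $\sigma_c^2 > 0$ and does not depend on the sample size, which is why more observations do not remove the problem.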

The random effects model is biased and inconsistent for the same reason: $y_{i,t-1}$ is necessarily correlated with $C_i$, which is part of the composite error term in the random effects model.

The fixed effects (within) transformation removes $C_i$:

\[ y_{i,t} - \bar y_i = (X_{i,t} - \bar X_i)' \beta + \gamma (y_{i, t-1} - \overline{y_{i,-1}}) + (\epsilon_{i,t} - \bar \epsilon_i)\]

where $\overline{y_{i,-1}}$ is the unit mean of the lagged dependent variable. This is not exactly $\bar y_i$ (the mean of current $y_{i,t}$): the two differ at the first and last usable periods, though they coincide asymptotically and under the usual balanced-sample convention.

Nickell (1981) shows that (see also Anderson and Hsiao 1981)

\[Cov( (y_{i,t-1}- \overline{y_{i,-1}}), (\epsilon_{i,t} - \bar \epsilon_i )) \approx -\frac{\sigma_\epsilon^2}{T (1-\gamma)^2} [\frac{(T-1)-T \gamma + \gamma^T}{T}]\]

which indicates that the demeaned lagged $y$ is correlated with the demeaned error term. The correlation can be large when $T$ is small, but it is of order $1/T$ and shrinks toward zero as $T$ grows. With a long panel, the fixed effects model is therefore not a problem.
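As a rough illustration, the expression above can be evaluated for a few panel lengths. The function name, the choice $\gamma = 0.5$ and the normalization $\sigma_\epsilon^2 = 1$ are assumptions made only for this sketch.

```python
def nickell_covariance(T, gamma, sigma_eps2=1.0):
    """Approximate covariance between demeaned lagged y and the demeaned error,
    using the expression attributed to Nickell (1981) in the text."""
    bracket = ((T - 1) - T * gamma + gamma ** T) / T
    return -sigma_eps2 / (T * (1 - gamma) ** 2) * bracket


# Illustrative values only: gamma = 0.5, unit error variance.
for T in (3, 5, 10, 30, 100):
    print(f"T = {T:3d}  covariance = {nickell_covariance(T, gamma=0.5):.4f}")
```

The covariance is negative for each of these values and its magnitude falls roughly like $1/T$, which matches the statement that the problem fades in long panels.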

11.2 How Big Is the Bias

When $T$ is small, as is typical in microeconomic panels, the fixed effects estimator with a lagged dependent variable is inconsistent as $N \to \infty$ with $T$ fixed. But how bad is it?

The probability limit of $\hat \gamma - \gamma$ is approximately $-\frac{(1+\gamma)}{T-1}$. When $T=10$ and $\gamma=0.5$, the bias is about $-1.5/9 \approx -0.167$, which is one third of the true value in magnitude.

Moreover, a simulation study (Judson and Owen 1999) shows that the bias can be as large as $20\%$ of the true coefficient even when $T=30$.
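A small Monte Carlo sketch can make the size of the bias concrete. It is not taken from the studies cited above. It assumes a model without $X_{i,t}$, standard normal $C_i$ and $\epsilon_{i,t}$, and an initial value drawn from the stationary distribution. All function names are hypothetical.

```python
import math
import random


def simulate_unit(n_periods, gamma, rng):
    """Path y_0, ..., y_T of y_t = gamma * y_{t-1} + C + eps_t (no X)."""
    c = rng.gauss(0.0, 1.0)
    # Assumption: y_0 is drawn from the stationary distribution given C.
    y = c / (1 - gamma) + rng.gauss(0.0, 1.0 / math.sqrt(1 - gamma ** 2))
    path = [y]
    for _ in range(n_periods):
        y = gamma * y + c + rng.gauss(0.0, 1.0)
        path.append(y)
    return path


def within_gamma(panel):
    """Within (fixed effects) estimate of gamma from a list of unit paths."""
    num = den = 0.0
    for path in panel:
        y_lag, y_cur = path[:-1], path[1:]
        lag_mean = sum(y_lag) / len(y_lag)
        cur_mean = sum(y_cur) / len(y_cur)
        for a, b in zip(y_lag, y_cur):
            num += (a - lag_mean) * (b - cur_mean)
            den += (a - lag_mean) ** 2
    return num / den


rng = random.Random(0)
gamma, T = 0.5, 10
panel = [simulate_unit(T, gamma, rng) for _ in range(3000)]
simulated_bias = within_gamma(panel) - gamma
approx_bias = -(1 + gamma) / (T - 1)
print(f"simulated bias: {simulated_bias:.3f}   approximation: {approx_bias:.3f}")
```

The two numbers need not agree exactly, since $-\frac{(1+\gamma)}{T-1}$ is itself an approximation, but under these assumptions both should be clearly negative and of similar size.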

11.3 Anderson and Hsiao Estimator

Anderson and Hsiao (1981) suggested first-differencing the model, which removes $C_i$:

\[ y_{i,t} - y_{i,t-1} = (X_{i,t} - X_{i,t-1})' \beta + \gamma (y_{i,t-1} - y_{i,t-2}) + (\epsilon_{i,t} - \epsilon_{i,t-1}) \]

or, more compactly,

\[ \Delta y_{i,t}= \Delta X_{i,t}' \beta + \gamma \Delta y_{i,t-1} + \Delta \epsilon_{i,t} \]

Differencing alone does not solve the endogeneity problem: $\Delta y_{i,t-1}$ contains $y_{i,t-1}$, which is correlated with the $\epsilon_{i,t-1}$ in $\Delta \epsilon_{i,t}$. Anderson and Hsiao's idea is to instrument $\Delta y_{i,t-1}$ with the past level $y_{i,t-2}$ or the past difference $y_{i,t-2}-y_{i,t-3}$. The estimator is consistent because neither instrument is correlated with $\Delta \epsilon_{i,t}$, provided $\epsilon_{i,t}$ is not serially correlated.
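The level-instrument version of this estimator can be sketched as a just-identified IV ratio on simulated data. The sketch assumes no $X_{i,t}$, standard normal $C_i$ and $\epsilon_{i,t}$, and a stationary start. The function names are hypothetical and the code is an illustration, not a reference implementation.

```python
import math
import random


def simulate_unit(n_periods, gamma, rng):
    """Path y_0, ..., y_T of y_t = gamma * y_{t-1} + C + eps_t (no X)."""
    c = rng.gauss(0.0, 1.0)
    # Assumption: y_0 is drawn from the stationary distribution given C.
    y = c / (1 - gamma) + rng.gauss(0.0, 1.0 / math.sqrt(1 - gamma ** 2))
    path = [y]
    for _ in range(n_periods):
        y = gamma * y + c + rng.gauss(0.0, 1.0)
        path.append(y)
    return path


def anderson_hsiao_gamma(panel):
    """IV estimate of gamma in the differenced equation,
    instrumenting dy_{t-1} with the level y_{t-2}."""
    num = den = 0.0
    for path in panel:
        for t in range(2, len(path)):
            dy = path[t] - path[t - 1]          # dependent variable
            dy_lag = path[t - 1] - path[t - 2]  # endogenous regressor
            z = path[t - 2]                     # instrument
            num += z * dy
            den += z * dy_lag
    return num / den


rng = random.Random(1)
gamma = 0.5
panel = [simulate_unit(6, gamma, rng) for _ in range(5000)]
gamma_hat = anderson_hsiao_gamma(panel)
print(f"Anderson-Hsiao estimate: {gamma_hat:.3f} (true value {gamma})")
```

With many units the estimate should land near the true $\gamma$ even though $T$ is short, in contrast to the within estimator, although it can be noisy in smaller samples.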

11.4 Arellano-Bond Estimator

Arellano and Bond (1991) extended this idea by using additional lags of the dependent variable as instruments. For example, both $y_{i,t-2}$ and $y_{i,t-3}$ can be used. In fact, the number of available instruments grows with $t$: in period 3 only $y_{i,1}$ is available; in period 4, $y_{i,1}$ and $y_{i,2}$; in period 5, $y_{i,1}$, $y_{i,2}$ and $y_{i,3}$; and so on. The instrument matrix therefore has one row for each time period being instrumented:

\[Z_i = \begin{bmatrix} y_{i,1} & \ 0 & \ 0 & \ 0 & \ 0 & \ 0 & \ \cdots & \ 0 & \ 0 & \ 0 \\ 0 & \ y_{i,1} & \ y_{i,2} & \ 0 & \ 0 & \ 0 & \ \cdots & \ 0 & \ 0 & \ 0 \\ 0 & \ 0 & \ 0 & \ y_{i,1} & \ y_{i,2} & \ y_{i,3} & \ \cdots & \ 0 & \ 0 & \ 0 \\ \vdots & \ \vdots & \ \vdots & \ \vdots & \ \vdots & \ \ddots & \ \cdots & \ \vdots & \ \vdots & \ \vdots \\ 0 & \ 0 & \ 0 & \ 0 & \ 0 & \ \cdots & \ 0 & \ y_{i,1} & \ \cdots & \ y_{i,T-2} \end{bmatrix} \]

This is the so-called difference GMM estimator.
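The block structure of $Z_i$ can be sketched for a single unit as follows. The function name is hypothetical, and the input is assumed to be the list $y_{i,1}, \dots, y_{i,T}$ with no missing periods.

```python
def ab_instrument_matrix(y):
    """Build Z_i for one unit from y = [y_1, ..., y_T].

    Row t (for t = 3, ..., T) holds y_1, ..., y_{t-2} in its own block
    of columns and zeros everywhere else.
    """
    T = len(y)
    n_cols = (T - 1) * (T - 2) // 2  # 1 + 2 + ... + (T - 2)
    rows = []
    offset = 0
    for t in range(3, T + 1):
        available = y[: t - 2]  # y_1, ..., y_{t-2}
        row = [0.0] * n_cols
        row[offset : offset + len(available)] = available
        rows.append(row)
        offset += len(available)
    return rows


# Illustrative unit with T = 5 and made-up values.
for row in ab_instrument_matrix([1.0, 2.0, 3.0, 4.0, 5.0]):
    print(row)
```

For $T = 5$ this gives three rows (periods 3, 4 and 5) and six columns, with the same staircase pattern as the matrix above.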

11.5 Blundell and Bond Estimator

The Arellano-Bond estimator performs poorly when the instruments are weak, which happens when $y$ follows a random walk or is close to one. In that case, past levels are poor predictors of future changes.
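The weakening of the instrument can be illustrated by simulating the correlation between the instrument $y_{i,t-2}$ and the instrumented variable $\Delta y_{i,t-1}$ for a moderate and a highly persistent $\gamma$. The sketch assumes no $X_{i,t}$, standard normal $C_i$ and $\epsilon_{i,t}$, and a stationary start. The function names and the two $\gamma$ values are chosen only for illustration.

```python
import math
import random


def correlation(a, b):
    """Sample correlation between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def first_stage_correlation(gamma, n_units=20000, seed=0):
    """Correlation between the instrument y_{t-2} and the regressor dy_{t-1}."""
    rng = random.Random(seed)
    z, dy_lag = [], []
    for _ in range(n_units):
        c = rng.gauss(0.0, 1.0)
        # Assumption: y_{t-2} is drawn from the stationary distribution given C.
        y_tm2 = c / (1 - gamma) + rng.gauss(0.0, 1.0 / math.sqrt(1 - gamma ** 2))
        y_tm1 = gamma * y_tm2 + c + rng.gauss(0.0, 1.0)
        z.append(y_tm2)
        dy_lag.append(y_tm1 - y_tm2)
    return correlation(z, dy_lag)


corr_moderate = first_stage_correlation(0.5)
corr_persistent = first_stage_correlation(0.95)
print(f"gamma = 0.50: corr(y_(t-2), dy_(t-1)) = {corr_moderate:.3f}")
print(f"gamma = 0.95: corr(y_(t-2), dy_(t-1)) = {corr_persistent:.3f}")
```

Under these assumptions the correlation is much closer to zero for the persistent series, which is the weak-instrument problem described above.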

Blundell and Bond add a second set of moment conditions in levels. The basic idea is that lagged differences $\Delta y_{i,t-1}$ are uncorrelated with the composite error $C_i + \epsilon_{i,t}$ under an additional initial-condition/stationarity restriction, so they can instrument the level equation.

This is implemented by so-called system GMM, which stacks two equations and estimates them jointly:

the differenced equation ($\Delta y_{i,t}$), instrumented by lagged levels $y_{i,t-2}, y_{i,t-3}, \dots$ (the Arellano–Bond moments); and
the level equation ($y_{i,t}$ in levels, keeping $C_i$), instrumented by lagged differences $\Delta y_{i,t-1}$.

The level-equation moments are valid only under the extra assumption that the initial deviations from the long-run mean are uncorrelated with $C_i$ (mean stationarity of the process). System GMM is more efficient than difference GMM when $y$ is persistent (near a random walk), and it allows time-invariant regressors. A practical warning: the instrument count grows with $T^2$, so instrument proliferation can overfit the endogenous regressors and weaken the Hansen $J$ test; lags should be collapsed or capped in practice.
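To give a sense of how quickly the instrument count grows, the sketch below counts only the lagged-$y$ instruments of the differenced equation. The level equation and any instruments for $X_{i,t}$ would add more. "Collapsing" is taken here to mean keeping one column per lag distance instead of one column per period-lag pair. The function name is hypothetical.

```python
def diff_gmm_instrument_count(T, collapse=False):
    """Number of lagged-y instruments for the differenced equation with T periods.

    Uncollapsed: period t (t = 3, ..., T) contributes t - 2 columns,
    so the total is 1 + 2 + ... + (T - 2).
    Collapsed (assumed convention): one column per lag distance 2, ..., T - 1.
    """
    if T < 3:
        return 0
    if collapse:
        return T - 2
    return (T - 1) * (T - 2) // 2


for T in (5, 10, 20, 30):
    full = diff_gmm_instrument_count(T)
    collapsed = diff_gmm_instrument_count(T, collapse=True)
    print(f"T = {T:2d}  full = {full:3d}  collapsed = {collapsed:2d}")
```

The uncollapsed count is quadratic in $T$ while the collapsed count is linear, which is why collapsing or capping the lag depth is a common remedy.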
