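\[ (y_{it} - \bar y_i + \bar {\bar y}) = \alpha + (X_{it} - \bar X_i + \bar {\bar X}) \beta + (\epsilon_{it} - \bar \epsilon_i + \bar v) + \bar {\bar \epsilon}\]
This is basically OLS on de-meaned \(y\) and \(X\), with the grand means \(\bar {\bar y}\) and \(\bar {\bar X}\) added back so that the intercept \(\alpha\) can still be estimated.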
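The transformation above can be sketched by hand and compared with the built-in within estimator. This is only an illustration: it assumes the nlswork dataset used in the example section, and the `_w` suffix and the helper variables `m_i` and `m_all` are hypothetical names introduced here.

```stata
* Sketch: fixed-effects point estimates via manual de-meaning
webuse nlswork, clear

* Keep complete cases so all means are computed on the estimation sample
keep if !missing(ln_wage, tenure, age)

foreach v of varlist ln_wage tenure age {
    * individual mean and grand mean of the current variable
    egen double m_i = mean(`v'), by(idcode)
    egen double m_all = mean(`v')
    * subtract the individual mean, add back the grand mean
    generate double `v'_w = `v' - m_i + m_all
    drop m_i m_all
}

* OLS on the transformed variables
regress ln_wage_w tenure_w age_w

* Built-in within estimator for comparison
xtreg ln_wage tenure age, fe
```

The coefficients and the intercept should agree between the two commands. The standard errors from `regress` will be somewhat too small, because it does not account for the degrees of freedom used up by the individual means.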
9.3 Random effect
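The random effect model looks at this from a different angle: it treats \(v_i + \epsilon_{it}\) as a composite error term with two components. Suppose their variances are estimated as \(\hat \sigma^2_e\) (idiosyncratic component) and \(\hat \sigma^2_u\) (individual component). Then we can apply the GLS transformation to each variable, written generically as \(z_{it}\):
\[ z^*_{it} = z_{it} - \hat \theta_i \bar z_i, \qquad \hat \theta_i = 1 - \sqrt{\frac{\hat \sigma^2_e}{T_i \hat \sigma^2_u + \hat \sigma^2_e}} \]
where \(T_i\) is the number of observations for individual \(i\).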
Given estimates of \(\hat \sigma^2_e\) and \(\hat \sigma^2_u\), we can run OLS on transformed variables (including \(y\) and all \(X\)’s). We can iterate the process.
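As a rough illustration of this step, the sketch below takes the variance estimates stored by `xtreg, re`, builds the quasi-demeaned variables by hand, and runs OLS on them. It assumes the nlswork dataset from the example section; the names `s2_u`, `s2_e`, `T_i`, `theta`, `cons_star` and the `_bar` / `_star` suffixes are hypothetical choices made for this sketch.

```stata
* Sketch: random-effects GLS as OLS on quasi-demeaned variables
webuse nlswork, clear
xtreg ln_wage tenure age, re

* Variance components reported by xtreg
scalar s2_u = e(sigma_u)^2
scalar s2_e = e(sigma_e)^2

* Work on the estimation sample only
keep if e(sample)

* theta_i depends on the number of observations per individual
bysort idcode: generate double T_i = _N
generate double theta = 1 - sqrt(scalar(s2_e) / (T_i*scalar(s2_u) + scalar(s2_e)))

foreach v of varlist ln_wage tenure age {
    egen double `v'_bar = mean(`v'), by(idcode)
    generate double `v'_star = `v' - theta*`v'_bar
}

* The constant is transformed as well: 1 - theta_i
generate double cons_star = 1 - theta

regress ln_wage_star tenure_star age_star cons_star, noconstant
```

The point estimates should reproduce those from `xtreg, re` up to numerical precision. Two limiting cases help with intuition: \(\hat \theta_i = 0\) gives pooled OLS, and \(\hat \theta_i \to 1\) approaches the within (FE) transformation.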
9.4 Correlated random effect
Correlated random effect (CRE) can be done by running a random effect model of \(y_{it}\) on \(X_{it}\), \(\bar X_i\), \(z_i\) and a constant.
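One way to write down the model behind this regression is the Mundlak device. The symbols \(\gamma\), \(\delta\) and \(a_i\) are notation introduced here for illustration: the individual effect is allowed to depend on the individual means of the regressors, \(v_i = \bar X_i \gamma + a_i\), where \(a_i\) is assumed uncorrelated with \(X_{it}\). Substituting into the panel set-up gives
\[ y_{it} = \alpha + X_{it} \beta + \bar X_i \gamma + z_i \delta + a_i + \epsilon_{it} \]
so that \(a_i + \epsilon_{it}\) plays the role of the RE error term. If \(\gamma = 0\), this reduces to the ordinary RE model.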
The shortcoming of the RE model is that it must assume \(v_i\) and \(X_{it}\) are uncorrelated in order to deliver a consistent estimator. If that assumption fails (and most economists doubt that it holds), the RE estimator is inconsistent. The FE estimator, by contrast, remains consistent because the within transformation removes \(v_i\) from the error term.
The disadvantage of the FE model is that it cannot include any time-invariant covariate, such as race or gender.
The benefit of CRE is that it can include time-invariant \(z_i\)’s while remaining consistent for the coefficients on \(X_{it}\). In fact, Mundlak and Wooldridge pointed out that the CRE estimates on \(X_{it}\) are the same as the FE estimates.
There is also a Mundlak test to choose between RE or CRE.
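As a sketch of what the test checks, write \(\gamma\) for the coefficients on \(\bar X_i\) in the CRE regression (notation introduced here). Under the RE assumption these coefficients are zero, so the test can be stated as a Wald test:
\[ H_0: \gamma = 0, \qquad W = \hat \gamma' \, [\widehat{\mathrm{Var}}(\hat \gamma)]^{-1} \, \hat \gamma \; \overset{a}{\sim} \; \chi^2_k \]
where \(k\) is the number of elements in \(\gamma\). A large \(W\) (small p-value) is evidence against RE and in favor of CRE.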
9.5 Example
Stata 19 implements CRE in the xtreg command through the “cre” option. The following example is from Stata’s website and uses the nlswork dataset.
webuse nlswork
xtreg ln_wage tenure age i.race, cre vce(cluster idcode)
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)
note: 2.race omitted from xt_means because of collinearity.
note: 3.race omitted from xt_means because of collinearity.

Correlated random-effects regression            Number of obs     =     28,101
Group variable: idcode                          Number of groups  =      4,699

R-squared:                                      Obs per group:
     Within  = 0.1296                                         min =          1
     Between = 0.2346                                         avg =        6.0
     Overall = 0.1890                                         max =         15

                                                Wald chi2(4)      =    1685.18
corr(xit_vars*b, xt_means*γ) = 0.5474           Prob > chi2       =     0.0000

                             (Std. err. adjusted for 4,699 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
xit_vars     |
      tenure |   .0211313   .0012113    17.44   0.000     .0187572    .0235055
         age |   .0121949   .0007414    16.45   0.000     .0107417     .013648
             |
        race |
      Black  |  -.1312068   .0117856   -11.13   0.000    -.1543061   -.1081075
      Other  |   .1059379   .0593177     1.79   0.074    -.0103225    .2221984
             |
       _cons |     1.2159   .0306965    39.61   0.000     1.155736    1.276064
-------------+----------------------------------------------------------------
xt_means     |
      tenure |   .0376991    .002281    16.53   0.000     .0332283    .0421698
         age |  -.0011984   .0013313    -0.90   0.368    -.0038077    .0014109
             |
        race |
      Black  |          0  (omitted)
      Other  |          0  (omitted)
-------------+----------------------------------------------------------------
     sigma_u |  .33334407
     sigma_e |  .29808194
         rho |  .55567161   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Mundlak test (xt_means = 0): chi2(2) = 331.5144           Prob > chi2 = 0.0000
To compare with a fixed effect model:
webuse nlswork
xtreg ln_wage tenure age i.race, fe vce(cluster idcode)
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)
note: 2.race omitted because of collinearity.
note: 3.race omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =     28,101
Group variable: idcode                          Number of groups  =      4,699

R-squared:                                      Obs per group:
     Within  = 0.1296                                         min =          1
     Between = 0.1916                                         avg =        6.0
     Overall = 0.1456                                         max =         15

                                                F(2, 4698)        =     766.79
corr(u_i, Xb) = 0.1302                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,699 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      tenure |   .0211313   .0012112    17.45   0.000     .0187568    .0235059
         age |   .0121949   .0007414    16.45   0.000     .0107414    .0136483
             |
        race |
      Black  |          0  (omitted)
      Other  |          0  (omitted)
             |
       _cons |   1.256467   .0194187    64.70   0.000     1.218397    1.294537
-------------+----------------------------------------------------------------
     sigma_u |  .39034493
     sigma_e |  .29808194
         rho |  .63165531   (fraction of variance due to u_i)
------------------------------------------------------------------------------
The coefficient estimates on “tenure” and “age” are the same in both models, but the CRE model also lets you estimate the coefficients on “race”, which the FE model has to drop.
We can also do this manually by running an RE model on \(X\), \(\bar X\) and \(z\):
webuse nlswork
egen age_mean = mean(age), by(idcode)
egen tenure_mean = mean(tenure), by(idcode)
xtreg ln_wage tenure tenure_mean age age_mean i.race, vce(cluster idcode)
Here xtreg fits a random effect model because that is its default when no estimator option is given. This is what Stata’s “cre” option is doing behind the scenes.
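One detail worth checking in a manual version is which rows the individual means are computed on, since tenure and age have some missing values in nlswork. The sketch below restricts the data to complete cases before building the means, on the assumption (not confirmed by the source) that the built-in option computes its means over the estimation sample. It then runs a joint Wald test on the mean terms as a hand-made analogue of the Mundlak test. The `_mean` variable names follow the manual example above.

```stata
* Sketch: manual CRE on the estimation sample, plus a joint test of the means
webuse nlswork, clear

* Drop rows that xtreg would exclude, so the means match the estimation sample
keep if !missing(ln_wage, tenure, age, race)

egen double tenure_mean = mean(tenure), by(idcode)
egen double age_mean = mean(age), by(idcode)

* Random-effects regression on X, the individual means of X, and z
xtreg ln_wage tenure tenure_mean age age_mean i.race, re vce(cluster idcode)

* Joint Wald test that the coefficients on the means are zero
test tenure_mean age_mean
```

If the assumption about the estimation sample holds, the coefficients and the chi2(2) statistic from `test` should be comparable to the “cre” output shown earlier. Any remaining differences would point to a detail of the built-in implementation that this sketch does not capture.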