7 Fixed or Random Effect, or Both?

Published

May 23, 2019

7.1 Panel data

When we have panel data (repeated observations over time, or observations clustered at a higher level), we usually think of two choices: random effects or fixed effects. Economists usually prefer fixed effect models, since they wipe out all time-invariant heterogeneity between units. Economists do not like random effect models because they rest on a strong assumption: the random effects must be uncorrelated with the other covariates in the model. To see this, suppose we have

\[ y_{it} = \beta_0 + \beta_1 x_{it} + c_i + \epsilon_{it} \]

Here individuals \(i=1, ... , n\) are measured at times \(t=1, ..., T\), and \(c_i\) is the unobserved time-invariant individual effect. The difference between fixed and random effects is in how they handle \(c_i\).

A fixed effect model for a linear model can be implemented in one of two ways: include a dummy for each individual, or run OLS on de-meaned \(y\) and \(x\) (each variable minus its individual mean). The two methods give the same estimate of \(\beta_1\). In non-linear models things are more difficult: except for the Poisson model, non-linear models with individual dummies suffer from the “incidental parameter” problem. The gold standard is a conditional likelihood (conditional logit, for example), which “absorbs” the fixed effects in the likelihood function so that they do not need to be estimated. Unfortunately, most non-linear models do not have such a nice conditional likelihood. In that case we can only hope the bias is small (it does get smaller with a deeper panel, that is, with more observations per individual).
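
A minimal sketch of the equivalence between the two linear implementations, in base R on simulated data. Everything here (variable names, sample sizes, the true coefficient of 2) is an illustrative assumption, not taken from the examples later in this chapter.

```r
# Simulated panel; all names and numbers are illustrative assumptions.
set.seed(1)
n   <- 50                               # individuals
n_t <- 5                                # time periods per individual
id  <- rep(seq_len(n), each = n_t)
c_i <- rnorm(n)[id]                     # unobserved time-invariant effect
x   <- 0.8 * c_i + rnorm(n * n_t)       # x is correlated with c_i by construction
y   <- 1 + 2 * x + c_i + rnorm(n * n_t)

# Method 1: OLS with one dummy per individual
b_dummy <- coef(lm(y ~ x + factor(id)))[["x"]]

# Method 2: OLS on de-meaned y and x (ave() returns the individual mean)
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)
b_demean <- coef(lm(y_dm ~ x_dm - 1))[["x_dm"]]

# Pooled OLS, which leaves c_i in the error term, for comparison
b_pooled <- coef(lm(y ~ x))[["x"]]

print(c(dummy = b_dummy, demean = b_demean, pooled = b_pooled))
```

The first two estimates should agree up to numerical precision, while pooled OLS will typically drift away from them because it ignores \(c_i\).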

Random effect models treat \(c_i\) as part of the error term. This leads to their biggest drawback: the covariates have to be uncorrelated with the error term for the estimator to be consistent. In the above equation, therefore, \(x\) has to be uncorrelated with \(c_i\), which economists in general do not consider realistic.

7.2 Time-invariant variables

Sometimes people are interested in the effect of a time-invariant variable \(z_i\), which gives the model

\[ y_{it} = \beta_0 + \beta_1 x_{it} + c_i + \gamma z_i+ \epsilon_{it} \]

Fixed effect models cannot handle this: \(\gamma\) is not identified because \(z_i\) is perfectly collinear with \(c_i\). A random effect model can still be estimated, treating \(z_i\) simply as another covariate.
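
A short derivation makes the identification problem concrete. The bar notation \(\bar y_i\), \(\bar x_i\), \(\bar\epsilon_i\) for individual means over \(t\) is introduced here for illustration. Averaging the equation over time for individual \(i\) gives

\[ \bar y_i = \beta_0 + \beta_1 \bar x_i + c_i + \gamma z_i + \bar\epsilon_i \]

and subtracting this from the original equation gives the de-meaned regression

\[ y_{it} - \bar y_i = \beta_1 (x_{it} - \bar x_i) + (\epsilon_{it} - \bar\epsilon_i) \]

Both \(c_i\) and \(\gamma z_i\) drop out, so the within transformation leaves nothing from which to estimate \(\gamma\).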

7.3 Between-within model

We are usually told to run a Hausman test to decide between a fixed effect and a random effect model. The basic idea is that the random effect estimator is more efficient if its assumptions are satisfied; if they are not, the fixed effect estimator is still consistent. The Hausman test compares the two sets of estimates. If the difference is small, stick with random effects. If it is big, fixed effects should be preferred since the estimator is consistent.

However, there is a between-within (BW) model that can incorporate both. Neuhaus and Kalbfleisch (1998)(https://www.ncbi.nlm.nih.gov/pubmed/9629647) introduced the BW estimator,
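
For reference, the textbook form of the test statistic can be sketched as follows. The hat notation and the symbol \(k\) are introduced here and do not appear elsewhere in this chapter.

\[ H = (\hat\beta_{FE} - \hat\beta_{RE})' \left[ \widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE}) \right]^{-1} (\hat\beta_{FE} - \hat\beta_{RE}) \]

Under the null hypothesis that the random effect assumptions hold, \(H\) is asymptotically \(\chi^2\) with \(k\) degrees of freedom, where \(k\) is the number of time-varying coefficients being compared. A large \(H\) points toward the fixed effect model.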

\[ y_{it} = \beta_0 + \beta_1 (x_{it} - \bar x_i) + \beta_2 \bar x_i + c_i + \gamma z_i+ \epsilon_{it} \]

It can be shown that \(\beta_1\) is the same as the one in the fixed effect model. It is the effect of the within-individual deviation of \(x\) on the within-individual deviation of \(y\). \(\beta_2\) is the effect of the mean of \(x\) on the mean of \(y\), that is, the “between” effect. \(\gamma\) is the effect of the time-invariant variable on the mean of \(y\).

The other specification of the BW estimator is

\[ y_{it} = \beta_0 + \beta_1 x_{it} + \beta_2 \bar x_i + c_i + \gamma z_i+ \epsilon_{it} \]

This is just a reparameterization of the original specification; it is the same model. \(\beta_1\) is exactly the same as before, while \(\beta_2\) becomes the difference between the “between” and “within” effects (between minus within). This is called the “contextual model”, and \(\beta_2\) is the “contextual” effect. See Neuhaus and Kalbfleisch (1998)(https://www.ncbi.nlm.nih.gov/pubmed/9629647). In this specification, \(\beta_2\) plays a role similar to a Hausman test: if the random effect assumption holds, the “between” and “within” effects coincide and \(\beta_2\) should be close to zero.
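
The link between the two specifications follows from one line of algebra. Write \(\beta_1^{W}\) and \(\beta_2^{B}\) for the within and between coefficients of the first specification (the superscripts are added here for clarity):

\[ \beta_1^{W} (x_{it} - \bar x_i) + \beta_2^{B} \bar x_i = \beta_1^{W} x_{it} + (\beta_2^{B} - \beta_1^{W}) \bar x_i \]

So the coefficient on \(x_{it}\) is unchanged, and the coefficient on \(\bar x_i\) in the second specification equals the between effect minus the within effect.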

One advantage of the BW model is that it recovers the fixed effect estimates inside a random effect estimation, so including time-invariant covariates becomes possible. A second advantage is that it extends to more complicated models, such as cross-level interactions, random slopes, or other multi-level models.

The simplest form of BW is easy to implement: fit a random effect model to either of the two equations above.
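
A hand-rolled sketch of this recipe on simulated data, assuming the recommended package “nlme” is available (it ships with standard R installations). The data, the variable names (x_dev, x_mean, z) and the true coefficients are all hypothetical; a random-intercept model from nlme stands in for the random effect estimator.

```r
library(nlme)  # assumed available; provides lme() and fixef()

# Simulated balanced panel; all names and numbers are illustrative assumptions.
set.seed(2)
n   <- 100
n_t <- 6
id  <- rep(seq_len(n), each = n_t)
z   <- rbinom(n, 1, 0.5)[id]            # time-invariant covariate
c_i <- rnorm(n)[id]                     # unobserved individual effect
x   <- 0.8 * c_i + rnorm(n * n_t)       # x correlated with c_i
y   <- 1 + 2 * x + 0.5 * z + c_i + rnorm(n * n_t)

x_mean <- ave(x, id)                    # individual mean of x
x_dev  <- x - x_mean                    # within-individual deviation
d <- data.frame(y, x, x_dev, x_mean, z, id = factor(id))

# Within-between specification: deviation and mean enter separately
bw  <- lme(y ~ x_dev + x_mean + z, random = ~ 1 | id, data = d)

# Contextual specification: raw x plus the individual mean
ctx <- lme(y ~ x + x_mean + z, random = ~ 1 | id, data = d)

# Fixed effect benchmark with individual dummies (z cannot be included here)
fe  <- lm(y ~ x + factor(id), data = d)

print(fixef(bw))
print(fixef(ctx))
print(coef(fe)["x"])
```

The coefficient on x_dev in the first fit and on x in the second should match the fixed effect coefficient, and the x_mean coefficient in the contextual fit should equal the between coefficient minus the within coefficient from the first fit.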

7.4 BW model in R

R has a package “panelr”(https://panelr.jacob-long.com/articles/wbm.html) that implements various kinds of BW models. Let’s see an example.

library(panelr)
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model1 <- wbm(lwage ~ wks + union + ms + occ | blk + fem, data = wages)
summary(model1)

MODEL INFO:
Entities: 595
Time periods: 1-7
Dependent variable: lwage
Model type: Linear mixed effects
Specification: within-between

MODEL FIT:
AIC = 2036.78, BIC = 2119.13
Pseudo-R² (fixed effects) = 0.27
Pseudo-R² (total) = 0.69
Entity ICC = 0.57

WITHIN EFFECTS:
------------------------------------------
               Est.   S.E.   t val.      p
----------- ------- ------ -------- ------
wks            0.00   0.00     1.06   0.29
union          0.06   0.03     2.53   0.01
ms            -0.08   0.03    -2.57   0.01
occ           -0.08   0.02    -3.32   0.00
------------------------------------------

BETWEEN EFFECTS:
-------------------------------------------------
                      Est.   S.E.   t val.      p
------------------ ------- ------ -------- ------
(Intercept)           6.30   0.20    30.85   0.00
imean(wks)            0.01   0.00     2.25   0.02
imean(union)          0.15   0.03     4.67   0.00
imean(ms)             0.17   0.05     3.07   0.00
imean(occ)           -0.41   0.03   -13.31   0.00
blk                  -0.15   0.05    -2.81   0.00
fem                  -0.32   0.06    -4.96   0.00
-------------------------------------------------

p values calculated using df = 4153 
 
RANDOM EFFECTS:
------------------------------------
  Group      Parameter    Std. Dev. 
---------- ------------- -----------
    id      (Intercept)    0.2992   
 Residual                  0.2589   
------------------------------------

Let’s compare this with another popular package “lfe”.

library(lfe)
model2 <- felm(lwage ~ wks + union + ms + occ | id, data = wages)
summary(model2)


Call:
   felm(formula = lwage ~ wks + union + ms + occ | id, data = wages) 

Residuals:
     Min       1Q   Median       3Q      Max 
-1.89500 -0.16174  0.00652  0.17060  1.94521 

Coefficients:
       Estimate Std. Error t value Pr(>|t|)    
wks    0.001083   0.001019   1.063 0.287816    
union  0.064320   0.025378   2.534 0.011305 *  
ms    -0.082905   0.032226  -2.573 0.010132 *  
occ   -0.077507   0.023359  -3.318 0.000916 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2589 on 3566 degrees of freedom
Multiple R-squared(full model): 0.7304   Adjusted R-squared: 0.6852 
Multiple R-squared(proj model): 0.006509   Adjusted R-squared: -0.1601 
F-statistic(full model):16.16 on 598 and 3566 DF, p-value: < 2.2e-16 
F-statistic(proj model): 5.841 on 4 and 3566 DF, p-value: 0.0001106

We can see that the two give the same fixed effect estimates. “panelr” additionally estimates the effects of “blk” and “fem”, which are time-invariant. But “lfe” has an advantage: it allows you to estimate fixed effects with clustered standard errors, which I wish “panelr” could do too.

model3 <- felm(lwage ~ wks + union + ms + occ | id | 0 | id, data = wages)
summary(model3)


Call:
   felm(formula = lwage ~ wks + union + ms + occ | id | 0 | id,      data = wages) 

Residuals:
     Min       1Q   Median       3Q      Max 
-1.89500 -0.16174  0.00652  0.17060  1.94521 

Coefficients:
       Estimate Cluster s.e. t value Pr(>|t|)  
wks    0.001083     0.001331   0.814   0.4160  
union  0.064320     0.040936   1.571   0.1167  
ms    -0.082905     0.047399  -1.749   0.0808 .
occ   -0.077507     0.031320  -2.475   0.0136 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2589 on 3566 degrees of freedom
Multiple R-squared(full model): 0.7304   Adjusted R-squared: 0.6852 
Multiple R-squared(proj model): 0.006509   Adjusted R-squared: -0.1601 
F-statistic(full model, *iid*):16.16 on 598 and 3566 DF, p-value: < 2.2e-16 
F-statistic(proj model): 3.456 on 4 and 594 DF, p-value: 0.008358

7.5 BW model in Stata

In Stata, the BW estimator can be fit by hand with “xtreg”. We start with a plain fixed effect model as the benchmark.

webuse nlswork
xtset idcode
xtreg ln_w age, fe cluster(idcode)

(National Longitudinal Survey of Young Women, 14-24 years old in 1968)


Panel variable: idcode (unbalanced)


Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1026                                         min =          1
     Between = 0.0877                                         avg =        6.1
     Overall = 0.0774                                         max =         15

                                                F(1, 4709)        =     884.05
corr(u_i, Xb) = 0.0314                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0006099    29.73   0.000     .0169392    .0193306
       _cons |   1.148214   .0177153    64.81   0.000     1.113483    1.182944
-------------+----------------------------------------------------------------
     sigma_u |  .40635023
     sigma_e |  .30349389
         rho |  .64192015   (fraction of variance due to u_i)
------------------------------------------------------------------------------

We then generate the mean of age and run a BW estimation.

webuse nlswork
xtset idcode
* center is a user-written command (ssc install center)
bysort idcode: center age, prefix(d) mean(m)
xtreg ln_w dage mage i.race, re cluster(idcode)

(National Longitudinal Survey of Young Women, 14-24 years old in 1968)


Panel variable: idcode (unbalanced)

(generated variables: dage mage)


Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1026                                         min =          1
     Between = 0.1040                                         avg =        6.1
     Overall = 0.0950                                         max =         15

                                                Wald chi2(4)      =    1335.89
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        dage |   .0181349     .00061    29.73   0.000     .0169394    .0193304
        mage |    .022558   .0011405    19.78   0.000     .0203226    .0247933
             |
        race |
      Black  |  -.1190246   .0127418    -9.34   0.000    -.1439982   -.0940511
      Other  |   .0974996   .0617364     1.58   0.114    -.0235016    .2185008
             |
       _cons |   1.037566   .0323185    32.10   0.000     .9742233     1.10091
-------------+----------------------------------------------------------------
     sigma_u |  .36581005
     sigma_e |  .30349389
         rho |  .59230575   (fraction of variance due to u_i)
------------------------------------------------------------------------------

In this BW model, we use the centered (within-group de-meaned) dage together with the group mean mage, which gives the within-between decomposition: the coefficient on dage, .0181, is exactly the fixed effect (“within”) coefficient on age from the previous model, and the coefficient on mage, .0226, is directly the between effect. We also get an estimate for the time-invariant covariate race. The advantage of using xtreg is that clustered standard errors are available.

Note: if we instead ran xtreg ln_w age mage i.race, re cluster(idcode), using the raw (uncentered) age together with mage, we would get the contextual model: the coefficient on age stays .0181, but the coefficient on mage becomes .0044, the “contextual effect” (the additional between effect on top of the within effect). The two models are algebraically related: the contextual-model coefficient on mage equals the within-between coefficient on mage minus the coefficient on dage (here, .022558 − .018135 ≈ .0044).

7.6 BW model in non-linear models

Paul Allison in his blog(https://statisticalhorizons.com/between-within-contextual-effects) mentioned using the BW model for a binary outcome. I have not dug into the literature to see how large the bias can be when using BW compared with, say, a conditional logit model. But if OLS is a good linear approximation of a logit model, the BW model could be a good approximation for a binary outcome with panel data.