Jeff Wooldridge suggested using Correlated Random Effect (CRE) for binary outcomes in this twitter post: https://x.com/jmwooldridge/status/1986100627454206220

Here I summerize what I learned so far:

linear model

For panel data, the usual set up is:

\[y_{it} = \alpha + X_{it} \beta + v_i + \epsilon_{it} \]

Wooldridge (2021) paper on DiD shows that these models are the same: 1. Two way fixed effect OLS. Dummy variable approach is the same too. 2. Mundlak’s CRE approach. This includes a pooled OLS with Mundlak device (means of \(X\)’s by group), and CRE with random effects.

The idea of Chamberlain’s device is that since we don’t have \(v_i\), we can project \(X_i\) onto \(v_i\). Basically, we can replace \(v_i\) with projection of \(X_i\) onto \(v_i\). Mundlak’s device is a special case, that we put all \(X_t\)’s the same weight, thus the mean of \(X_{it}\) for each \(i\) is the projection of \(X_{it}\) onto \(v_i\). Replace \(v_i\) with that projection, and what is left then by definition is uncorrelated with \(X_{it}\).

Here is an example:

example 1

library(fixest)
library(bacondecomp)
data("castle")

# Fixed-effects model with individual and time fixed effects
fe_model_twoway <- feols(l_homicide ~ poverty + l_police | state + year, data = castle)
summary(fe_model_twoway)

OLS estimation, Dep. Var.: l_homicide
Observations: 550
Fixed-effects: state: 50,  year: 11
Standard-errors: IID 
          Estimate Std. Error   t value Pr(>|t|)    
poverty  -0.027068   0.013306 -2.034206 0.042471 *  
l_police  0.066033   0.104588  0.631364 0.528098    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.176686     Adj. R2: 0.898978
                 Within R2: 0.009286

# random effect
library(lme4)
re_model <- lmer(l_homicide ~ poverty + l_police +  factor(year) +  (1 | state), data = castle)
summary(re_model)

Linear mixed model fit by REML ['lmerMod']
Formula: l_homicide ~ poverty + l_police + factor(year) + (1 | state)
   Data: castle

REML criterion at convergence: 3.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.5357 -0.4657  0.0151  0.4601  3.6973 

Random effects:
 Groups   Name        Variance Std.Dev.
 state    (Intercept) 0.30041  0.5481  
 Residual             0.03551  0.1884  
Number of obs: 550, groups:  state, 50

Fixed effects:
                   Estimate Std. Error t value
(Intercept)       0.4395941  0.6019389   0.730
poverty          -0.0007991  0.0120087  -0.067
l_police          0.1665913  0.1015581   1.640
factor(year)2001  0.0225820  0.0379597   0.595
factor(year)2002  0.0005681  0.0389059   0.015
factor(year)2003  0.0475588  0.0395876   1.201
factor(year)2004  0.0419220  0.0411546   1.019
factor(year)2005  0.0612232  0.0435803   1.405
factor(year)2006  0.0806576  0.0418469   1.927
factor(year)2007  0.0790850  0.0409066   1.933
factor(year)2008  0.0401268  0.0414964   0.967
factor(year)2009 -0.0424481  0.0488959  -0.868
factor(year)2010 -0.0959167  0.0470940  -2.037

# Mundlak 
castle2 <- castle |> 
  group_by(state) |>
  mutate(poverty_mean = mean(poverty, na.rm = TRUE), l_police_mean= mean(l_police, na.rm = TRUE))
cre_model <- feols(l_homicide ~ poverty + poverty_mean + l_police + l_police_mean + factor(year), data = castle2)
summary(cre_model)

OLS estimation, Dep. Var.: l_homicide
Observations: 550
Standard-errors: IID 
                  Estimate Std. Error    t value   Pr(>|t|)    
(Intercept)      -7.800873   0.531186 -14.685774  < 2.2e-16 ***
poverty          -0.027068   0.030004  -0.902125 3.6740e-01    
poverty_mean      0.125283   0.030674   4.084272 5.0983e-05 ***
l_police          0.066033   0.235836   0.279996 7.7959e-01    
l_police_mean     1.318037   0.253323   5.202983 2.7987e-07 ***
factor(year)2001  0.033071   0.085349   0.387481 6.9855e-01    
factor(year)2002  0.022870   0.087972   0.259968 7.9499e-01    
factor(year)2003  0.074569   0.089852   0.829907 4.0696e-01    
factor(year)2004  0.079080   0.094152   0.839920 4.0133e-01    
factor(year)2005  0.109800   0.100737   1.089973 2.7622e-01    
factor(year)2006  0.118822   0.095991   1.237857 2.1631e-01    
factor(year)2007  0.115412   0.093475   1.234679 2.1749e-01    
factor(year)2008  0.077190   0.095054   0.812069 4.1711e-01    
factor(year)2009  0.027528   0.114987   0.239400 8.1089e-01    
factor(year)2010 -0.034286   0.110153  -0.311255 7.5573e-01    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.417154   Adj. R2: 0.486344

# Mundlak 
cre_model2 <- lmer(l_homicide ~ poverty + poverty_mean + l_police + l_police_mean + factor(year) +  (1 | state), data = castle2)
summary(cre_model2)

Linear mixed model fit by REML ['lmerMod']
Formula: l_homicide ~ poverty + poverty_mean + l_police + l_police_mean +  
    factor(year) + (1 | state)
   Data: castle2

REML criterion at convergence: -30.1

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.5991 -0.4336  0.0219  0.4307  3.6228 

Random effects:
 Groups   Name        Variance Std.Dev.
 state    (Intercept) 0.14872  0.3856  
 Residual             0.03518  0.1876  
Number of obs: 550, groups:  state, 50

Fixed effects:
                 Estimate Std. Error t value
(Intercept)      -7.80087    1.60978  -4.846
poverty          -0.02707    0.01331  -2.034
poverty_mean      0.12528    0.02360   5.309
l_police          0.06603    0.10459   0.631
l_police_mean     1.31804    0.30140   4.373
factor(year)2001  0.03307    0.03785   0.874
factor(year)2002  0.02287    0.03901   0.586
factor(year)2003  0.07457    0.03985   1.871
factor(year)2004  0.07908    0.04175   1.894
factor(year)2005  0.10980    0.04467   2.458
factor(year)2006  0.11882    0.04257   2.791
factor(year)2007  0.11541    0.04145   2.784
factor(year)2008  0.07719    0.04215   1.831
factor(year)2009  0.02753    0.05099   0.540
factor(year)2010 -0.03429    0.04885  -0.702

In this example, fixed effect model, CRE1 and CRE2 all give the same coefficient on poverty.

As we know, RE model assumes \(v_i\) is uncorrelated with \(X_{it}\), while CRE and FE allow for correlation between \(v_i\) and \(X_{it}\).

nonlinear model

What I did not realize before is that this is only valid for linear case. For OLS to be consistent, we only need contemporaneous exogeneity, which means \(E[\epsilon_{it} | X_{it}, v_i] = 0\). This is not the case for nonlinear cases, such as logit or probit models.

Binary outcome

For binary outcomes, the model is:

\[y_{it} = \Phi(\alpha + X_{it} \beta + v_i + \epsilon_{it}) \] I used to think it’s the best to use conditional logit. However, it needs “conditional indepencence” (serial independence could be a better name) to be consistent. That is, we have to assume the series of \(y_{it}\) is conditionally independent of each other given \(X_{it}\) and \(v_i\). This is a strong assumption. If we think we have serial correlation in the error term, then this would not hold. The other disadvantage of conditional logit (also called fixed effect logit) is that \(v_i\)’s are not estimated, it is just a nuisance parameter. Any partial effects cannot be calculated, because any partial effects are functions of \(v_i\)’s.

Instead, Wooldridge suggests using CRE probit. Basically adding Mundlak device (group means of \(X\)’s) to the model, and then use pooled probit. Note that we do not use RE probit or logit, since that would need conditional independence too.

For nonlinear panel data with unobserved heterogeneity (meaning there is \(v_i\) that we don’t observe), we have a few assumptions.

First, strict exogeneity, \[ D(y`_{it} | X_{i1}, ..., X_{iT}, v_i) = D(y_{it} | X_{it}, v_i) \] This means that the distribution of \(y_{it}\) only depends on \(X_{it}\), given \(v_i\).

Second, conditional independence, \[ D(y_{i1}, y_{i2}, ..., y_{iT} | X_i, v_i) = \prod_{i=1}^T D(y_{it} | X_i, v_i) \]

This means that the distribution of joint distriubtion of \(y_{it}\)’s can be modeled by each \(y_{it}\) independently, given \(X_i\) and \(v_i\). We need this in most nonlinear cases, but not in linear case.

Then we need to specify \(D(v_i) | X_i\). The random effect assumption is saying \[ D(v_i | X_i) = D(v_i) \], that is, \(v_i\) is independent of \(X_i\). This is a strong assumption. We should try to avoid this assumption.

Now, most nonlinear models make the second assumption, in the panel setting, when we have \(v_i\). That includes RE model, conditional logit. However, that does not include pooled probit/logit, or CRE with pooled logit/probit. That is why Wooldridge recommend using CRE with pooled probit.

Example 2

# generate a binary outcome, high_homocide

castle2 <- castle2 |> 
  mutate(high_homicide = ifelse(l_homicide > mean(l_homicide, na.rm = TRUE), 1, 0))

# CRE pooled probit
cre_probit <- glm(high_homicide ~ poverty + poverty_mean + l_police + l_police_mean + factor(year), data = castle2, family = binomial(link = "probit"))
# Compute Average Marginal Effects (AMEs)
library(marginaleffects)
ame_cre_probit <- avg_slopes(cre_probit)
ame_cre_probit


          Term    Contrast Estimate Std. Error       z Pr(>|z|)   S     2.5 %
 l_police      dY/dX        0.01924     0.2622  0.0734   0.9415 0.1 -0.494686
 l_police_mean dY/dX       -0.00304     0.2824 -0.0108   0.9914 0.0 -0.556572
 poverty       dY/dX       -0.06477     0.0334 -1.9417   0.0522 4.3 -0.130144
 poverty_mean  dY/dX        0.06620     0.0341  1.9424   0.0521 4.3 -0.000597
 year          2001 - 2000  0.06596     0.0990  0.6659   0.5054 1.0 -0.128159
 year          2002 - 2000  0.09470     0.1019  0.9293   0.3527 1.5 -0.105016
 year          2003 - 2000  0.22958     0.1020  2.2518   0.0243 5.4  0.029753
 year          2004 - 2000  0.15492     0.1082  1.4315   0.1523 2.7 -0.057193
 year          2005 - 2000  0.22388     0.1133  1.9766   0.0481 4.4  0.001880
 year          2006 - 2000  0.26184     0.1068  2.4526   0.0142 6.1  0.052595
 year          2007 - 2000  0.24899     0.1048  2.3756   0.0175 5.8  0.043564
 year          2008 - 2000 -0.03718     0.1099 -0.3383   0.7352 0.4 -0.252646
 year          2009 - 2000 -0.02471     0.1345 -0.1837   0.8542 0.2 -0.288280
 year          2010 - 2000 -0.21753     0.1186 -1.8339   0.0667 3.9 -0.450000
   97.5 %
 0.533162
 0.550499
 0.000608
 0.132987
 0.260070
 0.294406
 0.429403
 0.367039
 0.445887
 0.471080
 0.454421
 0.178276
 0.238865
 0.014950

Type: response

# linear model
cre_lm <- lm(high_homicide ~ poverty + poverty_mean + l_police + l_police_mean + factor(year), data = castle2)
ame_lm <- avg_slopes(cre_lm)
ame_lm


          Term    Contrast Estimate Std. Error       z Pr(>|z|)   S    2.5 %
 l_police      dY/dX        0.01234     0.2666  0.0463   0.9631 0.1 -0.51010
 l_police_mean dY/dX        0.00427     0.2865  0.0149   0.9881 0.0 -0.55724
 poverty       dY/dX       -0.06406     0.0339 -1.8891   0.0589 4.1 -0.13053
 poverty_mean  dY/dX        0.06578     0.0347  1.8975   0.0578 4.1 -0.00217
 year          2001 - 2000  0.06384     0.0965  0.6618   0.5081 1.0 -0.12522
 year          2002 - 2000  0.09082     0.0994  0.9134   0.3610 1.5 -0.10406
 year          2003 - 2000  0.22452     0.1016  2.2108   0.0270 5.2  0.02548
 year          2004 - 2000  0.14789     0.1064  1.3898   0.1646 2.6 -0.06068
 year          2005 - 2000  0.21659     0.1139  1.9023   0.0571 4.1 -0.00656
 year          2006 - 2000  0.25687     0.1085  2.3677   0.0179 5.8  0.04423
 year          2007 - 2000  0.24413     0.1056  2.3108   0.0208 5.6  0.03706
 year          2008 - 2000 -0.04737     0.1074 -0.4410   0.6592 0.6 -0.25794
 year          2009 - 2000 -0.03438     0.1300 -0.2645   0.7914 0.3 -0.28910
 year          2010 - 2000 -0.20933     0.1245 -1.6814   0.0927 3.4 -0.45335
 97.5 %
 0.5348
 0.5658
 0.0024
 0.1337
 0.2529
 0.2857
 0.4236
 0.3565
 0.4397
 0.4695
 0.4512
 0.1632
 0.2203
 0.0347

Type: response

# CRE RE probit
cre_re_probit <- glmer(high_homicide ~ poverty + poverty_mean + l_police + l_police_mean + factor(year) +  (1 | state), data = castle2, family = binomial(link = "probit"))
ame_cre_re_probit <- avg_slopes(cre_re_probit)
ame_cre_re_probit


          Term    Contrast Estimate Std. Error       z Pr(>|z|)   S    2.5 %
 l_police      dY/dX        0.01927     0.2499  0.0771   0.9385 0.1 -0.47055
 l_police_mean dY/dX       -0.00307     0.2720 -0.0113   0.9910 0.0 -0.53625
 poverty       dY/dX       -0.06477     0.0336 -1.9275   0.0539 4.2 -0.13063
 poverty_mean  dY/dX        0.06619     0.0343  1.9285   0.0538 4.2 -0.00108
 year          2001 - 2000  0.06595     0.0995  0.6631   0.5072 1.0 -0.12898
 year          2002 - 2000  0.09469     0.1021  0.9279   0.3535 1.5 -0.10532
 year          2003 - 2000  0.22958     0.1018  2.2554   0.0241 5.4  0.03007
 year          2004 - 2000  0.15492     0.1085  1.4278   0.1534 2.7 -0.05775
 year          2005 - 2000  0.22388     0.1136  1.9714   0.0487 4.4  0.00130
 year          2006 - 2000  0.26184     0.1068  2.4524   0.0142 6.1  0.05258
 year          2007 - 2000  0.24899     0.1047  2.3784   0.0174 5.8  0.04380
 year          2008 - 2000 -0.03719     0.1101 -0.3378   0.7355 0.4 -0.25298
 year          2009 - 2000 -0.02471     0.1346 -0.1836   0.8543 0.2 -0.28848
 year          2010 - 2000 -0.21752     0.1188 -1.8308   0.0671 3.9 -0.45039
  97.5 %
 0.50910
 0.53011
 0.00109
 0.13347
 0.26089
 0.29471
 0.42908
 0.36759
 0.44646
 0.47110
 0.45418
 0.17860
 0.23906
 0.01535

Type: response

Ultimately we are interested in the partial effect, therefore we can compare across models. In this case, we should prefer average marginal effect by CRE pooled probit.