1 Interpreting interaction in a regression model

Published

December 7, 2017

1.1 Interaction with two binary variables

In a regression model with an interaction term, people tend to pay attention only to the coefficient of the interaction term.

Let’s start with the simplest situation: \(x_1\) and \(x_2\) are binary and coded 0/1.

\[ E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1x_2 \]

In this case, we have a saturated model: four coefficients for four cells, one cell for each combination of \(x_1\) and \(x_2\). \(\beta_0\) is the expected value in the baseline situation (both \(x_1\) and \(x_2\) equal to 0), and the other three coefficients are additive effects relative to that baseline.

Focusing on the interaction term makes sense when we study how a treatment effect differs between two groups, say female and male: the interaction coefficient then represents the difference between females and males in the treatment effect.

In this model:

\[ E(y) = \beta_0 + \beta_1 \, \text{female} + \beta_2 \, \text{treatment} + \beta_{12} \, \text{female} \times \text{treatment} \]

The two dummy-coded binary variables, female and treatment, form four combinations. The following 2x2 table gives the expected mean of each of the four cells (combinations).

	male	female
control	\(\beta_0\)	\(\beta_0 + \beta_1\)
treatment	\(\beta_0 + \beta_2\)	\(\beta_0 + \beta_1 + \beta_2 + \beta_{12}\)

We can read each coefficient off this table. Writing each cell as (female, treatment), for example,

\[\beta_0=E(Y|(0,0)),\]

that is, \(\beta_0\) is the expected mean of cell (0,0) (male and control), and

\[\beta_0 + \beta_1 =E(Y|(1,0)),\]

that is, \(\beta_0 + \beta_1\) is the expected mean of cell (1,0) (female and control). The other two cells follow the same pattern.

Now,

\[ \beta_{12} = (E(Y|(1,1))-E(Y|(0,1)))-(E(Y|(1,0))-E(Y|(0,0))) \]

that is, the coefficient on the interaction term is a difference-in-differences: the female-male gap under treatment minus the female-male gap under control, or equivalently the treatment effect among females minus the treatment effect among males. That is why, in many situations, people are interested only in the interaction coefficient: the difference-in-differences is exactly what they want to estimate. The standard difference-in-differences estimator in the causal inference literature follows the same logic. Instead of a female/male dummy, it uses a before/after dummy, and the interaction coefficient measures how much more the outcome changed from before to after in the treated group than in the control group. In those situations, it is fine to focus mainly on the interaction coefficient.
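To see concretely that the saturated model simply reproduces the four cell means, here is a minimal Python sketch (not the author's code) on a small hypothetical dataset. It fits \(E(y) = \beta_0 + \beta_1 \, \text{female} + \beta_2 \, \text{treatment} + \beta_{12} \, \text{female} \times \text{treatment}\) by solving the OLS normal equations with exact fractions, then checks each coefficient against the cell-mean table and the difference-in-differences formula. The data values and the helper names `ols` and `cell_mean` are made up for illustration.

```python
from fractions import Fraction

def ols(X, y):
    """Solve the OLS normal equations (X'X) b = X'y by Gauss-Jordan elimination.

    Uses exact Fraction arithmetic, so results can be compared with ==.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(Fraction(row[i]) * row[j] for row in X) for j in range(k)]
        + [sum(Fraction(row[i]) * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Move a row with a nonzero pivot into place
        pivot = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        # Scale the pivot row so the pivot equals 1
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        # Eliminate this column from every other row
        for r in range(k):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Hypothetical data: (female, treatment, y)
data = [
    (0, 0, 10), (0, 0, 12),              # male, control
    (1, 0, 14), (1, 0, 16),              # female, control
    (0, 1, 16), (0, 1, 18), (0, 1, 20),  # male, treatment
    (1, 1, 20), (1, 1, 22),              # female, treatment
]
X = [[1, f, t, f * t] for f, t, _ in data]  # constant, female, treatment, female*treatment
y = [v for _, _, v in data]
b0, b1, b2, b12 = ols(X, y)

def cell_mean(f, t):
    """Sample mean of y in the cell (female=f, treatment=t)."""
    vals = [v for ff, tt, v in data if (ff, tt) == (f, t)]
    return Fraction(sum(vals), len(vals))

# Each coefficient is a contrast of cell means, as in the 2x2 table
assert b0 == cell_mean(0, 0)
assert b0 + b1 == cell_mean(1, 0)
assert b0 + b2 == cell_mean(0, 1)
# The interaction coefficient is the difference-in-differences
did = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))
assert b12 == did
print(b0, b1, b2, b12)  # 11 4 7 -1
```

Because the model is saturated, the same equalities hold for any data with all four cells present; only the numbers change.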

In other situations, all three coefficients matter; it depends on the research question. For example, if we are studying differences between union and non-union members and between black and non-black workers, we may be interested not only in the interaction effect but in all four cells, perhaps in all possible pairwise comparisons among them. In that case, we should pay attention to all three coefficients. Stata’s “margins” command is very helpful for comparing the cell means.

Let’s look at an example in Stata:

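webuse union3                      // NLS Women 14-24 in 1968
reg ln_wage i.union##i.black, r    // OLS with robust SEs; ## adds both main effects and the interaction
margins union#black                // predicted mean of ln_wage in each of the four cells
margins union#black, pwcompare     // all pairwise differences between the four cell means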


. webuse union3
(NLS Women 14-24 in 1968)

. reg ln_wage i.union##i.black, r

Linear regression                               Number of obs     =      1,244
                                                F(3, 1240)        =      34.76
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0762
                                                Root MSE          =     .37699

------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     1.union |   .2045053   .0291682     7.01   0.000     .1472808    .2617298
     1.black |  -.1709034   .0308067    -5.55   0.000    -.2313425   -.1104644
             |
 union#black |
        1 1  |   .0386275   .0516609     0.75   0.455     -.062725      .13998
             |
       _cons |   1.657525   .0138278   119.87   0.000     1.630396    1.684653
------------------------------------------------------------------------------

. margins union#black

Adjusted predictions                                     Number of obs = 1,244
Model VCE: Robust

Expression: Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
 union#black |
        0 0  |   1.657525   .0138278   119.87   0.000     1.630396    1.684653
        0 1  |   1.486621    .027529    54.00   0.000     1.432613     1.54063
        1 0  |    1.86203   .0256822    72.50   0.000     1.811644    1.912415
        1 1  |   1.729754   .0325611    53.12   0.000     1.665873    1.793635
------------------------------------------------------------------------------

. margins union#black, pwcompare

Pairwise comparisons of adjusted predictions             Number of obs = 1,244
Model VCE: Robust

Expression: Linear prediction, predict()

-----------------------------------------------------------------
                |            Delta-method         Unadjusted
                |   Contrast   std. err.     [95% conf. interval]
----------------+------------------------------------------------
    union#black |
(0 1) vs (0 0)  |  -.1709034   .0308067     -.2313425   -.1104644
(1 0) vs (0 0)  |   .2045053   .0291682      .1472808    .2617298
(1 1) vs (0 0)  |   .0722294   .0353756      .0028268     .141632
(1 0) vs (0 1)  |   .3754087   .0376487      .3015466    .4492709
(1 1) vs (0 1)  |   .2431328   .0426388      .1594807     .326785
(1 1) vs (1 0)  |  -.1322759   .0414705     -.2136359   -.0509159
-----------------------------------------------------------------


What we get from “margins union#black” is the predicted mean of ln_wage (log wage) in each of the four cells. Then “margins union#black, pwcompare” gives all six pairwise comparisons of these cell means. Instead of focusing only on the interaction coefficient, we can look at whichever comparisons of the four union/black combinations interest us. In this example, all six pairwise differences have 95% confidence intervals (unadjusted for multiple comparisons) that exclude zero. That says nothing about the interaction term, however. Whether each pairwise difference between cell means is nonzero is a different question from whether the difference-in-differences (the interaction) is nonzero. Here the interaction coefficient is not significant (p = 0.455): the union premium among black workers ((1 1) vs (0 1), 0.243) is not statistically distinguishable from the union premium among non-black workers ((1 0) vs (0 0), 0.205), even though each of those premiums is itself clearly different from zero.
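As a check on how the regression table and the margins table relate, the following Python sketch takes the four cell means reported by “margins union#black” (copied from the output above) and recovers the regression coefficients from them, including the interaction as a difference of two pairwise contrasts. The variable names are ours; the tolerance allows for rounding in the displayed Stata output.

```python
import math

# Cell means of ln_wage from `margins union#black`, keyed by (union, black)
m = {
    (0, 0): 1.657525,
    (0, 1): 1.486621,
    (1, 0): 1.86203,
    (1, 1): 1.729754,
}

b_cons = m[(0, 0)]               # _cons: non-union, non-black
b_union = m[(1, 0)] - m[(0, 0)]  # 1.union: union premium among non-black workers
b_black = m[(0, 1)] - m[(0, 0)]  # 1.black: black/non-black gap among non-union workers
# union#black: union premium among black workers minus that among non-black workers
b_inter = (m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)])

# Coefficients as printed in the regression table
reported = {"_cons": 1.657525, "1.union": .2045053, "1.black": -.1709034, "union#black": .0386275}
recovered = {"_cons": b_cons, "1.union": b_union, "1.black": b_black, "union#black": b_inter}
for name, value in reported.items():
    assert math.isclose(recovered[name], value, abs_tol=1e-5), name
    print(f"{name:12s} {recovered[name]: .6f}")
```

In other words, “margins” and “reg” report the same fitted model, once as cell means and once as contrasts from the baseline cell.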

1.2 Interaction with continuous variables

Now suppose \(x_1\) and \(x_2\) are both continuous.

\[ E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 \]

In this case, we recommend “centering” \(x_1\) and \(x_2\); that is, subtracting the mean from each continuous independent variable involved in the interaction term. Centering is a linear transformation, so it does not change the model’s fitted values, \(R^2\), or the coefficient, standard error, t-statistic, or p-value of the interaction term \(\beta_{12}\). What it does change are the main-effect coefficients \(\beta_1\) and \(\beta_2\), along with their standard errors and their interpretation. There are two reasons to center:

First, to reduce apparent multicollinearity between the main effects and the interaction term. If \(x_1\) and \(x_2\) take only positive values, \(x_1 x_2\) can be highly correlated with \(x_1\) and/or \(x_2\). This inflates the standard errors of \(\beta_1\) and \(\beta_2\) (but not of \(\beta_{12}\)), even though the model’s predictions are unaffected.

Centering reduces the correlation between the interaction term and the main-effect variables. If the original variables are jointly normally distributed, the centered interaction term is uncorrelated with the centered main effects in the population. When they are not normally distributed, centering usually still reduces the correlation substantially, although with skewed variables it will not remove it completely. This is a convenience for interpreting \(\beta_1\) and \(\beta_2\) and for numerical stability, not a fix that improves estimation of the interaction effect itself.
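A quick numerical illustration, as a Python sketch with made-up values: when \(x_1\) and \(x_2\) take only positive values, \(x_1 x_2\) is strongly correlated with \(x_1\); after centering, that correlation disappears. The two hypothetical variables here form a balanced grid (every value of \(x_1\) paired with every value of \(x_2\)), which makes the centered correlation exactly zero; with real, skewed data it would be reduced but generally not to zero.

```python
import itertools
import math

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Hypothetical positive-valued predictors on a balanced 5 x 5 grid
pairs = list(itertools.product([1, 2, 3, 4, 5], [10, 11, 12, 13, 14]))
x1 = [p[0] for p in pairs]
x2 = [p[1] for p in pairs]

# Uncentered: the product term behaves roughly like x1 scaled by x2's large mean
r_raw = corr(x1, [a * b for a, b in zip(x1, x2)])

# Centered at the sample means (3 and 12)
m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
x1c = [a - m1 for a in x1]
x2c = [b - m2 for b in x2]
r_centered = corr(x1c, [a * b for a, b in zip(x1c, x2c)])

print(round(r_raw, 3), round(r_centered, 3))  # 0.964 0.0
```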

Second, to help with interpretation. In a model with an interaction, \(\beta_1\) represents the effect of \(x_1\) when \(x_2\) is zero. In many situations, however, zero is not within the range of \(x_2\), so \(\beta_1\) describes a situation that never occurs in the data. After centering, centered \(x_2\) equal to zero simply means original \(x_2\) at its mean, so \(\beta_1\) becomes the effect of \(x_1\) evaluated at the mean of \(x_2\) instead of at \(x_2=0\). The same holds for \(\beta_2\) with the roles of \(x_1\) and \(x_2\) swapped.
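The algebra behind this: substituting \(x_1 = x_1^c + \bar{x}_1\) and \(x_2 = x_2^c + \bar{x}_2\) shows that the centered \(\beta_1\) equals \(\beta_1 + \beta_{12}\bar{x}_2\) from the uncentered model, the slope of \(x_1\) at the mean of \(x_2\), while \(\beta_{12}\) is untouched. The Python sketch below checks this with an exact-arithmetic OLS fit on a small hypothetical dataset; the data and the helper names `ols` and `fit` are made up for illustration, not taken from the post.

```python
from fractions import Fraction
import itertools

def ols(X, y):
    """Solve the OLS normal equations (X'X) b = X'y exactly (Gauss-Jordan on Fractions)."""
    k = len(X[0])
    A = [
        [sum(Fraction(row[i]) * row[j] for row in X) for j in range(k)]
        + [sum(Fraction(row[i]) * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Hypothetical data: x1 in {1, 2, 3}, x2 in {10, 11, 12}; zero is outside x2's range
grid = list(itertools.product([1, 2, 3], [10, 11, 12]))
x1 = [a for a, _ in grid]
x2 = [b for _, b in grid]
y = [5, 7, 8, 9, 12, 14, 12, 16, 19]  # made-up outcome values

def fit(u, v):
    """Coefficients [b0, b1, b2, b12] of y on u, v and u*v."""
    return ols([[1, a, b, a * b] for a, b in zip(u, v)], y)

m1 = Fraction(sum(x1), len(x1))
m2 = Fraction(sum(x2), len(x2))
raw = fit(x1, x2)
cen = fit([a - m1 for a in x1], [b - m2 for b in x2])

assert cen[3] == raw[3]                # interaction coefficient unchanged
assert cen[1] == raw[1] + raw[3] * m2  # centered b1 = slope of x1 at mean of x2
assert cen[2] == raw[2] + raw[3] * m1  # centered b2 = slope of x2 at mean of x1
print("beta1 raw:", float(raw[1]), "beta1 centered:", float(cen[1]), "beta12:", float(raw[3]))
# beta1 raw: -6.5 beta1 centered: 4.5 beta12: 1.0
```

Here the uncentered \(\beta_1\) is negative because it describes the slope at \(x_2 = 0\), far outside the data, whereas the centered \(\beta_1\) is the positive slope at the average \(x_2\).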

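When a dummy variable interacts with a continuous variable, only the continuous variable should be centered. The dummy keeps its 0/1 coding, so its coefficient is still the difference between the two groups, now evaluated at the mean of the continuous variable rather than at zero.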

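Again, Stata’s margins command is helpful. In the example below, we center mpg, interact it with the foreign dummy, and then use margins and marginsplot to obtain and plot the predicted price of domestic and foreign cars at several values of centered mpg.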

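sysuse auto
sum mpg                                         // r(mean) now holds the mean of mpg
gen mpg_centered=mpg-r(mean)                    // center mpg at its sample mean
sum mpg_centered                                // check: the mean should be (numerically) zero
reg price i.foreign##c.mpg_centered             // dummy interacted with centered continuous variable
margins foreign, at(mpg_centered=(-3 (1) 3))    // predicted price by origin, 3 mpg below to 3 above the mean
marginsplot                                     // plot those predictions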


. sysuse auto
(1978 automobile data)

. sum mpg

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         74     21.2973    5.785503         12         41

. gen mpg_centered=mpg-r(mean)

. sum mpg_centered

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
mpg_centered |         74   -4.03e-08    5.785503  -9.297297    19.7027

. reg price i.foreign##c.mpg_centered

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(3, 70)        =      9.48
       Model |   183435285         3  61145094.9   Prob > F        =    0.0000
    Residual |   451630112        70  6451858.74   R-squared       =    0.2888
-------------+----------------------------------   Adj R-squared   =    0.2584
       Total |   635065396        73  8699525.97   Root MSE        =    2540.1

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     foreign |
    Foreign  |   1666.519    717.217     2.32   0.023     236.0751    3096.963
mpg_centered |  -329.2551   74.98545    -4.39   0.000    -478.8088   -179.7013
             |
     foreign#|
          c. |
mpg_centered |
    Foreign  |   78.88826   112.4812     0.70   0.485    -145.4485     303.225
             |
       _cons |   5588.295   369.0945    15.14   0.000     4852.159    6324.431
------------------------------------------------------------------------------

. margins foreign, at(mpg_centered=(-3 (1) 3))

Adjusted predictions                                        Number of obs = 74
Model VCE: OLS

Expression: Linear prediction, predict()
1._at: mpg_centered = -3
2._at: mpg_centered = -2
3._at: mpg_centered = -1
4._at: mpg_centered =  0
5._at: mpg_centered =  1
6._at: mpg_centered =  2
7._at: mpg_centered =  3

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
 _at#foreign |
 1#Domestic  |    6576.06    370.446    17.75   0.000     5837.229    7314.891
  1#Foreign  |   8005.915   766.8178    10.44   0.000     6476.545    9535.284
 2#Domestic  |   6246.805   354.4734    17.62   0.000      5539.83     6953.78
  2#Foreign  |   7755.548   709.9327    10.92   0.000     6339.632    9171.464
 3#Domestic  |    5917.55   354.0032    16.72   0.000     5211.513    6623.587
  3#Foreign  |   7505.181   658.8306    11.39   0.000     6191.185    8819.177
 4#Domestic  |   5588.295   369.0945    15.14   0.000     4852.159    6324.431
  4#Foreign  |   7254.814   614.9548    11.80   0.000     6028.325    8481.303
 5#Domestic  |    5259.04    397.981    13.21   0.000     4465.292    6052.788
  5#Foreign  |   7004.447   579.9479    12.08   0.000     5847.778    8161.117
 6#Domestic  |   4929.785   437.9413    11.26   0.000     4056.338    5803.231
  6#Foreign  |   6754.081   555.4891    12.16   0.000     5646.192    7861.969
 7#Domestic  |    4600.53    486.253     9.46   0.000     3630.729    5570.331
  7#Foreign  |   6503.714   543.0057    11.98   0.000     5420.723    7586.704
------------------------------------------------------------------------------

. marginsplot

Variables that uniquely identify margins: mpg_centered foreign


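In this example, the graph shows the predicted price of foreign and domestic cars at different levels of mpg, from 3 mpg below to 3 mpg above the sample mean of about 21.3. Because mpg is centered, the coefficient on foreign (1666.519) is the foreign-domestic price gap at the mean mpg, which is exactly the difference between the two predictions at mpg_centered = 0 (7254.814 minus 5588.295). The coefficient on mpg_centered (-329.2551) is the slope for domestic cars; the slope for foreign cars is -329.2551 + 78.88826, about -250.37. The difference between the two slopes, the interaction coefficient, is not significant (p = 0.485).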
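To connect the regression table with the margins output, this Python sketch rebuilds the predicted prices from the four coefficients of “reg price i.foreign##c.mpg_centered” (copied from the output above) and compares them with a few of the values margins reported. The variable names and the helper `predicted_price` are ours; the tolerance allows for rounding in the displayed output.

```python
import math

# Coefficients from `reg price i.foreign##c.mpg_centered`
cons = 5588.295        # predicted price of a domestic car at mean mpg
foreign = 1666.519     # foreign minus domestic gap at mean mpg
slope = -329.2551      # price change per extra mpg, domestic cars
slope_diff = 78.88826  # foreign slope minus domestic slope (interaction)

def predicted_price(is_foreign, mpg_c):
    """Linear prediction for a car with the given origin (0/1) and centered mpg."""
    return cons + foreign * is_foreign + (slope + slope_diff * is_foreign) * mpg_c

# Some rows of the margins table: (mpg_centered, foreign, reported margin)
reported = [(-3, 0, 6576.06), (-3, 1, 8005.915), (0, 1, 7254.814), (3, 0, 4600.53), (3, 1, 6503.714)]
for mpg_c, is_foreign, margin in reported:
    pred = predicted_price(is_foreign, mpg_c)
    assert math.isclose(pred, margin, abs_tol=0.01), (mpg_c, is_foreign)

# Slope for foreign cars (price change per extra mpg)
print(round(slope + slope_diff, 2))
```

Every point on the marginsplot graph is one of these linear predictions; the two lines cross the vertical line mpg_centered = 0 exactly 1666.519 apart.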