
This document is licensed under the Creative Commons attribution-noncommercial license. Please share & remix noncommercially, mentioning its origin. Source code and data are available here.

The violation of model assumptions in RE-models for panel data

This example shows how to address the problem that arises in mixed models when group factors (random effects) and (time-constant) predictors correlate, especially in panel data. Models in which predictors and group factors correlate may have compromised estimates of uncertainty as well as possibly biased estimates. In econometrics in particular, fixed-effects models are considered the gold standard for addressing such issues. However, it often makes no sense to consider group effects as "fixed" over a long time period. Apart from that, FE-models have further shortcomings as well; see (Bell, Fairbrother, and Jones 2018), (Bell, Jones, and Fairbrother 2018) and (Bafumi and Gelman 2006).

The following equations and the discussion on FE vs. RE are based on (Bell, Fairbrother, and Jones 2018). For further discussion on FE vs. RE, see (Gelman and Hill 2007) and (Bafumi and Gelman 2006), as well as the quotes at the end of this document.

Adding group-meaned predictors to solve this issue

The solution to the criticism from "FE-modelers" is simple: if you include the group mean of your predictors in a random effects model (that is, calculate the mean of the predictor at each group level and include it as a group-level predictor), it will give the same answer as a fixed effects model (see Table 3 at the very end, and (Bell, Jones, and Fairbrother 2018) as reference). This is why FE-modelers often also call this type of RE-model a "kind of" FE-model, i.e. they define a RE-model as a model where predictors are assumed to be uncorrelated with the residuals. However,

“Calling it a FE model is not just inaccurate. It also does down its potential. Eg FE models don’t usually include random slopes, and failing to do so can lead to incorrect SEs as well as being a less interesting and realistic model.”

“A random effects model is such because it has random effects (that is, higher-level entities treated as a distribution) in it rather than fixed effects (higher-level entities treated as dummy variables) in it.”

source: Twitter-Discussion 1, Twitter-Discussion 2

Problems of ignoring random slopes in Fixed Effects models

(Heisig, Schaeffer, and Giesecke 2017) demonstrate how ignoring random slopes, i.e. neglecting “cross-cluster differences in the effects of lower-level controls reduces the precision of estimated context effects, resulting in unnecessarily wide confidence intervals and low statistical power”. You may refer to this paper to justify a mixed model with random slopes over “simple” FE-models.

Examples

The following code snippets show how to translate the equations from (Bell, Fairbrother, and Jones 2018) into R code, using glmmTMB() from the glmmTMB package.

library(glmmTMB)
library(parameters)
library(sjPlot)
library(lfe)

load("example.RData")


Description of the data

“Classical” growth-model for longitudinal data

model_re <- glmmTMB(
  QoL ~ time + age + x_tv + z1_ti + z2_ti + (1 + time | ID),
  data = d,
  REML = TRUE
)
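
For a quick look at this baseline model, the usual accessors can be used; model_parameters() comes from the parameters package loaded above:

# inspect fixed effects and random-effect variances
summary(model_re)
model_parameters(model_re)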

Computing the de-meaned and group-meaned variables

Next is a model from Eq. 10, which includes the “de-meaned” time-varying variable as well as the “group-meaned” time-varying variable.

# compute the mean of "x_tv" and "QoL" for each subject (ID) and
# then "de-mean" these variables
d <- cbind(
  d,
  demean(d, select = c("x_tv", "QoL"), group = "ID") # from package "parameters"
)
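
For illustration, the same decomposition could also be computed by hand in base R. A minimal sketch, assuming the data frame d with the subject identifier ID and the time-varying predictor x_tv (demean() does the same for both x_tv and QoL and takes care of the variable naming):

# group (subject) mean of x_tv, corresponding to x_tv_between ...
x_between <- ave(d$x_tv, d$ID, FUN = mean)
# ... and the de-meaned ("within") part, corresponding to x_tv_within
x_within <- d$x_tv - x_between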

Now we have four new variables: x_tv_between, x_tv_within, QoL_between and QoL_within.

QoL_between and QoL_within are used to test different FE-models, which are described later. In those models, I also use the "de-meaned" response variable without the group variable (ID) as fixed effect (see Equation 6 in the paper).
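
A quick sanity check of the decomposition (a sketch; with the usual person-mean centering, the between- and within-parts add up to the original variable, and the within-part averages to zero within each subject):

# between + within should reproduce the original variable
all.equal(d$x_tv, d$x_tv_between + d$x_tv_within)
# subject-wise means of the de-meaned variable should be ~0
summary(tapply(d$x_tv_within, d$ID, mean))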

The complex random-effect-within-between model (REWB)

Eq. 10 suggests allowing the "within-effect" (the de-meaned predictor) to vary across individuals, which is why x_tv_within is also added as a random slope.

Here, the estimate of x_tv_within indicates the within-subject effect, while the estimate of x_tv_between indicates the between-subject effect. This model also allows for heterogeneity across level-2 units, which is why x_tv_within also appears in the random effects. The estimates of z1_ti and z2_ti also indicate between-subject effects, as these are level-2 variables, which cannot have a within-subject effect.

Model from Equation 10

Here is Equation 10 from Bell et al. 2018:

$$y_{it} = \beta_0 + \beta_{1W} (x_{it} - \bar{x}_i) + \beta_{2B} \bar{x}_i + \beta_3 z_i + \upsilon_{i0} + \upsilon_{i1} (x_{it} - \bar{x}_i) + \epsilon_{it}$$

  • $x_{it} - \bar{x}_i$ is the de-meaned predictor, x_tv_within
  • $\bar{x}_i$ is the group-meaned predictor, x_tv_between
  • $\beta_{1W}$ is the coefficient for x_tv_within (within-subject effect)
  • $\beta_{2B}$ is the coefficient for x_tv_between (between-subject effect)
  • $\beta_3$ is the coefficient for z1_ti or z2_ti (between-subject effect)

# Model fits...
model_complex_rewb <- glmmTMB(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time + x_tv_within | ID),
  data = d,
  REML = TRUE
)

# An alternative could be to model the random effects as not correlated.
model_complex_rewb_2 <- glmmTMB(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time + x_tv_within || ID),
  REML = TRUE,
  data = d
)

# here we get an error message when calling the summary(). The 
# model can't compute standard errors. So we don't use this model.
# model_complex_rewb_2b <- glmmTMB(
#   QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + 
#     (1 + time | ID) + (1 + x_tv_within | ID),
#   data = d,
#   REML = TRUE
# )

# another alternative: keep the correlation between intercept and time,
# but model the slope of x_tv_within as independent (no covariance)
model_complex_rewb_2c <- glmmTMB(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti +
    (1 + time | ID) + (0 + x_tv_within | ID),
  data = d,
  REML = TRUE
)

We compare all model fits, but we go on with model_complex_rewb for now… Note that in the same examples with lmer(), we used the model whose random parts were (1 + time | ID) + (1 + x_tv_within | ID), like model model_complex_rewb_2b above; however, model_complex_rewb_2b has serious problems with calculating the standard errors here. Model model_complex_rewb comes closest to the results from the lme4-model model_complex_rewb_2.

Table 1: Comparison of complex REWB-Models

tab_model(
  model_complex_rewb, model_complex_rewb_2, model_complex_rewb_2c,
  show.ci = FALSE, 
  show.se = TRUE, 
  auto.label = FALSE, 
  string.se = "SE",
  show.icc = FALSE,
  dv.labels = c("Complex REWB (1)", "Complex REWB (2)", "Complex REWB (3)")
)
                               Complex REWB (1)          Complex REWB (2)          Complex REWB (3)
Predictors                     Estimates   SE     p      Estimates   SE     p      Estimates   SE     p
(Intercept)                     61.82      5.86  <0.001   62.70      5.85  <0.001   62.45      5.85  <0.001
time                             1.16      0.61   0.057    1.16      0.61   0.055    1.17      0.61   0.054
age                             -0.22      0.20   0.265   -0.17      0.19   0.386   -0.19      0.20   0.322
x_tv_within                     -4.44      0.59  <0.001   -4.55      0.60  <0.001   -4.54      0.59  <0.001
x_tv_between                    -6.29      0.52  <0.001   -6.19      0.52  <0.001   -6.22      0.52  <0.001
z1_ti                            5.11      4.36   0.241    4.82      4.38   0.271    4.74      4.37   0.277
z2_ti                            0.00      0.00   0.090    0.00      0.00   0.139    0.00      0.00   0.114
Random Effects
σ2                             119.35                    119.88                    120.00
τ00                            143.24 (ID)               108.66 (ID)               139.92 (ID)
                                                           0.00 (ID.1)               13.92 (ID.1)
                                                          14.58 (ID.2)
τ11                              1.09 (ID.time)                                      0.61 (ID.time)
                                13.92 (ID.x_tv_within)
ρ01                             -0.77                                               -0.91 (ID)
                                 0.36
N                              188 (ID)                  188 (ID)                  188 (ID)
Observations                   564                       564                       564
Marginal R2 / Conditional R2   0.395 / 0.713             0.573 / NA                0.415 / 0.693

The simple random-effect-within-between model (REWB) and Mundlak model

Following email correspondence, the paper's authors suggest that, depending on the research interest and the necessary complexity of the model, a "simple" random-slope model might be suitable as well. As stated in the paper, this is useful if homogeneity across level-2 units is assumed. This model usually yields the same results as an FE-model; however, we additionally have information about the random effects, and the model can incorporate time-invariant covariates.

Again, the estimate of x_tv_within indicates the within-subject effect, while the estimate of x_tv_between indicates the between-subject effect.

Model from Equation 2

$$y_{it} = \beta_0 + \beta_{1W} (x_{it} - \bar{x}_i) + \beta_{2B} \bar{x}_i + \beta_3 z_i + (\upsilon_i + \epsilon_{it})$$

model_simple_rewb <- glmmTMB(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time | ID),
  data = d,
  REML = TRUE
)

An alternative would be the Mundlak model. Here, the estimate of x_tv indicates the within-subject effect, while the estimate of x_tv_between indicates the contextual effect.

Model from Equation 3

$$y_{it} = \beta_0 + \beta_{1W} x_{it} + \beta_{2C} \bar{x}_i + \beta_3 z_i + (\upsilon_i + \epsilon_{it})$$

model_mundlak <- glmmTMB(
  QoL ~ time + age + x_tv + x_tv_between + z1_ti + z2_ti + (1 + time | ID),
  data = d,
  REML = TRUE
)

The contextual effect, i.e. the coefficient for x_tv_between, indicates the effect of an individual (at level 1) that moves from one level-2 group into another. In the above model, and in general for longitudinal data, the contextual effect is meaningless, because the level-2 units are the individuals (or subjects) themselves, who by definition cannot "move" to another individual. Therefore, the REWB-model is more informative here.
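
Note the algebraic relationship between the two parameterisations: substituting $x_{it} = (x_{it} - \bar{x}_i) + \bar{x}_i$ into Equation 3 shows that the between effect from the REWB-model equals the sum of the within and contextual effects from the Mundlak model,

$$\beta_{2B} = \beta_{1W} + \beta_{2C}$$

which can be verified, up to rounding, in Table 2 below: -6.30 ≈ -3.73 + (-2.56).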

Comparison of models

In table 2, we compare the “classical” RE-model, the complex REWB-model, the “simple” REWB-model and the Mundlak-Model.

Table 2: Comparison of RE, REWB and Mundlak Models

tab_model(
  model_re, model_complex_rewb, model_simple_rewb, model_mundlak,
  show.ci = FALSE, 
  show.se = TRUE, 
  auto.label = FALSE, 
  string.se = "SE",
  show.icc = FALSE,
  dv.labels = c("Classical RE", "Complex REWB", "Simple REWB", "Mundlak")
)
                               Classical RE              Complex REWB              Simple REWB               Mundlak
Predictors                     Estimates   SE     p      Estimates   SE     p      Estimates   SE     p      Estimates   SE     p
(Intercept)                     60.96      5.96  <0.001   61.82      5.86  <0.001   62.73      5.85  <0.001   62.73      5.85  <0.001
time                             1.14      0.67   0.087    1.16      0.61   0.057    1.09      0.66   0.098    1.09      0.66   0.098
age                             -0.15      0.20   0.461   -0.22      0.20   0.265   -0.22      0.19   0.266   -0.22      0.19   0.266
x_tv                            -4.69      0.34  <0.001                                                       -3.73      0.41  <0.001
z1_ti                            4.48      4.44   0.314    5.11      4.36   0.241    4.43      4.35   0.308    4.43      4.35   0.308
z2_ti                            0.00      0.00   0.024    0.00      0.00   0.090    0.00      0.00   0.097    0.00      0.00   0.097
x_tv_within                                               -4.44      0.59  <0.001   -3.73      0.41  <0.001
x_tv_between                                              -6.29      0.52  <0.001   -6.30      0.52  <0.001   -2.56      0.66  <0.001
Random Effects
σ2                             142.88                    119.35                    142.16                    142.16
τ00                            208.05 (ID)               143.24 (ID)               201.70 (ID)               201.70 (ID)
τ11                             12.48 (ID.time)            1.09 (ID.time)           10.77 (ID.time)           10.77 (ID.time)
                                                          13.92 (ID.x_tv_within)
ρ01                             -0.74 (ID)                -0.77                     -0.77 (ID)                -0.77 (ID)
                                                           0.36
N                              188 (ID)                  188 (ID)                  188 (ID)                  188 (ID)
Observations                   564                       564                       564                       564
Marginal R2 / Conditional R2   0.311 / 0.619             0.395 / 0.713             0.382 / 0.649             0.382 / 0.649

Check whether a REWB-model or a simple RE-model is suitable

If the estimates of the within- and between-effects (x_tv_within and x_tv_between) are (almost) identical, or if the contextual effect (x_tv_between) in the Mundlak-model is zero and does not significantly improve the model, you can also use a simple RE-model.

A simple way to check this is a likelihood-ratio test between the simple RE-model and the Mundlak-model:

anova(model_re, model_mundlak)
#> Data: d
#> Models:
#> model_re: QoL ~ time + age + x_tv + z1_ti + z2_ti + (1 + time | ID), zi=~0, disp=~1
#> model_mundlak: QoL ~ time + age + x_tv + x_tv_between + z1_ti + z2_ti + (1 + , zi=~0, disp=~1
#> model_mundlak:     time | ID), zi=~0, disp=~1
#>               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
#> model_re      10 4668.5 4711.9 -2324.3   4648.5                             
#> model_mundlak 11 4654.7 4702.4 -2316.3   4632.7 15.835      1  6.912e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we see a significant improvement of the Mundlak-model over the simple RE-model, indicating that it makes sense to model within- and between-subjects effects, i.e. to apply a REWB-model.

Comparison of FE- and REWB-models

The function felm() from the package lfe was used to compute the fixed effects regression models. Base R's lm() gives the same results; however, its output is much longer due to the many ID-parameters.

$$y_{it} = \beta_1 (x_{it} - \bar{x}_i) + (\upsilon_i + \epsilon_{it})$$

# Model from Equation 5
model_fe_ID <- felm(
  QoL ~ time + x_tv_within | ID,
  data = d
)

# same as this lm-model
# model_fe_ID <- lm(
#   QoL ~ 0 + time + x_tv_within + ID,
#   data = d
# )

Equation 6 describes a fixed effects model with de-meaned dependent variable.

$$(y_{it} - \bar{y}_i) = \beta_1 (x_{it} - \bar{x}_i) + \epsilon_{it}$$

# Model from Equation 6
model_fe_y_within <- felm(
  QoL_within ~ time + x_tv_within,
  data = d
)

# or ...
# model_fe_y_within <- lm(
#   QoL_within ~ 0 + time + x_tv_within,
#   data = d
# )

We compare the results from the FE-models with a simple RE-model and the REWB-model.

model_re_2 <- glmmTMB(
  QoL ~ time + x_tv_within + x_tv_between + (1 | ID),
  data = d,
  REML = TRUE
)

# Compare with complex REWB-model
model_complex_rewb3 <- glmmTMB(
  QoL ~ time + x_tv_within + x_tv_between + (1 + time + x_tv_within | ID),
  data = d,
  REML = TRUE
)

As we can see, the estimates of the FE-models and the RE-model are identical. The estimates from the REWB-model differ, however, because the within-subject effect of the time-varying predictor, x_tv_within, is allowed to vary between subjects as well (i.e. it is modelled as a random slope).

Table 3: Comparison of FE- and RE-models

tab_model(
  model_fe_ID, model_fe_y_within, model_re_2, model_complex_rewb3,
  show.ci = FALSE, 
  show.se = TRUE, 
  auto.label = FALSE, 
  string.se = "SE",
  show.icc = FALSE,
  dv.labels = c("FE-model with ID", "FE, de-meaned Y (with Intercept)", "RE", "Complex REWB")
)
                               FE-model with ID          FE, de-meaned Y (with Intercept)   RE                        Complex REWB
Predictors                     Estimates   SE     p      Estimates   SE     p               Estimates   SE     p      Estimates   SE     p
time                             1.09      0.64   0.089    1.09      0.52   0.037             1.09      0.64   0.088    1.15      0.61   0.059
x_tv_within                     -3.66      0.41  <0.001   -3.66      0.33  <0.001            -3.66      0.41  <0.001   -4.43      0.59  <0.001
(Intercept)                                               -2.18      1.13   0.054            71.53      1.56  <0.001   71.40      1.55  <0.001
x_tv_between                                                                                 -6.28      0.50  <0.001   -6.37      0.50  <0.001
Random Effects
σ2                                                                                          152.89                    119.54
τ00                                                                                          97.62 (ID)               136.32 (ID)
τ11                                                                                                                     1.07 (ID.time)
                                                                                                                       13.82 (ID.x_tv_within)
ρ01                                                                                                                    -0.66
                                                                                                                        0.34
N                              188 (ID)                                                     188 (ID)                  188 (ID)
Observations                   564                       564                                564                       564
R2 / R2 adjusted               NA                        0.180 / 0.177                      0.369 / 0.615             0.389 / 0.709

Conclusion

When group factors (random effects) and (time-constant) predictors correlate, it is recommended to fit a complex random-effect-within-between model (REWB) instead of a "simple" mixed effects model. This requires de-meaning and group-meaning the time-varying predictors. Depending on the data structure, random slope and intercept may or may not correlate.

The random effects structure, i.e. how to model random slopes and intercepts and whether to allow correlations among them, depends on the nature of the data. The benefits of using mixed effects models over fixed effects models are more precise estimates (in particular when random slopes are included) and the possibility to include between-subject effects.

In case of convergence problems or singular fits, note that changing the optimizer might help. In this context, some models ran fine in lme4, while other models that had problems being fitted in lme4 ran without any problems in glmmTMB.

# compute the group mean of "x_tv" and "QoL" for each subject (ID) and
# then "de-mean" these variables
d <- cbind(
  d,
  demean(d, select = c("x_tv", "QoL"), group = "ID") # from package "parameters"
)

# This model gives a convergence warning:
# Model convergence problem; singular convergence (7)
m <- glmmTMB(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time + x_tv_within | ID),
  data = d,
  REML = TRUE
)

# An alternative could be to model the random effects as not correlated.
m <- glmmTMB(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time + x_tv_within || ID),
  REML = TRUE,
  data = d
)
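
As mentioned above, changing the optimizer is one way to address such convergence problems. A sketch using glmmTMB's control options (the choice of optim() with BFGS here is just an example, not a general recommendation):

# refit the correlated random-effects model with a different optimizer
m <- glmmTMB(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti +
    (1 + time + x_tv_within | ID),
  data = d,
  REML = TRUE,
  control = glmmTMBControl(optimizer = optim, optArgs = list(method = "BFGS"))
)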

Further criticism of the FE-approach

(source: http://andrewgelman.com/2012/04/02/fixed-effects-and-identification/)

“But the so-called fixed effects model does not in general minimize bias. It only minimizes bias under some particular models. As I wrote above, ‘it’s just another model.’ Another way to see this, in the time-series cross-sectional case, is to recognize that there’s no reason to think of group-level coefficients as truly ‘fixed’. One example I remember was a regression on some political outcomes, with 40 years of data for each of 50 states, where the analysis included ‘fixed effects’ for states. I’m sorry but it doesn’t really make sense to think of Vermont from 1960 through 2000 as being ‘fixed’ in any sense.”

“I just don’t see how setting the group-level variance to infinity can be better than estimating it from the data or setting it to a reasonable finite value. That said, the big advantage of multilevel (“random effects”) modeling comes when you are interested in the varying coefficients themselves, or if you’re interested in predictions for new groups, or if you want the treatment effect itself to vary by group. On a slightly different note, I’m unhappy with many of the time-series cross-sectional analyses I’ve seen because I don’t buy the assumption of constancy over time. That is, I don’t really think those effects are “fixed”!"

“I don’t know that there’s anything much that’s time-invariant in what I study. But, in any case, the so-called fixed-effects analysis is mathematically a special case of multilevel modeling in which the group-level variance is set to infinity. I agree that there’s no need to “believe” that model for the method to work; however, I think it works because of some implicit additivity assumptions. I’d prefer to (a) allow the group-level variance to be finite, and (b) work in the relevant assumptions more directly."

References

Bafumi, Joseph, and Andrew Gelman. 2006. "Fitting Multilevel Models When Predictors and Group Effects Correlate." Paper presented at the Annual Meeting of the American Political Science Association, Philadelphia, PA.

Bell, Andrew, Malcolm Fairbrother, and Kelvyn Jones. 2018. “Fixed and Random Effects Models: Making an Informed Choice.” Quality & Quantity. https://doi.org/10.1007/s11135-018-0802-x.

Bell, Andrew, Kelvyn Jones, and Malcolm Fairbrother. 2018. “Understanding and Misunderstanding Group Mean Centering: A Commentary on Kelley et Al.’s Dangerous Practice.” Quality & Quantity 52 (5): 2031–6. https://doi.org/10.1007/s11135-017-0593-5.

Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Analytical Methods for Social Research. Cambridge ; New York: Cambridge University Press.

Heisig, Jan Paul, Merlin Schaeffer, and Johannes Giesecke. 2017. “The Costs of Simplicity: Why Multilevel Models May Benefit from Accounting for Cross-Cluster Differences in the Effects of Controls.” American Sociological Review 82 (4): 796–827. https://doi.org/10.1177/0003122417717901.