This document is licensed under the Creative Commons Attribution-NonCommercial license. Please share & remix non-commercially, mentioning its origin. Source code and data are available here.
This example shows how to address the issue of correlating group factors (random effects) and (time-constant) predictors in mixed models, especially in panel data. Models where predictors and group factors correlate may have compromised estimates of uncertainty as well as possible bias. In econometrics in particular, fixed-effects models are considered the gold standard to address such issues. However, it often makes no sense to consider group effects as “fixed” over a long time period. Apart from this, FE-models have further shortcomings as well, see (Bell, Fairbrother, and Jones 2018), (Bell, Jones, and Fairbrother 2018) and (Bafumi and Gelman 2006).
The following equations and discussion on FE vs. RE are based on (Bell, Fairbrother, and Jones 2018). For further discussion on FE vs. RE, also at the end of this document, see (Gelman and Hill 2007) and (Bafumi and Gelman 2006).
The solution to the criticism from “FE-modelers” is simple: if you include a group mean of your variables in a random effects model (that is, calculate the mean of the predictor at each group level and include it as a group-level predictor), it will give the same answer as a fixed effects model (see Table 3 at the very end, and (Bell, Jones, and Fairbrother 2018) as reference). This is why FE-modelers often call this type of RE-model a “kind of” FE-model as well, i.e. they define an RE model as a model where predictors are assumed to be uncorrelated with the residuals. However,
“Calling it a FE model is not just inaccurate. It also does down its potential. Eg FE models don’t usually include random slopes, and failing to do so can lead to incorrect SEs as well as being a less interesting and realistic model.”
“A random effects model is such because it has random effects (that is, higher-level entities treated as a distribution) in it rather than fixed effects (higher-level entities treated as dummy variables) in it.”
(Heisig, Schaeffer, and Giesecke 2017) demonstrate how ignoring random slopes, i.e. neglecting “cross-cluster differences in the effects of lower-level controls reduces the precision of estimated context effects, resulting in unnecessarily wide confidence intervals and low statistical power”. You may refer to this paper to justify a mixed model with random slopes over “simple” FE-models.
The following code snippets show how to translate the equations from (Bell, Fairbrother, and Jones 2018) into R code, using `glmmTMB()` from the **glmmTMB** package.
```r
library(glmmTMB)
library(parameters)
library(sjPlot)
library(lfe)

load("example.RData")
```
The data set `d` contains the following variables:

- `x_tv`: time-varying variable
- `z1_ti`: first time-invariant variable, covariate
- `z2_ti`: second time-invariant variable, covariate
- `QoL`: response (quality of life of patient)
- `ID`: patient ID
- `time`: time point of measurement
```r
model_re <- glmmTMB(
  QoL ~ time + age + x_tv + z1_ti + z2_ti + (1 + time | ID),
  data = d,
  REML = TRUE
)
```
Next is a model from Eq. 10, which includes the “de-meaned” time-varying variable as well as the “group-meaned” time-varying variable.
```r
# compute mean of "x_tv" for each subject (ID) and
# then "de-mean" x_tv
d <- cbind(
  d,
  demean(d, select = c("x_tv", "QoL"), group = "ID") # from package "parameters"
)
```
Now we have:

- `x_tv_between`: time-varying variable with the mean of `x_tv` across all time points, for each patient (`ID`).
- `x_tv_within`: the de-meaned time-varying variable.

`QoL_within` is used to test different FE-models, which are described later. In those models, I also use a “de-meaned” response variable without the group variable (`ID`) as fixed effect (see Equation 6 in the paper).
Eq. 10 suggests allowing the “within” effect (the de-meaned predictor) to vary across individuals; that's why `x_tv_within` is added as a random slope as well.
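For illustration, the group- and de-meaning that `demean()` performs could also be sketched with base R (a minimal sketch; `ave()` computes the group mean per `ID` by default):

```r
# "between" part: group mean of x_tv per ID (ave() defaults to mean)
x_tv_between <- ave(d$x_tv, d$ID)

# "within" part: de-meaned x_tv
x_tv_within <- d$x_tv - x_tv_between
```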
Here, the estimate of `x_tv_within` indicates the within-subject effect, while the estimate of `x_tv_between` indicates the between-subject effect. This model also allows for heterogeneity across level-2 units, which is why `x_tv_within` also appears in the random effects. The estimate of `z2_ti` also indicates a between-subject effect, as this is a level-2 variable, which cannot have a within-subject effect.
Here is Equation 10 from Bell et al. 2018:
yit = β0 + β1W (xit − x̄i) + β2B x̄i + β3 zi + υi0 + υi1 (xit − x̄i) + εit
```r
# Model fits...
model_complex_rewb <- glmmTMB(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti +
    (1 + time + x_tv_within | ID),
  data = d,
  REML = TRUE
)

# An alternative could be to model the random effects as not correlated.
model_complex_rewb_2 <- glmmTMB(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti +
    (1 + time + x_tv_within || ID),
  data = d,
  REML = TRUE
)

# here we get an error message when calling the summary(). The
# model can't compute standard errors. So we don't use this model.
# model_complex_rewb_2b <- glmmTMB(
#   QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti +
#     (1 + time | ID) + (1 + x_tv_within | ID),
#   data = d,
#   REML = TRUE
# )

# an alternative would be to assume independence between random slopes
# and no covariance...
model_complex_rewb_2c <- glmmTMB(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti +
    (1 + time | ID) + (0 + x_tv_within | ID),
  data = d,
  REML = TRUE
)
```
We compare all model fits, but we go on with `model_complex_rewb` for now. Note that in the same examples with `lmer()`, we took the model where the random parts were `(1 + time | ID) + (1 + x_tv_within | ID)`, like in the above model `model_complex_rewb_2b`; however, model `model_complex_rewb_2b` has serious problems with calculating the standard errors. Model `model_complex_rewb` comes closest to the results from the lme4-model.
```r
tab_model(
  model_complex_rewb, model_complex_rewb_2, model_complex_rewb_2c,
  show.ci = FALSE,
  show.se = TRUE,
  auto.label = FALSE,
  string.se = "SE",
  show.icc = FALSE,
  dv.labels = c("Complex REWB (1)", "Complex REWB (2)", "Complex REWB (3)")
)
```
|  | Complex REWB (1) | Complex REWB (2) | Complex REWB (3) |
|---|---|---|---|
| τ00 | 143.24 ID | 108.66 ID | 139.92 ID |
|  |  | 0.00 ID.1 | 13.92 ID.1 |
| τ11 | 1.09 ID.time |  | 0.61 ID.time |
| N | 188 ID | 188 ID | 188 ID |
| Marginal R² / Conditional R² | 0.395 / 0.713 | 0.573 / NA | 0.415 / 0.693 |
After email correspondence, the paper's authors suggest that, depending on the research interest and the necessary complexity of the model, a “simple” random slope might be suitable as well. As stated in the paper, this is useful if homogeneity across level-2 units is assumed. This model usually yields the same results as an FE-model; however, we additionally have information about the random effects, and the model can incorporate time-invariant covariates.
Again, the estimate of `x_tv_within` indicates the within-subject effect, while the estimate of `x_tv_between` indicates the between-subject effect.
yit = β0 + β1W (xit − x̄i) + β2B x̄i + β3 zi + (υi + εit)
```r
model_simple_rewb <- glmmTMB(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti +
    (1 + time | ID),
  data = d,
  REML = TRUE
)
```
An alternative would be the Mundlak model. Here, the estimate of `x_tv` indicates the within-subject effect, while the estimate of `x_tv_between` indicates the contextual effect.
yit = β0 + β1W xit + β2C x̄i + β3 zi + (υi + εit)
```r
model_mundlak <- glmmTMB(
  QoL ~ time + age + x_tv + x_tv_between + z1_ti + z2_ti +
    (1 + time | ID),
  data = d,
  REML = TRUE
)
```
The contextual effect, i.e. the coefficient for `x_tv_between`, indicates the effect of an individual (at level 1) that moves from one level-2 group into another one. In the above model, or in general in the case of longitudinal data, the contextual effect is meaningless, as the level-2 units are the individuals (or subjects) themselves, and by definition an observation cannot “move” to another individual. Therefore, the REWB-model is more informative.
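Since the Mundlak model is just a reparameterization of the simple REWB-model (substituting xit = x̄i + (xit − x̄i) shows that β2C = β2B − β1W), the contextual effect equals the between-effect minus the within-effect. A sketch to check this numerically, assuming both models were fitted as above:

```r
# fixed effects of the conditional part of each glmmTMB model
coef_rewb    <- fixef(model_simple_rewb)$cond
coef_mundlak <- fixef(model_mundlak)$cond

# contextual effect should equal between-effect minus within-effect
coef_mundlak["x_tv_between"]
coef_rewb["x_tv_between"] - coef_rewb["x_tv_within"]
```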
In Table 2, we compare the “classical” RE-model, the complex REWB-model, the “simple” REWB-model and the Mundlak model.
```r
tab_model(
  model_re, model_complex_rewb, model_simple_rewb, model_mundlak,
  show.ci = FALSE,
  show.se = TRUE,
  auto.label = FALSE,
  string.se = "SE",
  show.icc = FALSE,
  dv.labels = c("Classical RE", "Complex REWB", "Simple REWB", "Mundlak")
)
```
|  | Classical RE | Complex REWB | Simple REWB | Mundlak |
|---|---|---|---|---|
| τ00 | 208.05 ID | 143.24 ID | 201.70 ID | 201.70 ID |
| τ11 | 12.48 ID.time | 1.09 ID.time | 10.77 ID.time | 10.77 ID.time |
| ρ01 | -0.74 ID | -0.77 | -0.77 ID | -0.77 ID |
| N | 188 ID | 188 ID | 188 ID | 188 ID |
| Marginal R² / Conditional R² | 0.311 / 0.619 | 0.395 / 0.713 | 0.382 / 0.649 | 0.382 / 0.649 |
If the estimates of the within-effect (`x_tv_within`) and the between-effect (`x_tv_between`) are (almost) identical, or if the contextual effect (`x_tv_between`) in the Mundlak model is zero and does not yield a significant improvement of the model fit, you can also use a simple RE-model.
A simple way to check this is a likelihood-ratio test between the simple RE-model and the Mundlak-model:
```r
anova(model_re, model_mundlak)
#> Data: d
#> Models:
#> model_re: QoL ~ time + age + x_tv + z1_ti + z2_ti + (1 + time | ID), zi=~0, disp=~1
#> model_mundlak: QoL ~ time + age + x_tv + x_tv_between + z1_ti + z2_ti + (1 + , zi=~0, disp=~1
#> model_mundlak:     time | ID), zi=~0, disp=~1
#>               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
#> model_re      10 4668.5 4711.9 -2324.3   4648.5
#> model_mundlak 11 4654.7 4702.4 -2316.3   4632.7 15.835      1  6.912e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
Here we see a significant improvement of the Mundlak model over the simple RE-model, indicating that it makes sense to model within- and between-subject effects, i.e. to apply a REWB-model.
`felm()` from the package **lfe** was used to compute the fixed effects regression models. Base R's `lm()` gives the same results; however, the output is much longer due to the ID-parameters.
yit = β1 (xit − x̄i) + (υi + εit)
```r
# Model from Equation 5
model_fe_ID <- felm(
  QoL ~ time + x_tv_within | ID,
  data = d
)

# same as this lm-model
# model_fe_ID <- lm(
#   QoL ~ 0 + time + x_tv_within + ID,
#   data = d
# )
```
Equation 6 describes a fixed effects model with a de-meaned dependent variable.
(yit − ȳi) = β1 (xit − x̄i) + εit
```r
# Model from Equation 6
model_fe_y_within <- felm(
  QoL_within ~ time + x_tv_within,
  data = d
)

# or ...
# model_fe_y_within <- lm(
#   QoL_within ~ 0 + time + x_tv_within,
#   data = d
# )
```
We compare the results from the FE-models with a simple RE-model and the REWB-model.
```r
model_re_2 <- glmmTMB(
  QoL ~ time + x_tv_within + x_tv_between + (1 | ID),
  data = d,
  REML = TRUE
)

# Compare with complex REWB-model
model_complex_rewb3 <- glmmTMB(
  QoL ~ time + x_tv_within + x_tv_between +
    (1 + time + x_tv_within | ID),
  data = d,
  REML = TRUE
)
```
As we can see, the estimates of the FE-models and the RE-model are identical. However, the estimates from the REWB-model differ. This is because the time-varying predictor, i.e. the within-subject effect `x_tv_within`, is allowed to vary between subjects as well (i.e. it is modelled as a random slope).
```r
tab_model(
  model_fe_ID, model_fe_y_within, model_re_2, model_complex_rewb3,
  show.ci = FALSE,
  show.se = TRUE,
  auto.label = FALSE,
  string.se = "SE",
  show.icc = FALSE,
  dv.labels = c("FE-model with ID", "FE, de-meaned Y (with Intercept)",
                "RE", "Complex REWB")
)
```
|  | FE-model with ID | FE, de-meaned Y (with Intercept) | RE | Complex REWB |
|---|---|---|---|---|
| τ00 |  |  | 97.62 ID | 136.32 ID |
| N | 188 ID |  | 188 ID | 188 ID |
| R² / R² adjusted | NA | 0.180 / 0.177 | 0.369 / 0.615 | 0.389 / 0.709 |
When group factors (random effects) and (time-constant) predictors correlate, it is recommended to fit a random-effects-within-between model (REWB) instead of a “simple” mixed effects model. This requires de-meaning and group-meaning the time-varying predictors. Depending on the data structure, random slope and intercept may or may not correlate.
The random effects structure, i.e. how to model random slopes and intercepts and whether to allow correlations among them, depends on the nature of the data. The benefits of using mixed effects models over fixed effects models are more precise estimates (in particular when random slopes are included) and the possibility to include between-subject effects.
In case of convergence problems or singular fits, note that changing the optimizer might help. In this context, some models ran fine in lme4, while other models that were problematic to fit in lme4 ran without any problems in glmmTMB.
```r
# compute group-mean of "x_tv" for each subject (ID) and
# then "de-mean" x_tv
d <- cbind(
  d,
  demean(d, select = c("x_tv", "QoL"), group = "ID") # from package "parameters"
)

# This model gives a convergence warning:
# Model convergence problem; singular convergence (7)
m <- glmmTMB(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti +
    (1 + time + x_tv_within | ID),
  data = d,
  REML = TRUE
)

# An alternative could be to model the random effects as not correlated.
m <- glmmTMB(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti +
    (1 + time + x_tv_within || ID),
  data = d,
  REML = TRUE
)
```
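If such warnings persist, the optimizer itself can be changed via `glmmTMBControl()`. A sketch (the default optimizer is `nlminb`; switching to `optim()` with BFGS is just one option):

```r
# sketch: refit the problematic model with a different optimizer
m <- glmmTMB(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti +
    (1 + time + x_tv_within | ID),
  data = d,
  REML = TRUE,
  control = glmmTMBControl(optimizer = optim, optArgs = list(method = "BFGS"))
)
```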
“But the so-called fixed effects model does not in general minimize bias. It only minimizes bias under some particular models. As I wrote above, ‘it’s just another model.’ Another way to see this, in the time-series cross-sectional case, is to recognize that there’s no reason to think of group-level coefficients as truly ‘fixed’. One example I remember was a regression on some political outcomes, with 40 years of data for each of 50 states, where the analysis included ‘fixed effects’ for states. I’m sorry but it doesn’t really make sense to think of Vermont from 1960 through 2000 as being ‘fixed’ in any sense.”
“I just don’t see how setting the group-level variance to infinity can be better than estimating it from the data or setting it to a reasonable finite value. That said, the big advantage of multilevel (“random effects”) modeling comes when you are interested in the varying coefficients themselves, or if you’re interested in predictions for new groups, or if you want the treatment effect itself to vary by group. On a slightly different note, I’m unhappy with many of the time-series cross-sectional analyses I’ve seen because I don’t buy the assumption of constancy over time. That is, I don’t really think those effects are “fixed”!"
“I don’t know that there’s anything much that’s time-invariant in what I study. But, in any case, the so-called fixed-effects analysis is mathematically a special case of multilevel modeling in which the group-level variance is set to infinity. I agree that there’s no need to “believe” that model for the method to work; however, I think it works because of some implicit additivity assumptions. I’d prefer to (a) allow the group-level variance to be finite, and (b) work in the relevant assumptions more directly."
Bafumi, Joseph, and Andrew Gelman. 2006. “Fitting Multilevel Models When Predictors and Group Effects Correlate.” Philadelphia, PA: Annual Meeting of the American Political Science Association.
Bell, Andrew, Malcolm Fairbrother, and Kelvyn Jones. 2018. “Fixed and Random Effects Models: Making an Informed Choice.” Quality & Quantity. https://doi.org/10.1007/s11135-018-0802-x.
Bell, Andrew, Kelvyn Jones, and Malcolm Fairbrother. 2018. “Understanding and Misunderstanding Group Mean Centering: A Commentary on Kelley et Al.’s Dangerous Practice.” Quality & Quantity 52 (5): 2031–6. https://doi.org/10.1007/s11135-017-0593-5.
Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Analytical Methods for Social Research. Cambridge ; New York: Cambridge University Press.
Heisig, Jan Paul, Merlin Schaeffer, and Johannes Giesecke. 2017. “The Costs of Simplicity: Why Multilevel Models May Benefit from Accounting for Cross-Cluster Differences in the Effects of Controls.” American Sociological Review 82 (4): 796–827. https://doi.org/10.1177/0003122417717901.