ggpredict.Rd
ggpredict()
computes estimated marginal means (predicted values) for the
response, at the margin of specific values from certain model terms.
ggeffect()
computes marginal effects by internally calling
Effect
, while ggemmeans()
computes marginal
effects by internally calling emmeans
.
The result is returned as consistent data frame.
ggeffect(model, terms, ci.lvl = 0.95, ...) ggemmeans( model, terms, ci.lvl = 0.95, type = "fe", typical = "mean", condition = NULL, back.transform = TRUE, ... ) ggpredict( model, terms, ci.lvl = 0.95, type = "fe", typical = "mean", condition = NULL, back.transform = TRUE, ppd = FALSE, vcov.fun = NULL, vcov.type = NULL, vcov.args = NULL, interval = "confidence", ... )
model  A fitted model object, or a list of model objects. Any model
that supports common methods like 

terms  Character vector (or a formula) with the names of those terms
from 
ci.lvl  Numeric, the level of the confidence intervals. For 
...  For 
type  Character, only applies for survival models, mixed effects models
and/or models with zeroinflation. Note: For

typical  Character vector, naming the function to be applied to the
covariates over which the effect is "averaged". The default is "mean".
See 
condition  Named character vector, which indicates covariates that
should be held constant at specific values. Unlike 
back.transform  Logical, if 
ppd  Logical, if 
vcov.fun  String, indicating the name of the 
vcov.type  Character vector, specifying the estimation type for the
robust covariance matrix estimation (see 
vcov.args  List of named vectors, used as additional arguments that
are passed down to 
interval  Type of interval calculation, can either be 
A data frame (with ggeffects
class attribute) with consistent
data columns:
x
the values of the first term in terms
, used as xposition in plots.
predicted
the predicted values of the response, used as yposition in plots.
std.error
the standard error of the predictions.
conf.low
the lower bound of the confidence interval for the predicted values.
conf.high
the upper bound of the confidence interval for the predicted values.
group
the grouping level from the second term in terms
, used as groupingaesthetics in plots.
facet
the grouping level from the third term in terms
, used to indicate facets in plots.
The estimated marginal means (predicted values) are always on the
response scale!
For proportional odds logistic regression (see polr
)
resp. cumulative link models (e.g., see clm
),
an additional column response.level
is returned, which indicates
the grouping of predictions based on the level of the model's response.
Note that for convenience reasons, the columns for the intervals
are always named conf.low
and conf.high
, even though
for Bayesian models credible or highest posterior density intervals
are returned.
A list of supported models can be found at https://github.com/strengejacke/ggeffects.
Support for models varies by function, i.e. although ggpredict()
,
ggemmeans()
and ggeffect()
support most models, some models
are only supported exclusively by one of the three functions.
ggpredict()
and ggeffect()
or ggemmeans()
ggpredict()
calls predict()
, while ggeffect()
calls Effect
and ggemmeans()
calls
emmeans
to compute marginal effects. Therefore,
ggpredict()
and ggeffect()
resp. ggemmeans()
differ in
how factors are held constant: ggpredict()
uses the reference level, while
ggeffect()
and ggemmeans()
compute a kind of "average" value,
which represents the proportions of each factor's category. Use condition
to set a specific level for factors in ggemmeans()
, so factors are
not averaged over their categories, but held constant at a given level.
Specific values of model terms can be specified via the terms
argument.
Indicating levels in square brackets allows for selecting only
specific groups or values resp. value ranges. Term name and the start of
the levels in brackets must be separated by a whitespace character, e.g.
terms = c("age", "education [1,3]")
. Numeric ranges, separated
with colon, are also allowed: terms = c("education", "age [30:60]")
.
The terms
argument also supports the same shortcuts as the
values
argument in values_at()
. So
terms = "age [meansd]"
would return predictions for the values
one standard deviation below the mean age, the mean age and
one SD above the mean age. terms = "age [quart2]"
would calculate
predictions at the value of the lower, median and upper quartile of age.
Furthermore, it is possible to specify a function name. Values for
predictions will then be transformed, e.g. terms = "income [exp]"
.
This is useful when model predictors were transformed for fitting the
model and should be backtransformed to the original scale for predictions.
It is also possible to define own functions (see
this vignette).
You can take a random sample of any size with sample=n
, e.g
terms = "income [sample=8]"
, which will sample eight values from
all possible values of the variable income
. This option is especially
useful for plotting marginal effects at certain levels of random effects
group levels, where the group factor has many levels that can be completely
plotted. For more details, see this vignette.
Finally, numeric vectors for which no specific values are given, a
"pretty range" is calculated (see pretty_range
), to avoid
memory allocation problems for vectors with many unique values. If a numeric
vector is specified as second or third term (i.e. if this vector represents
a grouping structure), representative values (see values_at
)
are chosen (unless other values are specified). If all values for a numeric
vector should be used to compute predictions, you may use e.g.
terms = "age [all]"
. See also package vignettes.
To create a pretty range that should be smaller or larger than the default
range (i.e. if no specific values would be given), use the n
tag,
e.g. terms="age [n=5]"
or terms="age [n=12]"
. Larger
values for n
return a larger range of predicted values.
For ggpredict()
, expand.grid()
is called on all unique
combinations of model.frame(model)[, terms]
and used as
newdata
argument for predict()
. In this case,
all remaining covariates that are not specified in terms
are
held constant: Numeric values are set to the mean (unless changed with
the condition
or typical
argument), factors are set to their
reference level (may also be changed with condition
) and character
vectors to their mode (most common element).
ggeffect()
and ggemmeans()
, by default, set remaining numeric
covariates to their mean value, while for factors, a kind of "average" value,
which represents the proportions of each factor's category, is used. For
ggemmeans()
, use condition
to set a specific level for
factors so that these are not averaged over their categories, but held
constant at the given level.
ggpredict()
also works with Stanmodels from
the rstanarm or brmspackage. The predicted
values are the median value of all drawn posterior samples. The
confidence intervals for Stanmodels are Bayesian predictive intervals.
By default (i.e. ppd = FALSE
), the predictions are based on
posterior_linpred
and hence have some
limitations: the uncertainty of the error term is not taken into
account. The recommendation is to use the posterior predictive
distribution (posterior_predict
).
Models of class brmsfit
always condition on the zeroinflation
component, if the model has such a component. Hence, there is no
type = "fe.zi"
nor type = "re.zi"
for brmsfit
models,
because predictions are based on draws of the posterior distribution,
which already account for the zeroinflation part of the model.
If model
is of class glmmTMB
, hurdle
, zeroinfl
or zerotrunc
, simulations from a multivariate normal distribution
(see mvrnorm
) are drawn to calculate mu*(1p)
.
Confidence intervals are then based on quantiles of these results. For
type = "re.zi"
, prediction intervals also take the uncertainty in
the randomeffect paramters into account (see also Brooks et al. 2017,
pp.391392 for details).
An alternative for models fitted with glmmTMB that take all model
uncertainties into account are simulations based on simulate()
, which
is used when type = "sim"
(see Brooks et al. 2017, pp.392393 for
details).
Predicted values for the fixed effects component (type = "fe"
or
type = "fe.zi"
) are based on predict(..., type = "mean_subject")
,
while predicted values for random effects components (type = "re"
or
type = "re.zi"
) are calculated with predict(..., type = "subject_specific")
(see ?GLMMadaptive::predict.MixMod
for details). The latter option
requires the response variable to be defined in the newdata
argument
of predict()
, which will be set to its typical value (see
typical_value
).
polr
, clm
models, or more generally speaking, models with
ordinal or multinominal outcomes, have an additional column
response.level
, which indicates with which level of the response
variable the predicted values are associated.
The print()
method gives a clean output (especially for predictions
by groups), and indicates at which values covariates were held constant.
Furthermore, the print()
method has the arguments digits
and
n
to control number of decimals and lines to be printed, and an
argument x.lab
to print factorlevels instead of numeric values
if x
is a factor.
The support for some models, for example from package MCMCglmm, is rather experimental and may fail for certain models. If you encounter any errors, please file an issue at https://github.com/strengejacke/ggeffects/issues.
Brooks ME, Kristensen K, Benthem KJ van, Magnusson A, Berg CW, Nielsen A, et al. glmmTMB Balances Speed and Flexibility Among Packages for Zeroinflated Generalized Linear Mixed Modeling. The R Journal. 2017;9: 378400.
Johnson PC, O'Hara RB. 2014. Extension of Nakagawa & Schielzeth's R2GLMM to random slopes models. Methods Ecol Evol, 5: 944946. (doi: 10.1111/2041210X.12225 )
library(sjlabelled) data(efc) fit < lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc) ggpredict(fit, terms = "c12hour")#> #> # Predicted values of Total score BARTHEL INDEX #> # x = average number of hours of care per week #> #> x predicted std.error conf.low conf.high #> 0 75.444 1.116 73.257 77.630 #> 20 70.378 0.925 68.564 72.191 #> 45 64.045 0.843 62.393 65.697 #> 65 58.979 0.930 57.157 60.802 #> 85 53.913 1.122 51.713 56.113 #> 105 48.847 1.377 46.148 51.546 #> 125 43.781 1.665 40.517 47.045 #> 170 32.382 2.373 27.732 37.033 #> #> Adjusted for: #> * neg_c_7 = 11.84 #> * c161sex = 1.76 #> * c172code = 1.97 #>#> #> # Predicted values of Total score BARTHEL INDEX #> # x = average number of hours of care per week #> #> # c172code = low level of education #> x predicted std.error conf.low conf.high #> 0 74.746 1.776 71.266 78.227 #> 30 67.147 1.587 64.037 70.257 #> 55 60.815 1.550 57.777 63.853 #> 85 53.216 1.662 49.958 56.473 #> 115 45.617 1.914 41.865 49.368 #> 170 31.685 2.593 26.602 36.768 #> #> # c172code = intermediate level of education #> x predicted std.error conf.low conf.high #> 0 75.465 1.114 73.282 77.647 #> 30 67.866 0.868 66.165 69.566 #> 55 61.533 0.872 59.824 63.243 #> 85 53.934 1.126 51.727 56.141 #> 115 46.335 1.522 43.352 49.318 #> 170 32.404 2.377 27.745 37.062 #> #> # c172code = high level of education #> x predicted std.error conf.low conf.high #> 0 76.183 1.718 72.816 79.550 #> 30 68.584 1.616 65.417 71.751 #> 55 62.252 1.656 59.006 65.497 #> 85 54.653 1.843 51.040 58.265 #> 115 47.053 2.143 42.853 51.254 #> 170 33.122 2.863 27.511 38.733 #> #> Adjusted for: #> * neg_c_7 = 11.84 #> * c161sex = 1.76 #>#> #> # Predicted values of Total score BARTHEL INDEX #> # x = average number of hours of care per week #> #> # c172code = low level of education #> # c161sex = [1] Male #> x predicted std.error conf.low conf.high #> 0 73.954 2.347 69.354 78.554 #> 45 62.556 2.208 58.228 66.883 #> 85 52.424 2.310 47.896 56.951 #> 170 30.893 3.085 24.847 36.939 #> #> # c172code = intermediate level of education #> # c161sex = [1] Male #> x predicted std.error conf.low conf.high #> 0 74.673 1.845 71.055 78.290 #> 45 63.274 1.730 59.883 66.665 #> 85 53.142 1.911 49.397 56.887 #> 170 31.611 2.872 25.982 37.241 #> #> # c172code = high level of education #> # c161sex = [1] Male #> x predicted std.error conf.low conf.high #> 0 75.391 2.220 71.040 79.741 #> 45 63.992 2.176 59.727 68.258 #> 85 53.860 2.364 49.226 58.494 #> 170 32.330 3.257 25.946 38.713 #> #> # c172code = low level of education #> # c161sex = [2] Female #> x predicted std.error conf.low conf.high #> 0 74.996 1.831 71.406 78.585 #> 45 63.597 1.603 60.456 66.738 #> 85 53.465 1.702 50.130 56.800 #> 170 31.934 2.606 26.827 37.042 #> #> # c172code = intermediate level of education #> # c161sex = [2] Female #> x predicted std.error conf.low conf.high #> 0 75.714 1.225 73.313 78.115 #> 45 64.315 0.968 62.418 66.213 #> 85 54.183 1.209 51.815 56.552 #> 170 32.653 2.403 27.943 37.362 #> #> # c172code = high level of education #> # c161sex = [2] Female #> x predicted std.error conf.low conf.high #> 0 76.432 1.809 72.887 79.977 #> 45 65.034 1.712 61.679 68.388 #> 85 54.902 1.910 51.158 58.646 #> 170 33.371 2.895 27.697 39.045 #> #> Adjusted for: #> * neg_c_7 = 11.84 #># specified as formula ggpredict(fit, terms = ~ c12hour + c172code + c161sex)#> #> # Predicted values of Total score BARTHEL INDEX #> # x = average number of hours of care per week #> #> # c172code = low level of education #> # c161sex = [1] Male #> x predicted std.error conf.low conf.high #> 0 73.954 2.347 69.354 78.554 #> 45 62.556 2.208 58.228 66.883 #> 85 52.424 2.310 47.896 56.951 #> 170 30.893 3.085 24.847 36.939 #> #> # c172code = intermediate level of education #> # c161sex = [1] Male #> x predicted std.error conf.low conf.high #> 0 74.673 1.845 71.055 78.290 #> 45 63.274 1.730 59.883 66.665 #> 85 53.142 1.911 49.397 56.887 #> 170 31.611 2.872 25.982 37.241 #> #> # c172code = high level of education #> # c161sex = [1] Male #> x predicted std.error conf.low conf.high #> 0 75.391 2.220 71.040 79.741 #> 45 63.992 2.176 59.727 68.258 #> 85 53.860 2.364 49.226 58.494 #> 170 32.330 3.257 25.946 38.713 #> #> # c172code = low level of education #> # c161sex = [2] Female #> x predicted std.error conf.low conf.high #> 0 74.996 1.831 71.406 78.585 #> 45 63.597 1.603 60.456 66.738 #> 85 53.465 1.702 50.130 56.800 #> 170 31.934 2.606 26.827 37.042 #> #> # c172code = intermediate level of education #> # c161sex = [2] Female #> x predicted std.error conf.low conf.high #> 0 75.714 1.225 73.313 78.115 #> 45 64.315 0.968 62.418 66.213 #> 85 54.183 1.209 51.815 56.552 #> 170 32.653 2.403 27.943 37.362 #> #> # c172code = high level of education #> # c161sex = [2] Female #> x predicted std.error conf.low conf.high #> 0 76.432 1.809 72.887 79.977 #> 45 65.034 1.712 61.679 68.388 #> 85 54.902 1.910 51.158 58.646 #> 170 33.371 2.895 27.697 39.045 #> #> Adjusted for: #> * neg_c_7 = 11.84 #># only range of 40 to 60 for variable 'c12hour' ggpredict(fit, terms = "c12hour [40:60]")#> #> # Predicted values of Total score BARTHEL INDEX #> # x = average number of hours of care per week #> #> x predicted std.error conf.low conf.high #> 40 65.312 0.842 63.661 66.962 #> 43 64.552 0.842 62.902 66.201 #> 45 64.045 0.843 62.393 65.697 #> 47 63.538 0.846 61.881 65.196 #> 50 62.779 0.852 61.108 64.449 #> 53 62.019 0.862 60.329 63.708 #> 55 61.512 0.870 59.806 63.218 #> 60 60.246 0.896 58.489 62.002 #> #> Adjusted for: #> * neg_c_7 = 11.84 #> * c161sex = 1.76 #> * c172code = 1.97 #># using "summary()" shows that covariate "neg_c_7" is held # constant at a value of 11.84 (its mean value). To use a # different value, use "condition" ggpredict(fit, terms = "c12hour [40:60]", condition = c(neg_c_7 = 20))#> #> # Predicted values of Total score BARTHEL INDEX #> # x = average number of hours of care per week #> #> x predicted std.error conf.low conf.high #> 40 46.562 2.030 42.582 50.541 #> 43 45.802 2.018 41.847 49.757 #> 45 45.295 2.010 41.355 49.235 #> 47 44.789 2.003 40.862 48.715 #> 50 44.029 1.994 40.121 47.936 #> 53 43.269 1.986 39.377 47.161 #> 55 42.762 1.981 38.879 46.645 #> 60 41.496 1.972 37.631 45.360 #> #> Adjusted for: #> * c161sex = 1.76 #> * c172code = 1.97 #># to plot ggeffectsobjects, you can use the 'plot()'function. # the following examples show how to build your ggplot by hand. # \donttest{ # plot predicted values, remaining covariates held constant library(ggplot2) mydf < ggpredict(fit, terms = "c12hour") ggplot(mydf, aes(x, predicted)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = .1)# three variables, so we can use facets and groups mydf < ggpredict(fit, terms = c("c12hour", "c161sex", "c172code")) ggplot(mydf, aes(x = x, y = predicted, colour = group)) + stat_smooth(method = "lm", se = FALSE) + facet_wrap(~facet, ncol = 2)# select specific levels for grouping terms mydf < ggpredict(fit, terms = c("c12hour", "c172code [1,3]", "c161sex")) ggplot(mydf, aes(x = x, y = predicted, colour = group)) + stat_smooth(method = "lm", se = FALSE) + facet_wrap(~facet) + labs( y = get_y_title(mydf), x = get_x_title(mydf), colour = get_legend_title(mydf) )# level indication also works for factors with nonnumeric levels # and in combination with numeric levels for other variables data(efc) efc$c172code < as_label(efc$c172code) fit < lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc) ggpredict(fit, terms = c("c12hour", "c172code [low level of education, high level of education]", "c161sex [1]"))#> #> # Predicted values of Total score BARTHEL INDEX #> # x = average number of hours of care per week #> #> # c172code = low level of education #> x predicted std.error conf.low conf.high #> 0 72.813 2.501 67.912 77.714 #> 30 65.216 2.386 60.540 69.892 #> 55 58.886 2.375 54.231 63.540 #> 85 51.289 2.464 46.460 56.118 #> 115 43.692 2.654 38.490 48.895 #> 170 29.765 3.200 23.493 36.037 #> #> # c172code = high level of education #> x predicted std.error conf.low conf.high #> 0 74.030 2.447 69.233 78.826 #> 30 66.433 2.392 61.744 71.122 #> 55 60.102 2.432 55.336 64.869 #> 85 52.506 2.577 47.454 57.557 #> 115 44.909 2.813 39.396 50.421 #> 170 30.982 3.412 24.293 37.670 #> #> Adjusted for: #> * neg_c_7 = 11.84 #># use categorical value on xaxis, use axislabels, add error bars dat < ggpredict(fit, terms = c("c172code", "c161sex")) ggplot(dat, aes(x, predicted, colour = group)) + geom_point(position = position_dodge(.1)) + geom_errorbar( aes(ymin = conf.low, ymax = conf.high), position = position_dodge(.1) ) + scale_x_discrete(breaks = 1:3, labels = get_x_labels(dat))# 3wayinteraction with 2 continuous variables data(efc) # make categorical efc$c161sex < as_factor(efc$c161sex) fit < lm(neg_c_7 ~ c12hour * barthtot * c161sex, data = efc) # select only levels 30, 50 and 70 from continuous variable BarthelIndex dat < ggpredict(fit, terms = c("c12hour", "barthtot [30,50,70]", "c161sex")) ggplot(dat, aes(x = x, y = predicted, colour = group)) + stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) + facet_wrap(~facet) + labs( colour = get_legend_title(dat), x = get_x_title(dat), y = get_y_title(dat), title = get_title(dat) )# marginal effects for polynomial terms data(efc) fit < glm( tot_sc_e ~ c12hour + e42dep + e17age + I(e17age^2) + I(e17age^3), data = efc, family = poisson() ) ggeffect(fit, terms = "e17age")#> #> # Predicted values of Services for elderly #> # x = elder' age #> #> x predicted std.error conf.low conf.high #> 64 1.369 0.140 1.040 1.801 #> 70 0.942 0.059 0.840 1.057 #> 74 0.899 0.059 0.800 1.010 #> 78 0.939 0.050 0.852 1.036 #> 84 1.042 0.052 0.941 1.153 #> 90 1.007 0.066 0.884 1.146 #> 94 0.824 0.121 0.650 1.044 #> 104 0.172 0.697 0.044 0.674 #>