# Marginal effects, adjusted predictions and estimated marginal means from regression models

Source: `R/data_frame_methods.R`, `R/ggeffect.R`, `R/ggemmeans.R`, and 1 more

`ggpredict.Rd`

The **ggeffects** package computes estimated marginal means (predicted values) for the response, at the margin of specific values or levels from certain model terms, i.e. it generates predictions from a model by holding the non-focal variables constant and varying the focal variable(s).

`ggpredict()` uses `predict()` for generating predictions, while `ggeffect()` computes marginal effects by internally calling `effects::Effect()` and `ggemmeans()` uses `emmeans::emmeans()`. The result is returned as a consistent data frame.

## Usage

```
# S3 method for ggeffects
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  ...,
  stringsAsFactors = FALSE,
  terms_to_colnames = FALSE
)

ggeffect(model, terms, ci.lvl = 0.95, verbose = TRUE, ...)

ggemmeans(
  model,
  terms,
  ci.lvl = 0.95,
  type = "fixed",
  typical = "mean",
  condition = NULL,
  back.transform = TRUE,
  interval = "confidence",
  verbose = TRUE,
  ...
)

ggpredict(
  model,
  terms,
  ci.lvl = 0.95,
  type = "fixed",
  typical = "mean",
  condition = NULL,
  back.transform = TRUE,
  ppd = FALSE,
  vcov.fun = NULL,
  vcov.type = NULL,
  vcov.args = NULL,
  interval,
  verbose = TRUE,
  ...
)
```

## Arguments

- x
  An object of class `ggeffects`, as returned by `ggpredict()`, `ggeffect()` or `ggemmeans()`.

- row.names
  `NULL` or a character vector giving the row names for the data frame. Missing values are not allowed.

- optional
  logical. If `TRUE`, setting row names and converting column names (to syntactic names: see `make.names`) is optional. Note that all of R's base package `as.data.frame()` methods use `optional` only for column names treatment, basically with the meaning of `data.frame(*, check.names = !optional)`. See also the `make.names` argument of the `matrix` method.

- ...
  For `ggpredict()`, further arguments passed down to `predict()`; for `ggeffect()`, further arguments passed down to `effects::Effect()`; and for `ggemmeans()`, further arguments passed down to `emmeans::emmeans()`. If `type = "sim"`, `...` may also be used to set the number of simulations, e.g. `nsim = 500`.

- stringsAsFactors
  logical: should the character vector be converted to a factor?

- terms_to_colnames
  Logical, if `TRUE`, standardized column names (like `"x"`, `"group"` or `"facet"`) are replaced by the variable names of the focal predictors specified in `terms`.

- model
  A fitted model object, or a list of model objects. Any model that supports common methods like `predict()`, `family()` or `model.frame()` should work. For `ggeffect()`, any model that is supported by **effects** should work, and for `ggemmeans()`, all models supported by **emmeans** should work.

- terms
  Character vector (or a named list or a formula) with the names of those terms from `model` for which predictions should be displayed. At least one term is required to calculate effects for certain terms, maximum length is four terms, where the second to fourth term indicate the groups, i.e. predictions of the first term are grouped at the values or levels of the remaining terms. If `terms` is missing or `NULL`, adjusted predictions for each model term are calculated. It is also possible to define specific values for terms, at which adjusted predictions should be calculated (see 'Details'). All remaining covariates that are not specified in `terms` are held constant (see 'Details'). See also arguments `condition` and `typical`.

- ci.lvl
  Numeric, the level of the confidence intervals. For `ggpredict()`, use `ci.lvl = NA` if confidence intervals should not be calculated (for instance, due to computation time). Typically, confidence intervals based on the standard errors as returned by the `predict()` function are returned, assuming normal distribution (i.e. `+/- 1.96 * SE`). See introduction of this vignette for more details.

- verbose
  Toggle messages or warnings.

- type
  Character, only applies for survival models, mixed effects models and/or models with zero-inflation.

  **Note:** For `brmsfit`-models with zero-inflation component, there is no `type = "zero_inflated"` nor `type = "zi_random"`; predicted values for `MixMod`-models from **GLMMadaptive** with zero-inflation component *always* condition on the zero-inflation part of the model (see 'Details').

  - `"fixed"` (or `"fe"` or `"count"`)
    Predicted values are conditioned on the fixed effects or conditional model only (for mixed models: predicted values are on the population-level and *confidence intervals* are returned). For instance, for models fitted with `zeroinfl` from **pscl**, this would return the predicted mean from the count component (without zero-inflation). For models with zero-inflation component, this type calls `predict(..., type = "link")` (however, predicted values are back-transformed to the response scale).

  - `"random"` (or `"re"`)
    This only applies to mixed models, and `type = "random"` does not condition on the zero-inflation component of the model. `type = "random"` still returns population-level predictions, however, unlike `type = "fixed"`, intervals also consider the uncertainty in the variance parameters (the mean random effect variance, see *Johnson et al. 2014* for details) and hence can be considered as *prediction intervals*. For models with zero-inflation component, this type calls `predict(..., type = "link")` (however, predicted values are back-transformed to the response scale).

    To get predicted values for each level of the random effects groups, add the name of the related random effect term to the `terms`-argument (for more details, see this vignette).

  - `"zero_inflated"` (or `"fe.zi"` or `"zi"`)
    Predicted values are conditioned on the fixed effects and the zero-inflation component. For instance, for models fitted with `zeroinfl` from **pscl**, this would return the predicted response (`mu*(1-p)`) and for **glmmTMB**, this would return the expected value `mu*(1-p)` *without* conditioning on random effects (i.e. random effect variances are not taken into account for the confidence intervals). For models with zero-inflation component, this type calls `predict(..., type = "response")`. See 'Details'.

  - `"zi_random"` (or `"re.zi"` or `"zero_inflated_random"`)
    Predicted values are conditioned on the zero-inflation component and take the random effects uncertainty into account. For models fitted with `glmmTMB()`, `hurdle()` or `zeroinfl()`, this would return the expected value `mu*(1-p)`. For **glmmTMB**, prediction intervals also consider the uncertainty in the random effects variances. This type calls `predict(..., type = "response")`. See 'Details'.

  - `"zi_prob"` (or `"zi.prob"`)
    Predicted zero-inflation probability. For **glmmTMB** models with zero-inflation component, this type calls `predict(..., type = "zlink")`; models from **pscl** call `predict(..., type = "zero")` and for **GLMMadaptive**, `predict(..., type = "zero_part")` is called.

  - `"simulate"` (or `"sim"`)
    Predicted values and confidence or prediction intervals are based on simulations, i.e. calls to `simulate()`. This type of prediction takes all model uncertainty into account, including random effects variances. Currently supported models are objects of class `lm`, `glm`, `glmmTMB`, `wbm`, `MixMod` and `merMod`. See `...` for details on number of simulations.

  - `"survival"` and `"cumulative_hazard"` (or `"surv"` and `"cumhaz"`)
    Applies only to `coxph`-objects from the **survival**-package and calculates the survival probability or the cumulative hazard of an event.

- typical
  Character vector, naming the function to be applied to the covariates over which the effect is "averaged". The default is "mean". See `?sjmisc::typical_value` for options.

- condition
  Named character vector, which indicates covariates that should be held constant at specific values. Unlike `typical`, which applies a function to the covariates to determine the value that is used to hold these covariates constant, `condition` can be used to define exact values, for instance `condition = c(covariate1 = 20, covariate2 = 5)`. See 'Examples'.

- back.transform
  Logical, if `TRUE` (the default), predicted values for log- or log-log transformed responses will be back-transformed to original response-scale.

- interval
  Type of interval calculation, can either be `"confidence"` (default) or `"prediction"`. May be abbreviated. Unlike *confidence intervals*, *prediction intervals* include the residual variance (sigma^2). For mixed models, `interval = "prediction"` is the default for `type = "random"`. When `type = "fixed"`, the default is `interval = "confidence"`. Note that prediction intervals are not available for all models, but only for models that work with `insight::get_sigma()`.

- ppd
  Logical, if `TRUE`, predictions for Stan-models are based on the posterior predictive distribution (`rstantools::posterior_predict()`). If `FALSE` (the default), predictions are based on posterior draws of the linear predictor (`rstantools::posterior_linpred()`).

- vcov.fun
  Variance-covariance matrix used to compute uncertainty estimates (e.g., for confidence intervals based on robust standard errors). This argument accepts a covariance matrix, a function which returns a covariance matrix, or a string which identifies the function to be used to compute the covariance matrix.

  - A covariance matrix
  - A function which returns a covariance matrix (e.g., `stats::vcov()`)
  - A string which indicates the name of the `vcov*()`-function from the **sandwich** or **clubSandwich**-package, e.g. `vcov.fun = "vcovCL"`, which is used to compute (cluster) robust standard errors for predictions.

  If `NULL`, standard errors (and confidence intervals) for predictions are based on the standard errors as returned by the `predict()`-function. **Note** that probably not all model objects that work with `ggpredict()` are also supported by the **sandwich** or **clubSandwich**-package.

- vcov.type
  Character vector, specifying the estimation type for the robust covariance matrix estimation (see `?sandwich::vcovHC` or `?clubSandwich::vcovCR` for details). Only used when `vcov.fun` is a character string.

- vcov.args
  List of named vectors, used as additional arguments that are passed down to `vcov.fun`.

## Value

A data frame (with `ggeffects` class attribute) with consistent data columns:

- `"x"`: the values of the first term in `terms`, used as x-position in plots.
- `"predicted"`: the predicted values of the response, used as y-position in plots.
- `"std.error"`: the standard error of the predictions. *Note that the standard errors are always on the link-scale, and not back-transformed for non-Gaussian models!*
- `"conf.low"`: the lower bound of the confidence interval for the predicted values.
- `"conf.high"`: the upper bound of the confidence interval for the predicted values.
- `"group"`: the grouping level from the second term in `terms`, used as grouping-aesthetics in plots.
- `"facet"`: the grouping level from the third term in `terms`, used to indicate facets in plots.

The estimated marginal means (or predicted values) are always on the response scale!

For proportional odds logistic regression (see `?MASS::polr`) or cumulative link models (e.g., see `?ordinal::clm`), an additional column `"response.level"` is returned, which indicates the grouping of predictions based on the level of the model's response.

Note that for convenience reasons, the columns for the intervals are always named `"conf.low"` and `"conf.high"`, even though for Bayesian models credible or highest posterior density intervals are returned.
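
To make the relation between these columns more concrete, the following base R sketch builds a comparable data frame by hand for a Poisson model. It is an illustration under assumptions, not the package's internal code: the built-in `warpbreaks` data, the column construction and the approach of computing the interval on the link scale before applying the inverse link are all chosen here for demonstration.

```r
# Poisson model on a built-in data set (chosen for illustration only).
fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson())

# Focal term "tension" is varied, "wool" is held constant at its reference level.
nd <- data.frame(
  wool = factor("A", levels = levels(warpbreaks$wool)),
  tension = factor(levels(warpbreaks$tension), levels = levels(warpbreaks$tension))
)

# Predictions and standard errors on the link (log) scale.
p <- predict(fit, newdata = nd, type = "link", se.fit = TRUE)
linkinv <- family(fit)$linkinv
z <- qnorm(0.975) # approximately 1.96 for a 95% interval

out <- data.frame(
  x = nd$tension,
  predicted = linkinv(p$fit),             # response scale
  std.error = p$se.fit,                   # stays on the link scale
  conf.low = linkinv(p$fit - z * p$se.fit),
  conf.high = linkinv(p$fit + z * p$se.fit)
)
out
```

Note how `predicted`, `conf.low` and `conf.high` are on the response scale, while `std.error` is not, which is why the interval is not symmetric around the prediction.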

## Details

**Supported Models**

A list of supported models can be found at https://github.com/strengejacke/ggeffects. Support for models varies by function, i.e. although `ggpredict()`, `ggemmeans()` and `ggeffect()` support most models, some models are supported by only one of the three functions.

**Difference between ggpredict() and ggeffect() or ggemmeans()**

`ggpredict()` calls `predict()`, while `ggeffect()` calls `effects::Effect()` and `ggemmeans()` calls `emmeans::emmeans()` to compute predicted values. Thus, effects returned by `ggpredict()` can be described as *conditional effects* (i.e. these are conditioned on certain (reference) levels of factors), while `ggemmeans()` and `ggeffect()` return *marginal means*, since the effects are "marginalized" (or "averaged") over the levels of factors (or values of character vectors). Therefore, `ggpredict()` differs from `ggeffect()` and `ggemmeans()` in how factors and character vectors are held constant: `ggpredict()` uses the reference level (or "lowest" value in case of character vectors), while `ggeffect()` and `ggemmeans()` compute a kind of "average" value, which represents the proportions of each factor's category. Use `condition` to set a specific level for factors in `ggemmeans()`, so factors are not averaged over their categories, but held constant at a given level.
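
The distinction can be sketched with base R alone. This is a simplified illustration under stated assumptions (the built-in `mtcars` data, a linear model, and weighting by observed category proportions as described above); it does not reproduce how **effects** or **emmeans** compute their results internally.

```r
# Example data with one numeric and one factor predictor (illustration only).
d <- mtcars
d$cyl <- factor(d$cyl)
fit <- lm(mpg ~ wt + cyl, data = d)

wt_value <- 3 # value of the focal term at which we predict

# Conditional prediction: the factor is held at its reference level ("4").
ref <- data.frame(wt = wt_value, cyl = factor("4", levels = levels(d$cyl)))
conditional <- unname(predict(fit, newdata = ref))

# Marginal prediction: predict for every factor level, then average the
# predictions, weighted by the observed proportion of each category.
grid <- data.frame(
  wt = wt_value,
  cyl = factor(levels(d$cyl), levels = levels(d$cyl))
)
per_level <- predict(fit, newdata = grid)
props <- as.vector(prop.table(table(d$cyl)))
marginal <- sum(per_level * props)

c(conditional = conditional, marginal = marginal)
```

The two numbers differ because the conditional prediction refers to one specific category, whereas the marginal prediction is averaged over all categories.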

**Marginal Effects and Adjusted Predictions at Specific Values**

Specific values of model terms can be specified via the `terms`-argument. Indicating levels in square brackets allows for selecting only specific groups, values or value ranges. Term name and the start of the levels in brackets must be separated by a whitespace character, e.g. `terms = c("age", "education [1,3]")`. Numeric ranges, separated with a colon, are also allowed: `terms = c("education", "age [30:60]")`. The step size for a range can be adjusted using `by`, e.g. `terms = "age [30:60 by=5]"`.

The `terms`-argument also supports the same shortcuts as the `values`-argument in `values_at()`. So `terms = "age [meansd]"` would return predictions for the values one standard deviation below the mean age, the mean age and one SD above the mean age. `terms = "age [quart2]"` would calculate predictions at the value of the lower, median and upper quartile of age.
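
As a rough illustration of what these shortcuts describe, the corresponding values can be computed with base R. This is a sketch that follows the descriptions above, not the package's internal code; the `age` vector is made-up data, and the quantile definition used by the package may differ from R's default.

```r
# Made-up ages, for illustration only (not data from the package).
age <- c(23, 31, 35, 42, 47, 51, 58, 64, 70, 79)

# "age [meansd]": one SD below the mean, the mean, one SD above the mean.
meansd <- c(mean(age) - sd(age), mean(age), mean(age) + sd(age))

# "age [quart2]": lower quartile, median and upper quartile.
quart2 <- unname(quantile(age, probs = c(0.25, 0.5, 0.75)))

# "age [30:60 by=5]": a range from 30 to 60 in steps of 5.
by5 <- seq(30, 60, by = 5)

meansd
quart2
by5
```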

Furthermore, it is possible to specify a function name. Values for predictions will then be transformed, e.g. `terms = "income [exp]"`. This is useful when model predictors were transformed for fitting the model and should be back-transformed to the original scale for predictions. It is also possible to define own functions (see this vignette).

Instead of a function, it is also possible to define the name of a variable with specific values, e.g. to define a vector `v = c(1000, 2000, 3000)` and then use `terms = "income [v]"`.

You can take a random sample of any size with `sample=n`, e.g. `terms = "income [sample=8]"`, which will sample eight values from all possible values of the variable `income`. This option is especially useful for plotting predictions at certain levels of random effects groups, where the group factor has too many levels to be plotted completely. For more details, see this vignette.

Finally, for numeric vectors for which no specific values are given, a "pretty range" is calculated (see `pretty_range()`), to avoid memory allocation problems for vectors with many unique values. If a numeric vector is specified as second or third term (i.e. if this vector represents a grouping structure), representative values (see `values_at()`) are chosen (unless other values are specified). If all values for a numeric vector should be used to compute predictions, you may use e.g. `terms = "age [all]"`. See also package vignettes.

To create a pretty range that should be smaller or larger than the default range (i.e. if no specific values would be given), use the `n`-tag, e.g. `terms="age [n=5]"` or `terms="age [n=12]"`. Larger values for `n` return a larger range of predicted values.

**Holding covariates at constant values**

For `ggpredict()`, `expand.grid()` is called on all unique combinations of `model.frame(model)[, terms]` and used as `newdata`-argument for `predict()`. In this case, all remaining covariates that are not specified in `terms` are held constant: Numeric values are set to the mean (unless changed with the `condition` or `typical`-argument), integer values are set to their median, factors are set to their reference level (may also be changed with `condition`) and character vectors to their mode (most common element).
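
The idea of building a data grid and passing it to `predict()` can be sketched in base R as follows. This is a minimal illustration under assumptions (the built-in `mtcars` data and hand-picked focal values); it is not the package's actual implementation.

```r
# Small linear model on a built-in data set (illustration only).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Focal term: "wt" is varied over a few chosen values.
focal <- list(wt = c(2, 3, 4))

# Non-focal numeric covariates are held constant at their mean.
held <- list(hp = mean(mtcars$hp), qsec = mean(mtcars$qsec))

# Build the data grid and use it as `newdata` for predict().
grid <- expand.grid(c(focal, held))
pred <- predict(fit, newdata = grid, se.fit = TRUE)

grid$predicted <- pred$fit
grid$std.error <- pred$se.fit
grid
```

Setting a covariate to a different fixed value, as `condition` does, would correspond to replacing an entry of `held` in this sketch, for example `held$hp <- 150`.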

`ggeffect()` and `ggemmeans()`, by default, set remaining numeric covariates to their mean value, while for factors, a kind of "average" value, which represents the proportions of each factor's category, is used. The same applies to character vectors: `ggemmeans()` averages over the distribution of unique values in a character vector, similar to how factors are treated. For `ggemmeans()`, use `condition` to set a specific level for factors so that these are not averaged over their categories, but held constant at the given level.

**Bayesian Regression Models**

`ggpredict()` also works with **Stan**-models from the **rstanarm** or **brms**-packages. The predicted values are the median value of all drawn posterior samples. The confidence intervals for Stan-models are Bayesian predictive intervals. By default (i.e. `ppd = FALSE`), the predictions are based on `rstantools::posterior_linpred()` and hence have some limitations: the uncertainty of the error term is not taken into account. The recommendation is to use the posterior predictive distribution (`rstantools::posterior_predict()`).

**Zero-Inflated and Zero-Inflated Mixed Models with brms**

Models of class `brmsfit` always condition on the zero-inflation component, if the model has such a component. Hence, there is no `type = "zero_inflated"` nor `type = "zi_random"` for `brmsfit`-models, because predictions are based on draws of the posterior distribution, which already account for the zero-inflation part of the model.

**Zero-Inflated and Zero-Inflated Mixed Models with glmmTMB**

If `model` is of class `glmmTMB`, `hurdle`, `zeroinfl` or `zerotrunc`, simulations from a multivariate normal distribution (see `?MASS::mvrnorm`) are drawn to calculate `mu*(1-p)`. Confidence intervals are then based on quantiles of these results. For `type = "zi_random"`, prediction intervals also take the uncertainty in the random-effect parameters into account (see also Brooks et al. 2017, pp.391-392 for details).
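
The simulation approach described above can be sketched as follows. This is a simplified illustration and not the package's code: the coefficient vectors and covariance matrices are hypothetical numbers rather than estimates from a fitted model, and a log link for the count part and a logit link for the zero-inflation part are assumed.

```r
library(MASS) # for mvrnorm()

set.seed(123)

# Hypothetical estimates (assumptions, not taken from a fitted model).
beta_count <- c(0.5, 0.3)  # count part: intercept and slope, log link
vcov_count <- matrix(c(0.010, 0.002, 0.002, 0.005), nrow = 2)
beta_zi <- c(-1.0, 0.4)    # zero-inflation part: intercept and slope, logit link
vcov_zi <- matrix(c(0.040, 0.005, 0.005, 0.020), nrow = 2)

# Design row for the prediction: intercept and a predictor value of 2.
x_new <- c(1, 2)

# Draw coefficients from multivariate normal distributions.
nsim <- 1000
sim_count <- mvrnorm(nsim, mu = beta_count, Sigma = vcov_count)
sim_zi <- mvrnorm(nsim, mu = beta_zi, Sigma = vcov_zi)

# For each draw, compute mu * (1 - p).
mu <- exp(sim_count %*% x_new)
p <- plogis(sim_zi %*% x_new)
expected <- as.vector(mu * (1 - p))

# Point estimate from the (hypothetical) coefficients, and an interval based
# on the quantiles of the simulated values.
point <- exp(sum(beta_count * x_new)) * (1 - plogis(sum(beta_zi * x_new)))
ci <- quantile(expected, probs = c(0.025, 0.975))
point
ci
```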

An alternative for models fitted with **glmmTMB** that takes all model uncertainties into account are simulations based on `simulate()`, which is used when `type = "sim"` (see Brooks et al. 2017, pp.392-393 for details).

**MixMod-models from GLMMadaptive**

Predicted values for the fixed effects component (`type = "fixed"` or `type = "zero_inflated"`) are based on `predict(..., type = "mean_subject")`, while predicted values for random effects components (`type = "random"` or `type = "zi_random"`) are calculated with `predict(..., type = "subject_specific")` (see `?GLMMadaptive::predict.MixMod` for details). The latter option requires the response variable to be defined in the `newdata`-argument of `predict()`, which will be set to its typical value (see `?sjmisc::typical_value`).

## Note

**Multinomial Models**

`polr`-, `clm`-models, or more generally speaking, models with ordinal or multinomial outcomes, have an additional column `response.level`, which indicates with which level of the response variable the predicted values are associated.

**Printing Results**

The `print()`-method gives a clean output (especially for predictions by groups), and indicates at which values covariates were held constant. Furthermore, the `print()`-method has the arguments `digits` and `n` to control number of decimals and lines to be printed, and an argument `x.lab` to print factor-levels instead of numeric values if `x` is a factor.

**Limitations**

The support for some models, for example from package **MCMCglmm**, is
rather experimental and may fail for certain models. If you encounter
any errors, please file an issue at https://github.com/strengejacke/ggeffects/issues.

## References

Brooks ME, Kristensen K, Benthem KJ van, Magnusson A, Berg CW, Nielsen A, et al. glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal. 2017;9: 378-400.

Johnson PC, O'Hara RB. 2014. Extension of Nakagawa & Schielzeth's R2GLMM to random slopes models. Methods Ecol Evol, 5: 944-946.

## Examples

```
library(ggeffects)
library(sjlabelled)
#>
#> Attaching package: ‘sjlabelled’
#> The following object is masked from ‘package:ggplot2’:
#>
#> as_label
data(efc)
fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)
ggpredict(fit, terms = "c12hour")
#> # Predicted values of Total score BARTHEL INDEX
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 75.44 | [73.25, 77.63]
#> 20 | 70.38 | [68.56, 72.19]
#> 45 | 64.05 | [62.39, 65.70]
#> 65 | 58.98 | [57.15, 60.80]
#> 85 | 53.91 | [51.71, 56.12]
#> 105 | 48.85 | [46.14, 51.55]
#> 125 | 43.78 | [40.51, 47.05]
#> 170 | 32.38 | [27.73, 37.04]
#>
#> Adjusted for:
#> * neg_c_7 = 11.84
#> * c161sex = 1.76
#> * c172code = 1.97
#>
#> Not all rows are shown in the ouput. Use `print(..., n = Inf)` to show
#> all rows.
ggpredict(fit, terms = c("c12hour", "c172code"))
#> # Predicted values of Total score BARTHEL INDEX
#>
#> # c172code = low level of education
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 74.75 | [71.26, 78.23]
#> 30 | 67.15 | [64.03, 70.26]
#> 55 | 60.81 | [57.77, 63.86]
#> 85 | 53.22 | [49.95, 56.48]
#> 115 | 45.62 | [41.86, 49.37]
#> 170 | 31.69 | [26.59, 36.78]
#>
#> # c172code = intermediate level of education
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 75.46 | [73.28, 77.65]
#> 30 | 67.87 | [66.16, 69.57]
#> 55 | 61.53 | [59.82, 63.25]
#> 85 | 53.93 | [51.72, 56.14]
#> 115 | 46.34 | [43.35, 49.32]
#> 170 | 32.40 | [27.74, 37.07]
#>
#> # c172code = high level of education
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 76.18 | [72.81, 79.55]
#> 30 | 68.58 | [65.41, 71.76]
#> 55 | 62.25 | [59.00, 65.50]
#> 85 | 54.65 | [51.03, 58.27]
#> 115 | 47.05 | [42.85, 51.26]
#> 170 | 33.12 | [27.50, 38.74]
#>
#> Adjusted for:
#> * neg_c_7 = 11.84
#> * c161sex = 1.76
#>
#> Not all rows are shown in the ouput. Use `print(..., n = Inf)` to show
#> all rows.
ggpredict(fit, terms = c("c12hour", "c172code", "c161sex"))
#> # Predicted values of Total score BARTHEL INDEX
#>
#> # c172code = low level of education
#> # c161sex = [1] Male
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 73.95 | [69.35, 78.56]
#> 45 | 62.56 | [58.22, 66.89]
#> 85 | 52.42 | [47.89, 56.96]
#> 170 | 30.89 | [24.84, 36.95]
#>
#> # c172code = intermediate level of education
#> # c161sex = [1] Male
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 74.67 | [71.05, 78.29]
#> 45 | 63.27 | [59.88, 66.67]
#> 85 | 53.14 | [49.39, 56.89]
#> 170 | 31.61 | [25.97, 37.25]
#>
#> # c172code = high level of education
#> # c161sex = [1] Male
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 75.39 | [71.03, 79.75]
#> 45 | 63.99 | [59.72, 68.26]
#> 85 | 53.86 | [49.22, 58.50]
#> 170 | 32.33 | [25.94, 38.72]
#>
#> # c172code = low level of education
#> # c161sex = [2] Female
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 75.00 | [71.40, 78.59]
#> 45 | 63.60 | [60.45, 66.74]
#> 85 | 53.46 | [50.12, 56.80]
#> 170 | 31.93 | [26.82, 37.05]
#>
#> # c172code = intermediate level of education
#> # c161sex = [2] Female
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 75.71 | [73.31, 78.12]
#> 45 | 64.32 | [62.41, 66.22]
#> 85 | 54.18 | [51.81, 56.56]
#> 170 | 32.65 | [27.94, 37.37]
#>
#> # c172code = high level of education
#> # c161sex = [2] Female
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 76.43 | [72.88, 79.98]
#> 45 | 65.03 | [61.67, 68.39]
#> 85 | 54.90 | [51.15, 58.65]
#> 170 | 33.37 | [27.69, 39.05]
#>
#> Adjusted for:
#> * neg_c_7 = 11.84
#>
#> Not all rows are shown in the ouput. Use `print(..., n = Inf)` to show
#> all rows.
# specified as formula
ggpredict(fit, terms = ~ c12hour + c172code + c161sex)
#> # Predicted values of Total score BARTHEL INDEX
#>
#> # c172code = low level of education
#> # c161sex = [1] Male
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 73.95 | [69.35, 78.56]
#> 45 | 62.56 | [58.22, 66.89]
#> 85 | 52.42 | [47.89, 56.96]
#> 170 | 30.89 | [24.84, 36.95]
#>
#> # c172code = intermediate level of education
#> # c161sex = [1] Male
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 74.67 | [71.05, 78.29]
#> 45 | 63.27 | [59.88, 66.67]
#> 85 | 53.14 | [49.39, 56.89]
#> 170 | 31.61 | [25.97, 37.25]
#>
#> # c172code = high level of education
#> # c161sex = [1] Male
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 75.39 | [71.03, 79.75]
#> 45 | 63.99 | [59.72, 68.26]
#> 85 | 53.86 | [49.22, 58.50]
#> 170 | 32.33 | [25.94, 38.72]
#>
#> # c172code = low level of education
#> # c161sex = [2] Female
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 75.00 | [71.40, 78.59]
#> 45 | 63.60 | [60.45, 66.74]
#> 85 | 53.46 | [50.12, 56.80]
#> 170 | 31.93 | [26.82, 37.05]
#>
#> # c172code = intermediate level of education
#> # c161sex = [2] Female
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 75.71 | [73.31, 78.12]
#> 45 | 64.32 | [62.41, 66.22]
#> 85 | 54.18 | [51.81, 56.56]
#> 170 | 32.65 | [27.94, 37.37]
#>
#> # c172code = high level of education
#> # c161sex = [2] Female
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 0 | 76.43 | [72.88, 79.98]
#> 45 | 65.03 | [61.67, 68.39]
#> 85 | 54.90 | [51.15, 58.65]
#> 170 | 33.37 | [27.69, 39.05]
#>
#> Adjusted for:
#> * neg_c_7 = 11.84
#>
#> Not all rows are shown in the ouput. Use `print(..., n = Inf)` to show
#> all rows.
# only range of 40 to 60 for variable 'c12hour'
ggpredict(fit, terms = "c12hour [40:60]")
#> # Predicted values of Total score BARTHEL INDEX
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 40 | 65.31 | [63.66, 66.96]
#> 43 | 64.55 | [62.90, 66.20]
#> 45 | 64.05 | [62.39, 65.70]
#> 47 | 63.54 | [61.88, 65.20]
#> 50 | 62.78 | [61.11, 64.45]
#> 53 | 62.02 | [60.33, 63.71]
#> 55 | 61.51 | [59.80, 63.22]
#> 60 | 60.25 | [58.49, 62.01]
#>
#> Adjusted for:
#> * neg_c_7 = 11.84
#> * c161sex = 1.76
#> * c172code = 1.97
#>
#> Not all rows are shown in the ouput. Use `print(..., n = Inf)` to show
#> all rows.
# terms as named list
ggpredict(fit, terms = list(c12hour = 40:60))
#> # Predicted values of Total score BARTHEL INDEX
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 40 | 65.31 | [63.66, 66.96]
#> 43 | 64.55 | [62.90, 66.20]
#> 45 | 64.05 | [62.39, 65.70]
#> 47 | 63.54 | [61.88, 65.20]
#> 50 | 62.78 | [61.11, 64.45]
#> 53 | 62.02 | [60.33, 63.71]
#> 55 | 61.51 | [59.80, 63.22]
#> 60 | 60.25 | [58.49, 62.01]
#>
#> Adjusted for:
#> * neg_c_7 = 11.84
#> * c161sex = 1.76
#> * c172code = 1.97
#>
#> Not all rows are shown in the ouput. Use `print(..., n = Inf)` to show
#> all rows.
# covariate "neg_c_7" is held constant at a value of 11.84 (its mean value).
# To use a different value, use "condition"
ggpredict(fit, terms = "c12hour [40:60]", condition = c(neg_c_7 = 20))
#> # Predicted values of Total score BARTHEL INDEX
#>
#> c12hour | Predicted | 95% CI
#> ------------------------------------
#> 40 | 46.56 | [42.58, 50.55]
#> 43 | 45.80 | [41.84, 49.76]
#> 45 | 45.30 | [41.35, 49.24]
#> 47 | 44.79 | [40.86, 48.72]
#> 50 | 44.03 | [40.11, 47.94]
#> 53 | 43.27 | [39.37, 47.17]
#> 55 | 42.76 | [38.87, 46.65]
#> 60 | 41.50 | [37.62, 45.37]
#>
#> Adjusted for:
#> * c161sex = 1.76
#> * c172code = 1.97
#>
#> Not all rows are shown in the ouput. Use `print(..., n = Inf)` to show
#> all rows.
# to plot ggeffects-objects, you can use the 'plot()'-function.
# the following examples show how to build your ggplot by hand.
if (FALSE) {
  # plot predicted values, remaining covariates held constant
  library(ggplot2)
  mydf <- ggpredict(fit, terms = "c12hour")
  ggplot(mydf, aes(x, predicted)) +
    geom_line() +
    geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = .1)

  # three variables, so we can use facets and groups
  mydf <- ggpredict(fit, terms = c("c12hour", "c161sex", "c172code"))
  ggplot(mydf, aes(x = x, y = predicted, colour = group)) +
    stat_smooth(method = "lm", se = FALSE) +
    facet_wrap(~facet, ncol = 2)

  # select specific levels for grouping terms
  mydf <- ggpredict(fit, terms = c("c12hour", "c172code [1,3]", "c161sex"))
  ggplot(mydf, aes(x = x, y = predicted, colour = group)) +
    stat_smooth(method = "lm", se = FALSE) +
    facet_wrap(~facet) +
    labs(
      y = get_y_title(mydf),
      x = get_x_title(mydf),
      colour = get_legend_title(mydf)
    )

  # level indication also works for factors with non-numeric levels
  # and in combination with numeric levels for other variables
  data(efc)
  efc$c172code <- sjlabelled::as_label(efc$c172code)
  fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)
  ggpredict(fit, terms = c(
    "c12hour",
    "c172code [low level of education, high level of education]",
    "c161sex [1]"
  ))

  # when "terms" is a named list
  ggpredict(fit, terms = list(
    c12hour = seq(0, 170, 30),
    c172code = c("low level of education", "high level of education"),
    c161sex = 1
  ))

  # use categorical value on x-axis, use axis-labels, add error bars
  dat <- ggpredict(fit, terms = c("c172code", "c161sex"))
  ggplot(dat, aes(x, predicted, colour = group)) +
    geom_point(position = position_dodge(.1)) +
    geom_errorbar(
      aes(ymin = conf.low, ymax = conf.high),
      position = position_dodge(.1)
    ) +
    scale_x_discrete(breaks = 1:3, labels = get_x_labels(dat))

  # 3-way-interaction with 2 continuous variables
  data(efc)
  # make categorical
  efc$c161sex <- as_factor(efc$c161sex)
  fit <- lm(neg_c_7 ~ c12hour * barthtot * c161sex, data = efc)
  # select only levels 30, 50 and 70 from continuous variable Barthel-Index
  dat <- ggpredict(fit, terms = c("c12hour", "barthtot [30,50,70]", "c161sex"))
  ggplot(dat, aes(x = x, y = predicted, colour = group)) +
    stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
    facet_wrap(~facet) +
    labs(
      colour = get_legend_title(dat),
      x = get_x_title(dat),
      y = get_y_title(dat),
      title = get_title(dat)
    )

  # or with ggeffects' plot-method
  plot(dat, ci = FALSE)
}
# predictions for polynomial terms
data(efc)
fit <- glm(
tot_sc_e ~ c12hour + e42dep + e17age + I(e17age^2) + I(e17age^3),
data = efc,
family = poisson()
)
ggeffect(fit, terms = "e17age")
#> # Predicted counts of Services for elderly
#>
#> e17age | Predicted | 95% CI
#> ---------------------------------
#> 64 | 1.37 | [1.04, 1.80]
#> 70 | 0.94 | [0.84, 1.06]
#> 74 | 0.90 | [0.80, 1.01]
#> 78 | 0.94 | [0.85, 1.04]
#> 84 | 1.04 | [0.94, 1.15]
#> 90 | 1.01 | [0.88, 1.15]
#> 94 | 0.82 | [0.65, 1.04]
#> 104 | 0.17 | [0.04, 0.67]
#>
#> Not all rows are shown in the ouput. Use `print(..., n = Inf)` to show
#> all rows.
```