
Function to test differences of adjusted predictions for statistical significance. This is usually called contrasts or (pairwise) comparisons, or "marginal effects". hypothesis_test() is an alias.

Usage

test_predictions(object, ...)

hypothesis_test(object, ...)

# Default S3 method
test_predictions(
  object,
  terms = NULL,
  by = NULL,
  test = "pairwise",
  test_args = NULL,
  equivalence = NULL,
  scale = "response",
  p_adjust = NULL,
  df = NULL,
  ci_level = 0.95,
  collapse_levels = FALSE,
  margin = "mean_reference",
  condition = NULL,
  engine = "marginaleffects",
  verbose = TRUE,
  ...
)

# S3 method for class 'ggeffects'
test_predictions(
  object,
  by = NULL,
  test = "pairwise",
  equivalence = NULL,
  scale = "response",
  p_adjust = NULL,
  df = NULL,
  collapse_levels = FALSE,
  engine = "marginaleffects",
  verbose = TRUE,
  ...
)

Arguments

object

A fitted model object, or an object of class ggeffects. If object is of class ggeffects, arguments terms, margin and ci_level are taken from the ggeffects object and don't need to be specified.

...

Arguments passed down to data_grid() when creating the reference grid and to marginaleffects::predictions() resp. marginaleffects::slopes(). For instance, arguments type or transform can be used to back-transform comparisons and contrasts to different scales. vcov can be used to calculate heteroscedasticity-consistent standard errors for contrasts. See examples at the bottom of this vignette for further details.

To define a heteroscedasticity-consistent variance-covariance matrix, you have two options. You can use the same arguments as for predict_response() etc., namely vcov and vcov_args, which are then transformed into a matrix and passed down to the vcov argument in marginaleffects. Alternatively, you can use the vcov argument directly. See ?marginaleffects::slopes for further details.

terms

If object is an object of class ggeffects, the same terms argument is used as for the predictions, i.e. terms can be ignored. Else, if object is a model object, terms must be a character vector with the names of the focal terms from object, for which contrasts or comparisons should be displayed. At least one term is required, maximum length is three terms. If the first focal term is numeric, contrasts or comparisons for the slopes of this numeric predictor are computed (possibly grouped by the levels of further categorical focal predictors).

by

Character vector specifying the names of predictors to condition on. The hypothesis test is then carried out for the focal terms within each level of the by variables. This is especially useful for interaction terms, where we want to test the interaction within "groups". by is only relevant for categorical predictors.

test

Hypothesis to test, defined as character string. Can be one of:

  • "pairwise" (default), to test pairwise comparisons.

  • "trend" (or "slope") to test for the linear trend/slope of (usually) continuous predictors. These options are just aliases for setting test = NULL.

  • "contrast" to test simple contrasts (i.e. each level is tested against the average over all levels).

  • "exclude" to test simple contrasts (i.e. each level is tested against the average over all other levels, excluding the contrast that is being tested).

  • "interaction" to test interaction contrasts (difference-in-difference contrasts). More flexible interaction contrasts can be calculated using the test_args argument.

  • "consecutive" to test contrasts between consecutive levels of a predictor.

  • "polynomial" to test orthogonal polynomial contrasts, assuming equally-spaced factor levels.

  • A character string with a custom hypothesis, e.g. "b2 = b1". This would test if the second level of a predictor is different from the first level. Custom hypotheses are very flexible. It is also possible to test interaction contrasts (difference-in-difference contrasts) with custom hypotheses, e.g. "(b2 - b1) = (b4 - b3)". See also section Introduction into contrasts and pairwise comparisons.

  • A data frame with custom contrasts. See 'Examples'.

  • NULL, in which case simple contrasts are computed.

Technical details about the packages used as back-end to calculate contrasts and pairwise comparisons are provided in the section Packages used as back-end to calculate contrasts and pairwise comparisons below.
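The custom-hypothesis option can be sketched as follows. This is a minimal illustration, assuming the ggeffects and marginaleffects packages are installed and that the efc data set shipped with ggeffects is available. The model formula is chosen for illustration only, and the exact output depends on the installed package versions. Calling test_predictions() with test = NULL first shows the rows that the labels b1, b2, ... refer to.

```r
library(ggeffects)

# example data from ggeffects; the model is for illustration only
data(efc, package = "ggeffects")
efc$c172code <- as.factor(efc$c172code)
m <- lm(barthtot ~ c12hour + neg_c_7 + c172code, data = efc)

# test = NULL returns one row per level of the focal term;
# b1, b2, ... in a custom hypothesis refer to these rows, in this order
rows <- test_predictions(m, "c172code", test = NULL)
rows

# custom hypothesis: is the second row different from the first row?
custom <- test_predictions(m, "c172code", test = "b2 = b1")
custom
```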

test_args

Optional arguments passed to test, typically provided as named list. Only applies to those options that use the emmeans package as backend, e.g. if test = "interaction", test_args will be passed to emmeans::contrast(interaction = test_args). For other emmeans options (like "contrast", "exclude", "consecutive" and so on), test_args will be passed to the option argument in emmeans::contrast().

equivalence

Lower and upper bounds of the ROPE (region of practical equivalence). Should be "default" or a vector of length two (e.g., c(-0.1, 0.1)). If "default", bayestestR::rope_range() is used. Instead of using the equivalence argument, it is also possible to call the equivalence_test() method directly. This requires the parameters package to be loaded. When using equivalence_test(), two more columns with information about the ROPE coverage and decision on H0 are added. Furthermore, it is possible to plot() the results from equivalence_test(). See bayestestR::equivalence_test() resp. parameters::equivalence_test.lm() for details.

scale

Character string, indicating the scale on which the contrasts or comparisons are represented. Can be one of:

  • "response" (default), which would return contrasts on the response scale (e.g. for logistic regression, as probabilities);

  • "link" to return contrasts on scale of the linear predictors (e.g. for logistic regression, as log-odds);

  • "probability" (or "probs") returns contrasts on the probability scale, which is required for some model classes, like MASS::polr();

  • "oddsratios" to return contrasts on the odds ratio scale (only applies to logistic regression models);

  • "irr" to return contrasts on the incidence rate ratio scale (only applies to count models);

  • or a transformation function like "exp" or "log", to return transformed (exponentiated respectively logarithmic) contrasts; note that these transformations are applied to the response scale.

Note: If the scale argument is not supported by the provided object, it is automatically changed to a supported scale-type (a message is printed when verbose = TRUE).
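The following sketch illustrates how the scale argument might be used with a logistic regression. It assumes the ggeffects and marginaleffects packages are installed. The dichotomized outcome low_barthel and its cut-off are hypothetical and only serve the illustration; they are not taken from the documentation.

```r
library(ggeffects)

data(efc, package = "ggeffects")
efc$c172code <- as.factor(efc$c172code)

# hypothetical binary outcome, created only for this illustration
efc$low_barthel <- as.numeric(efc$barthtot < 60)
m <- glm(low_barthel ~ c12hour + c172code, data = efc, family = binomial())

# default: contrasts as differences in predicted probabilities
on_response <- test_predictions(m, "c172code")

# contrasts on the scale of the linear predictor (log-odds)
on_link <- test_predictions(m, "c172code", scale = "link")

# contrasts expressed as odds ratios
on_or <- test_predictions(m, "c172code", scale = "oddsratios")

on_response
on_link
on_or
```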

p_adjust

Character vector, if not NULL, indicates the method to adjust p-values. See stats::p.adjust() or stats::p.adjust.methods for details. Further possible adjustment methods are "tukey" or "sidak", and for johnson_neyman(), "fdr" (or "bh") and "esarey" (or its short-cut "es") are available options. Some caution is necessary when adjusting p-value for multiple comparisons. See also section P-value adjustment below.

df

Degrees of freedom that will be used to compute the p-values and confidence intervals. If NULL, degrees of freedom will be extracted from the model using insight::get_df() with type = "wald".

ci_level

Numeric, the level of the confidence intervals. If object is an object of class ggeffects, the same ci_level argument is used as for the predictions, i.e. ci_level can be ignored.

collapse_levels

Logical, if TRUE, term labels that refer to identical levels are no longer separated by "-", but instead collapsed into a unique term label (e.g., "level a-level a" becomes "level a"). See 'Examples'.

margin

Character string, indicates the method how to marginalize over non-focal terms. See predict_response() for details. If object is an object of class ggeffects, the same margin argument is used as for the predictions, i.e. margin can be ignored.

condition

Named vector, which indicates covariates that should be held constant at specific values, for instance condition = c(covariate1 = 20, covariate2 = 5).

engine

Character string, indicates the package to use for computing contrasts and comparisons. Usually, this argument can be ignored, unless you explicitly want to use a package other than marginaleffects to calculate contrasts and pairwise comparisons. engine can be either "marginaleffects" (default) or "emmeans". The latter is useful when the marginaleffects package is not available, or when the emmeans package is preferred. Note that using emmeans as back-end is currently not as feature-rich as the default (marginaleffects) and is still in development. Setting engine = "emmeans" provides some additional test options: "interaction" to calculate interaction contrasts, "consecutive" to calculate contrasts between consecutive levels of a predictor, or a data frame with custom contrasts (see also test). There is also an experimental option, engine = "ggeffects". However, this is currently work in progress and offers far fewer options than the default engine, "marginaleffects". It can be faster in some cases, though, and works for comparing predicted random effects in mixed models, or predicted probabilities of the zero-inflation component. If the marginaleffects package is not installed, the emmeans package is used automatically. If emmeans is not installed either, engine = "ggeffects" is used.

verbose

Toggle messages and warnings.

Value

A data frame containing predictions (e.g. for test = NULL), contrasts or pairwise comparisons of adjusted predictions or estimated marginal means.

Introduction into contrasts and pairwise comparisons

There are many ways to test contrasts or pairwise comparisons. A detailed introduction with many (visual) examples is shown in this vignette.

Simple workflow for pairwise comparisons

A simple workflow includes calculating adjusted predictions and passing the results directly to test_predictions(), e.g.:

# 1. fit your model
model <- lm(mpg ~ hp + wt + am, data = mtcars)
# 2. calculate adjusted predictions
pr <- predict_response(model, "am")
pr
# 3. test pairwise comparisons
test_predictions(pr)

See also this vignette.

Packages used as back-end to calculate contrasts and pairwise comparisons

The test argument is used to define which kind of contrast or comparison should be calculated. The default is to use the marginaleffects package. Here are some technical details about the packages used as back-end. When test is...

  • "pairwise" (default), pairwise comparisons are based on the marginaleffects package.

  • "trend" or "slope" also uses the marginaleffects package.

  • "contrast" uses the emmeans package, i.e. emmeans::contrast(method = "eff") is called.

  • "exclude" relies on the emmeans package, i.e. emmeans::contrast(method = "del.eff") is called.

  • "polynomial" relies on the emmeans package, i.e. emmeans::contrast(method = "poly") is called.

  • "interaction" uses the emmeans package, i.e. emmeans::contrast(interaction = ...) is called.

  • "consecutive" also relies on the emmeans package, i.e. emmeans::contrast(method = "consec") is called.

  • a character string with a custom hypothesis, the marginaleffects package is used.

  • a data frame with custom contrasts, emmeans is used again.

  • NULL calls functions from the marginaleffects package with hypothesis = NULL.

  • If all focal terms are only present as random effects in a mixed model, or if predicted probabilities for the zero-inflation component of a model should be tested, functions from the ggeffects package are used. There is an example for pairwise comparisons of random effects in this vignette.
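To make the emmeans methods named in the list above more concrete, the sketch below calls emmeans::contrast() directly on a small model. It assumes the emmeans package is installed. It only shows what the listed method values do in emmeans itself and is not meant to reproduce how test_predictions() calls them internally.

```r
library(emmeans)

m <- lm(Petal.Width ~ Petal.Length + Species, data = iris)

# estimated marginal means for the focal term
emm <- emmeans(m, specs = "Species")

# each level against the average over all levels (test = "contrast")
eff <- contrast(emm, method = "eff")

# each level against the average over all other levels (test = "exclude")
del_eff <- contrast(emm, method = "del.eff")

# contrasts between consecutive levels (test = "consecutive")
consec <- contrast(emm, method = "consec")

# orthogonal polynomial contrasts (test = "polynomial")
poly <- contrast(emm, method = "poly")

eff
del_eff
consec
poly
```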

P-value adjustment for multiple comparisons

Note that for p-value adjustment methods supported by p.adjust() (see also p.adjust.methods), each row is considered as one set of comparisons, no matter which test was specified. That is, for instance, when test_predictions() returns eight rows of predictions (when test = NULL), and p_adjust = "bonferroni", the p-values are adjusted in the same way as if we had a test of pairwise comparisons (test = "pairwise") where eight rows of comparisons are returned. For methods "tukey" or "sidak", a rank adjustment is done based on the number of combinations of levels from the focal predictors in terms. Thus, the latter two methods may be useful for certain tests only, in particular pairwise comparisons.
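As a rough illustration of this row-wise adjustment, the sketch below applies stats::p.adjust() to a vector of eight p-values. The p-values are made up for illustration and are not output from test_predictions(); the point is only that the adjustment depends on the number of rows, not on the kind of test that produced them.

```r
# hypothetical p-values, one per row of a result table (made up for illustration)
p_values <- c(0.001, 0.004, 0.012, 0.020, 0.030, 0.045, 0.200, 0.650)

# Bonferroni multiplies each p-value by the number of rows and caps it at 1
p_bonferroni <- stats::p.adjust(p_values, method = "bonferroni")

# Holm is a step-down variant that is never more conservative than Bonferroni
p_holm <- stats::p.adjust(p_values, method = "holm")

data.frame(p = p_values, bonferroni = p_bonferroni, holm = p_holm)
```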

For johnson_neyman(), the only available adjustment methods are "fdr" (or "bh") (Benjamini & Hochberg 1995) and "esarey" (or "es") (Esarey & Sumner 2017). These usually return similar results. The major difference is that "fdr" can be slightly faster and more stable in edge cases, but it adjusts only the p-values and does not update the confidence intervals. "esarey" is slower, but confidence intervals are updated as well.

Global options to choose package for calculating comparisons

ggeffects_test_engine can be used as option to either use the marginaleffects package for computing contrasts and comparisons (default), or the emmeans package (e.g. options(ggeffects_test_engine = "emmeans")). The latter is useful when the marginaleffects package is not available, or when the emmeans package is preferred. You can also provide the engine directly, e.g. test_predictions(..., engine = "emmeans"). Note that using emmeans as backend is currently not as feature rich as the default (marginaleffects) and still in development.

If engine = "emmeans", the test argument can also be "interaction" to calculate interaction contrasts (difference-in-difference contrasts), "consecutive" to calculate contrasts between consecutive levels of a predictor, or a data frame with custom contrasts. If test is one of the latter options, and engine is not specified, the engine is automatically set to "emmeans". Additionally, the test_args argument can be used to specify further options for those contrasts. See 'Examples' and documentation of test_args.

If the marginaleffects package is not installed, the emmeans package is used automatically. If emmeans is not installed either, engine = "ggeffects" is used.
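Setting and resetting the global option might look like the sketch below. It uses only base R and the option name given above; the variable names are chosen for illustration.

```r
# choose emmeans as the engine for all subsequent calls in this session
options(ggeffects_test_engine = "emmeans")
engine_set <- getOption("ggeffects_test_engine")
engine_set

# remove the option again, which restores the default behaviour
options(ggeffects_test_engine = NULL)
engine_reset <- getOption("ggeffects_test_engine")
is.null(engine_reset)
```

The per-call alternative is to pass engine = "emmeans" to test_predictions(), which leaves the global option untouched.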

Global Options to Customize Tables when Printing

The verbose argument can be used to display or silence messages and warnings. Furthermore, options() can be used to set defaults for the print() and print_html() method. The following options are available, which can simply be run in the console:

  • ggeffects_ci_brackets: Define a character vector of length two, indicating the opening and closing parentheses that enclose the confidence interval values, e.g. options(ggeffects_ci_brackets = c("[", "]")).

  • ggeffects_collapse_ci: Logical, if TRUE, the columns with predicted values (or contrasts) and confidence intervals are collapsed into one column, e.g. options(ggeffects_collapse_ci = TRUE).

  • ggeffects_collapse_p: Logical, if TRUE, the columns with predicted values (or contrasts) and p-values are collapsed into one column, e.g. options(ggeffects_collapse_p = TRUE). Note that p-values are replaced by asterisk-symbols (stars) or empty strings when ggeffects_collapse_p = TRUE, depending on the significance level.

  • ggeffects_collapse_tables: Logical, if TRUE, multiple tables for subgroups are combined into one table. Only works when there is more than one focal term, e.g. options(ggeffects_collapse_tables = TRUE).

  • ggeffects_output_format: String, either "text", "markdown" or "html". Defines the default output format from predict_response(). If "html", a formatted HTML table is created and printed to the view pane. "markdown" creates a markdown-formatted table inside Rmarkdown documents, and prints a text-format table to the console when used interactively. If "text" or NULL, a formatted table is printed to the console, e.g. options(ggeffects_output_format = "html").

  • ggeffects_html_engine: String, either "tt" or "gt". Defines the default engine to use for printing HTML tables. If "tt", the tinytable package is used, if "gt", the gt package is used, e.g. options(ggeffects_html_engine = "gt").

Use options(<option_name> = NULL) to remove the option.
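Several of these options can be set in one call and restored afterwards. The sketch below relies only on base R behaviour: options() invisibly returns the previous values, so passing that list back restores the earlier state. The variable names are chosen for illustration.

```r
# set two print options at once and keep the previous values
old <- options(
  ggeffects_ci_brackets = c("[", "]"),
  ggeffects_collapse_ci = TRUE
)

# the new values are now active for print() and print_html()
brackets <- getOption("ggeffects_ci_brackets")
collapse <- getOption("ggeffects_collapse_ci")
brackets
collapse

# restore the previous state (options that were unset are removed again)
options(old)
```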

References

Esarey, J., & Sumner, J. L. (2017). Marginal effects in interaction models: Determining and controlling the false positive rate. Comparative Political Studies, 1–33. Advance online publication. doi: 10.1177/0010414017730080

See also

There is also an equivalence_test() method in the parameters package (parameters::equivalence_test.lm()), which can be used to test contrasts or comparisons for practical equivalence. This method also has a plot() method, hence it is possible to do something like:

library(parameters)
predict_response(model, focal_terms) |>
  equivalence_test() |>
  plot()

Examples

if (FALSE) { # requireNamespace("marginaleffects") && requireNamespace("parameters") && interactive()
library(ggeffects)
data(efc, package = "ggeffects")
efc$c172code <- as.factor(efc$c172code)
efc$c161sex <- as.factor(efc$c161sex)
levels(efc$c161sex) <- c("male", "female")
m <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)

# direct computation of comparisons
test_predictions(m, "c172code")

# passing a `ggeffects` object
pred <- predict_response(m, "c172code")
test_predictions(pred)

# test for slope
test_predictions(m, "c12hour")

# interaction - contrasts by groups
m <- lm(barthtot ~ c12hour + c161sex * c172code + neg_c_7, data = efc)
test_predictions(m, c("c161sex", "c172code"), test = NULL)

# interaction - pairwise comparisons by groups
test_predictions(m, c("c161sex", "c172code"))

# equivalence testing
test_predictions(m, c("c161sex", "c172code"), equivalence = c(-2.96, 2.96))

# equivalence testing, using the parameters package
pr <- predict_response(m, c("c161sex", "c172code"))
parameters::equivalence_test(pr)

# interaction - collapse unique levels
test_predictions(m, c("c161sex", "c172code"), collapse_levels = TRUE)

# p-value adjustment
test_predictions(m, c("c161sex", "c172code"), p_adjust = "tukey")

# not all comparisons, only by specific group levels
test_predictions(m, "c172code", by = "c161sex")

# specific comparisons
test_predictions(m, c("c161sex", "c172code"), test = "b2 = b1")

# interaction - slope by groups
m <- lm(barthtot ~ c12hour + neg_c_7 * c172code + c161sex, data = efc)
test_predictions(m, c("neg_c_7", "c172code"))

# Interaction and consecutive contrasts -----------------
# -------------------------------------------------------
data(coffee_data, package = "ggeffects")
m <- lm(alertness ~ time * coffee + sex, data = coffee_data)

# consecutive contrasts
test_predictions(m, "time", by = "coffee", test = "consecutive")

# interaction contrasts - difference-in-difference comparisons
pr <- predict_response(m, c("time", "coffee"), margin = "marginalmeans")
test_predictions(pr, test = "interaction")

# Custom contrasts --------------------------------------
# -------------------------------------------------------
wakeup_time <- data.frame(
  "wakeup vs later" = c(-2, 1, 1) / 2, # make sure each "side" sums to (+/-)1!
  "start vs end of day" = c(-1, 0, 1)
)
test_predictions(m, "time", by = "coffee", test = wakeup_time)

# Example: marginal effects -----------------------------
# -------------------------------------------------------
data(iris)
m <- lm(Petal.Width ~ Petal.Length + Species, data = iris)

# we now want the marginal effects for "Species". We can calculate
# the marginal effect using the "marginaleffects" package
marginaleffects::avg_slopes(m, variables = "Species")

# finally, test_predictions() returns the same. while the previous results
# report the marginal effect compared to the reference level "setosa",
# test_predictions() returns the marginal effects for all pairwise comparisons
test_predictions(m, "Species")
}