Case Study: Simple Workflow Using Logistic Regression
Source:vignettes/practical_glm_workflow.Rmd
practical_glm_workflow.Rmd
This vignette demonstrates a typical workflow using the
ggeffects
package, with a logistic regression model as an
example. We will explore various aspects of the model, such as model
coefficients, predicted probabilities, and pairwise comparisons. Let’s
get started!
Preparing the data and fitting a model
First, we load the ggeffects
package and the
coffee_data
data set, which is included in the package. The
data set contains information on the effect of coffee consumption on
alertness over time. The outcome variable is binary (alertness), and the
predictor variables are coffee consumption (treatment) and time.
library(ggeffects)
library(parameters) # for model summary
library(datawizard) # for recodings
data(coffee_data, package = "ggeffects")
# dichotomize outcome variable
coffee_data$alertness <- categorize(coffee_data$alertness, lowest = 0)
# rename variable
coffee_data$treatment <- coffee_data$coffee
# model
model <- glm(alertness ~ treatment * time, data = coffee_data, family = binomial())
Exploring the model - model coefficients
Let’s start by examining the model coefficients. We can use the
model_parameters()
function to extract the coefficients
from the model. By setting exponentiate = TRUE
, we can
obtain the odds ratios for the coefficients.
# coefficients
model_parameters(model, exponentiate = TRUE)
#> Parameter | Odds Ratio | SE | 95% CI | z | p
#> -----------------------------------------------------------------------------------------------
#> (Intercept) | 1.00 | 0.45 | [0.41, 2.44] | -1.54e-15 | > .999
#> treatment [control] | 0.33 | 0.23 | [0.08, 1.23] | -1.61 | 0.108
#> time [noon] | 0.54 | 0.35 | [0.15, 1.90] | -0.96 | 0.339
#> time [afternoon] | 3.00 | 2.05 | [0.81, 12.24] | 1.61 | 0.108
#> treatment [control] × time [noon] | 10.35 | 9.85 | [1.66, 70.73] | 2.45 | 0.014
#> treatment [control] × time [afternoon] | 1.00 | 0.97 | [0.15, 6.74] | -6.10e-16 | > .999
#>
#> Uncertainty intervals (profile-likelihood) and p-values (two-tailed) computed using a Wald z-distribution approximation.
The model coefficients are difficult to interpret directly, in
particular sinc we have an interaction effect. Instead, we should use
the predict_response()
function to calculate predicted
probabilities for the model. These refer to the adjusted probabilities
of the outcome (higher alertness) depending on the predictor variables
(treatment and time).
Predicted probabilities - understanding the model
Thus, since we are interested in the interaction effect of coffee
consumption (treatment) on alertness depending on different times of the
day, we simply specify these two variables as focal terms in
the predict_response()
function.
# predicted probabilities
predictions <- predict_response(model, c("time", "treatment"))
plot(predictions)
As we can see, the predicted probabilities of alertness are higher for participants who consumed coffee compared to those who did not, but only in the morning and in the afternoon. Furthermore, we see differences between the coffee and the control group at each time point - but are these differences statistically significant?
Pairwise comparisons - testing the differences
To check this, we finally use the test_predictions()
function to perform pairwise comparisons of the predicted probabilities.
We simply pass our results from predict_response()
to the
function.
# pairwise comparisons - quite long table
test_predictions(predictions)
#> # Pairwise comparisons
#>
#> time | treatment | Contrast | 95% CI | p
#> ------------------------------------------------------------------------
#> morning-noon | coffee-coffee | 0.15 | -0.15, 0.45 | 0.332
#> morning-afternoon | coffee-coffee | -0.25 | -0.54, 0.04 | 0.091
#> morning-morning | coffee-control | 0.25 | -0.04, 0.54 | 0.091
#> morning-noon | coffee-control | -0.15 | -0.45, 0.15 | 0.332
#> morning-afternoon | coffee-control | 0.00 | -0.31, 0.31 | > .999
#> noon-afternoon | coffee-coffee | -0.40 | -0.68, -0.12 | 0.005
#> noon-morning | coffee-control | 0.10 | -0.18, 0.38 | 0.488
#> noon-noon | coffee-control | -0.30 | -0.60, 0.00 | 0.047
#> noon-afternoon | coffee-control | -0.15 | -0.45, 0.15 | 0.332
#> afternoon-morning | coffee-control | 0.50 | 0.23, 0.77 | < .001
#> afternoon-noon | coffee-control | 0.10 | -0.18, 0.38 | 0.488
#> afternoon-afternoon | coffee-control | 0.25 | -0.04, 0.54 | 0.091
#> morning-noon | control-control | -0.40 | -0.68, -0.12 | 0.005
#> morning-afternoon | control-control | -0.25 | -0.54, 0.04 | 0.091
#> noon-afternoon | control-control | 0.15 | -0.15, 0.45 | 0.332
#>
#> Contrasts are presented as probabilities (in %-points).
In the above output, we see all possible pairwise comparisons of the predicted probabilities. The table is quite long, but we can also group the comparisons, e.g. by the variable time.
# group comparisons by "time"
test_predictions(predictions, by = "time")
#> # Pairwise comparisons
#>
#> time | treatment | Contrast | 95% CI | p
#> ------------------------------------------------------------
#> morning | coffee-control | 0.25 | -0.04, 0.54 | 0.091
#> noon | coffee-control | -0.30 | -0.60, 0.00 | 0.047
#> afternoon | coffee-control | 0.25 | -0.04, 0.54 | 0.091
#>
#> Contrasts are presented as probabilities (in %-points).
The output shows that the differences between the coffee and the control group are statistically significant only in the noon time.