regressionmodels

Overview of R Modelling Packages

This is an overview of R packages and functions for fitting different types of regression models. For each row, the upper cells in the “R package or function” column refer to “simple” models, while the lower cells refer to their mixed-model counterparts (if available and known).

This overview makes no claim to completeness. It lists the more commonly used modelling packages, but there are plenty of others, some of which might even perform better at the tasks mentioned here. If you’re aware of such packages, or think that an important package or function is missing, please file an issue.

Modelling Packages

| Nature of Response | Example | Type of Regression | R package or function | Example Webpage | Bayesian with brms |
|---|---|---|---|---|---|
| Continuous | Quality of Life, linear scales | linear | `lm()` | | `brm(family = gaussian())` |
| | | | `lmer()`, `glmmTMB()` | | |
| Binary | Success yes/no | binary logistic | `glm(family=binomial)` | UCLA | `brm(family = binomial())` |
| | | | `glmer(*)`, `glmmTMB(*)` | | |
| Binary, weighted | Success yes/no, with weights | quasi-binary logistic | `glm(family=quasibinomial)` | | |
| | | | `glmmPQL(family="quasibinomial")` | | |
| Trials (or proportions of counts) | 20 successes out of 30 trials | logistic | `glm(cbind(successes, failures), family=binomial)` | Hadley’s notes | `brm(successes \| trials(total), family = binomial())` |
| | | | `glmer(*)`, `glmmTMB(*)` | | |
| Count data | Number of usage, counts of events | Poisson | `glm(family=poisson)` | UCLA | `brm(family = poisson())` |
| | | | `glmer(*)`, `glmmTMB(*)` | | |
| Count data, with excess zeros or overdispersion | Number of usage, counts of events (with higher variance than mean of response) | negative binomial | `glm.nb()` | UCLA | `brm(family = negbinomial())` |
| | | | `glmer.nb()`, `glmmTMB(family=nbinom)` | | |
| Count data with very many zeros (inflation) | see count data, but response is modelled as mixture of Bernoulli & Poisson distribution (two sources of zeros) | zero-inflated | `zeroinfl()` | UCLA | `brm(family = zero_inflated_poisson())` |
| | | | `glmmTMB(ziformula, family=poisson)` | | |
| Count data, with very many zeros (inflation) and overdispersion | Number of usage, counts of events (with higher variance than mean of response) | zero-inflated negative binomial | `zeroinfl(dist="negbin")` | UCLA | `brm(family = zero_inflated_negbinomial())` |
| | | | `glmmTMB(ziformula, family=nbinom)` | | |
| Count data, zero-truncated | see count data, but only for positive counts (hurdle component models zero-counts) | hurdle (Poisson) | `hurdle()` | UCLA | `brm(family = hurdle_poisson())` |
| | | | `glmmTMB(family=truncated_poisson)` | | |
| Count data, zero-truncated and overdispersion | see “Count data, zero-truncated”, but with higher variance than mean of response | hurdle (neg. binomial) | `vglm(family=posnegbinomial)` | UCLA | `brm(family = hurdle_negbinomial())` |
| | | | `glmmTMB(family=truncated_nbinom)` | | |
| Proportion / Ratio (without zero and one) | Percentages, proportion of continuous data | Beta (see note below) | `betareg()` | ouR data generation | `brm(family = Beta())` |
| | | | `glmmTMB(family=beta_family)` | | |
| Proportion / Ratio (including zero and one) | Percentages, proportions of continuous data | Beta-Binomial, zero-inflated Beta, ordered Beta (see note below) | `BBreg()`, `betabin()`, `vglm(family=betabinomial)`, `ordbetareg()` | ouR data generation | `brm(family = zero_one_inflated_beta())` |
| | | | `glmmTMB(ziformula, family=beta_family)`, `glmmTMB(ziformula, family=betabinomial)`, `glmmTMB(ziformula, family=ordbeta)`, `ordbetareg()` | | |
| Ordinal | Likert scale, worse/ok/better | ordinal, proportional odds, cumulative | `polr()`, `clm()`, `bracl()` | UCLA | `brm(family = cumulative())` |
| | | | `clmm()`, `mixor()`, `MCMCglmm(family = "ordinal")` | | |
| Multinomial | No natural order of categories, like red/green/blue | multinomial | `multinom()`, `brmultinom()` | UCLA | `brm(family = multinomial())` |
| | | | `MCMCglmm(family = "multinomial")` | | |
| Continuous, right-skewed | Financial data, reaction times | Gamma | `glm(family=Gamma)` | Sean Anderson | `brm(family = Gamma())`, but see also Reaction time distributions in brms |
| | | | `glmer(*)`, `glmmTMB(*)` | | |
| (Semi-)Continuous, (right) skewed, probably with spike at zero (zero-inflated) | Financial data, probably exponential dispersion of variance | Tweedie | `glm(family=tweedie)`, `cpglm()` | Revolutions | |
| | | | `cpglmm()`, `glmmTMB(*)` | | |
| (Semi-)Continuous, (right) skewed, probably with spike at zero (zero-inflated) | Normal distribution, but negative values are censored and stacked on zero | Tobit | `tobit()`, `censReg()` | | `brm(y \| cens(), family = gaussian())` |
| | | | `semLme()` | | |
| Continuous, but truncated or outliers | | truncated | `censReg()`, `tobit()`, `vglm(family=tobit)` | UCLA-1, UCLA-2 | `brm(y \| trunc(), family = gaussian())` |
| Continuous, but exponential growth | | log-transformed, non-linear | `glm(family=gaussian("log"))`, `nls()` | Some useful equations, linear vs. non-linear regression | |
| | | | `glmmTMB(*)`, `nlmer()`, `nlme()` | | |
| Proportion / Ratio with more than 2 categories | Biomass partitioning in plants (ratio of leaf, stem and root mass) | Dirichlet | `DirichReg()` | | `brm(family = dirichlet())` |
| Time-to-Event | Survival-analysis, time until event/death occurs | Cox (proportional hazards) | `coxph()` | UCLA | `brm(family = cox())` |
| | | | `coxme()` | | |
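
To make the table a bit more concrete, here is a minimal sketch of how a few of the “simple” models listed above might be fitted, together with one mixed counterpart. The data are simulated, and all variable names (`y_cont`, `y_bin`, `successes`, `failures`, `y_count`, `group`) are hypothetical and chosen for illustration only. The mixed model is only fitted if the lme4 package happens to be installed.

```r
# Simulated data; every variable name here is an assumption for illustration.
set.seed(123)
n <- 200
d <- data.frame(
  x = rnorm(n),
  group = factor(rep(1:20, each = 10))
)
u <- rnorm(20, sd = 0.5)  # group-level deviations (random intercepts)

d$y_cont    <- 1 + 0.5 * d$x + rnorm(n)                       # continuous
d$y_bin     <- rbinom(n, size = 1, prob = plogis(0.8 * d$x))  # binary
d$total     <- 30                                             # trials per row
d$successes <- rbinom(n, size = d$total, prob = plogis(0.3 * d$x))
d$failures  <- d$total - d$successes
d$y_count   <- rpois(n, lambda = exp(0.2 + 0.4 * d$x + u[as.integer(d$group)]))

# "Simple" models (upper cells of the table), base R only
m_linear   <- lm(y_cont ~ x, data = d)
m_logistic <- glm(y_bin ~ x, family = binomial, data = d)
m_trials   <- glm(cbind(successes, failures) ~ x, family = binomial, data = d)
m_poisson  <- glm(y_count ~ x, family = poisson, data = d)

# Rough overdispersion check for the Poisson model: ratio of the summed
# squared Pearson residuals to the residual degrees of freedom. Values well
# above 1 hint that a negative binomial model may be the better choice.
dispersion <- sum(residuals(m_poisson, type = "pearson")^2) /
  df.residual(m_poisson)

# Mixed counterpart (lower cell of the table), fitted only if lme4 is available
if (requireNamespace("lme4", quietly = TRUE)) {
  m_poisson_mixed <- lme4::glmer(
    y_count ~ x + (1 | group),
    family = poisson,
    data = d
  )
}
```

The mixed model reuses the formula and family of its simple counterpart and only adds the random-effects term `(1 | group)`. The other rows of the table follow the same pattern with the packages and functions listed there.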

Included packages for non-mixed models:

Included packages for mixed models:

Included packages for Bayesian models (mixed and non-mixed):

Handout

There is a handout in PDF format.