regressionmodels

Overview of R Modelling Packages

This is an overview of R packages and functions for fitting different types of regression models. For each row, the upper cell in the “R package or function” column refers to “simple” models, while the lower cell refers to their mixed-model counterparts (if available and known).

This overview makes no claim to completeness. Rather, it lists the more commonly used packages; there are plenty of other packages as well, some of which might even perform better at the tasks mentioned here. If you’re aware of such packages, or think that an important package or function is missing, please file an issue.

Modelling Packages

Nature of Response	Example	Type of Regression	R package or function	Example Webpage	Bayesian with `brms`
Continuous	Quality of Life, linear scales	linear	`lm()`		`brm(family = gaussian())`
			- `lmer()` - `glmmTMB()`
Binary	Success yes/no	binary logistic	`glm(family=binomial)`	UCLA	`brm(family = binomial())`
			- `glmer()` - `glmmTMB()`
Binary, weighted	Success yes/no, with weights	quasi-binary logistic	`glm(family=quasibinomial)`
			`glmmPQL(family="quasibinomial")`
Trials (or proportions of counts)	20 successes out of 30 trials	logistic	`glm(cbind(successes, failures), family=binomial)`	Hadley’s notes	`brm(successes \| trials(total), family = binomial())`
			- `glmer()` - `glmmTMB()`
Count data	Number of usage, counts of events	Poisson	`glm(family=poisson)`	UCLA	`brm(family = poisson())`
			- `glmer()` - `glmmTMB()`
Count data, with excess zeros or overdispersion	Number of usage, counts of events (with higher variance than mean of response)	negative binomial	`glm.nb()`	UCLA	`brm(family = negbinomial())`
			- `glmer.nb()` - `glmmTMB(family=nbinom)`
Count data with very many zeros (inflation)	see count data, but response is modelled as mixture of Bernoulli & Poisson distribution (two sources of zeros)	zero-inflated	`zeroinfl()`	UCLA	`brm(family = zero_inflated_poisson())`
			`glmmTMB(ziformula, family=poisson)`
Count data, with very many zeros (inflation) and overdispersion	Number of usage, counts of events (with higher variance than mean of response)	zero-inflated negative binomial	`zeroinfl(dist="negbin")`	UCLA	`brm(family = zero_inflated_negbinomial())`
			`glmmTMB(ziformula, family=nbinom)`
Count data, zero-truncated	see count data, but only for positive counts (hurdle component models zero-counts)	hurdle (Poisson)	`hurdle()`	UCLA	`brm(family = hurdle_poisson())`
			`glmmTMB(family=truncated_poisson)`
Count data, zero-truncated and overdispersion	see “Count data, zero-truncated”, but with higher variance than mean of response	hurdle (neg. binomial)	`vglm(family=posnegbinomial)`	UCLA	`brm(family = hurdle_negbinomial())`
			`glmmTMB(family=truncated_nbinom)`
Proportion / Ratio (without zero and one)	Percentages, proportion of continuous data	Beta (see note below)	`betareg()`	ouR data generation	`brm(family = Beta())`
			`glmmTMB(family=beta_family)`
Proportion / Ratio (including zero and one)	Percentages, proportions of continuous data	Beta-Binomial, zero-inflated Beta, ordered Beta (see note below)	- `BBreg()` - `betabin()` - `vglm(family=betabinomial)` - `ordbetareg()`	ouR data generation	`brm(family = zero_one_inflated_beta())`
			- `glmmTMB(ziformula, family=beta_family)` - `glmmTMB(ziformula, family= betabinomial)` - `glmmTMB(ziformula, family= ordbeta)` - `ordbetareg()`
Ordinal	Likert scale, worse/ok/better	ordinal, proportional odds, cumulative	- `polr()` - `clm()` - `bracl()`	UCLA	`brm(family = cumulative())`
			- `clmm()` - `mixor()` - `MCMCglmm(family = "ordinal")`
Multinomial	No natural order of categories, like red/green/blue	multinomial	- `multinom()` - `brmultinom()`	UCLA	`brm(family = multinomial())`
			`MCMCglmm(family = "multinomial")`
Continuous, right-skewed	Financial data, reaction times	Gamma	`glm(family=Gamma)`	Sean Anderson	`brm(family = Gamma())`, but see also Reaction time distributions in `brms`
			- `glmer()` - `glmmTMB()`
(Semi-)Continuous, (right) skewed, probably with spike at zero (zero-inflated)	Financial data, probably exponential dispersion of variance	Tweedie	- `glm(family=tweedie)` - `cpglm()`	Revolutions
			- `cpglmm()` - `glmmTMB(*)`
(Semi-)Continuous, (right) skewed, probably with spike at zero (zero-inflated)	Normal distribution, but negative values are censored and stacked on zero	Tobit	- `tobit()` - `censReg()`		`brm(y \| cens(), family = gaussian())`
			`semLme()`
Continuous, but truncated or outliers		truncated	- `censReg()` - `tobit()` - `vglm(family=tobit)`	UCLA-1, UCLA-2	`brm(y \| trunc(), family = gaussian())`
Continuous, but exponential growth		log-transformed, non-linear	- `glm(family=gaussian("log"))` - `nls()`	Some useful equations, linear vs. non-linear regression
			- `glmmTMB(*)` - `nlmer()` - `nlme()`
Proportion / Ratio with more than 2 categories	Biomass partitioning in plants (ratio of leaf, stem and root mass)	Dirichlet	`DirichReg()`		`brm(family = dirichlet())`
Time-to-Event	Survival-analysis, time until event/death occurs	Cox (proportional hazards)	`coxph()`	UCLA	`brm(family = cox())`
			`coxme()`

* indicates that, for the mixed-model functions, the same response type and family should be used as for their glm counterpart.
Note that ratios or proportions from count data, like `cbind(successes, failures)`, are modelled as logistic regression with `glm(cbind(successes, failures), family=binomial())`, while ratios from continuous data (where the response ranges from zero to one) are modelled using beta regression.
Usually, zero-inflated models are used when 0 or 1 comes from a separate process or category. However, when the 0/1 values are more consistent with censoring than with a separate category or process (i.e., 0 means “below detection”, not “something qualitatively different happened”), ordered beta regression is probably the better choice. (Source: https://twitter.com/bolkerb/status/1577755600808775680)
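
To make the note on proportions from count data more concrete, here is a minimal sketch in base R. The data are simulated and all object names (`dose`, `total`, `fit_counts`, `fit_prop`) are made up for illustration; they do not come from this overview. The sketch fits the two-column `cbind(successes, failures)` response and, for comparison, the same model written as an observed proportion with the number of trials as weights.

```r
# Simulated data; every name below is hypothetical and only serves the example.
set.seed(123)
n_rows <- 50
dose <- seq(-2, 2, length.out = n_rows)   # made-up predictor
total <- rep(30, n_rows)                  # 30 trials per row
p_true <- plogis(-0.5 + 1.2 * dose)       # success probability used for simulation
successes <- rbinom(n_rows, size = total, prob = p_true)
failures <- total - successes

# Proportions from counts: two-column response with a binomial family
fit_counts <- glm(cbind(successes, failures) ~ dose, family = binomial())

# Same model, written as a proportion with the number of trials as weights
fit_prop <- glm(successes / total ~ dose, family = binomial(), weights = total)

# Both formulations should give matching coefficient estimates
coef(fit_counts)
all.equal(coef(fit_counts), coef(fit_prop))
```

A continuous proportion (for example, a percentage that is not built from counts) has no number of trials to supply, which is why the table points to beta regression for that case instead.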

Included packages for non-mixed models:

Base R: lm(), glm()
AER: tobit()
aod: betabin()
betareg: betareg()
brglm2: bracl(), brmultinom()
censReg: censReg()
cplm: cpglm()
survival: coxph()
DirichletReg: DirichReg()
HRQoL: BBreg()
MASS: glm.nb(), polr()
nnet: multinom()
ordbetareg: ordbetareg()
ordinal: clm(), clm2()
pscl: zeroinfl(), hurdle()
statmod: tweedie()
VGAM: vglm()
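
As a rough illustration of how the base R entries in this list are called for different response types, the following sketch fits four of the “simple” models from the table to simulated data. The variable names and the simulated effect sizes are assumptions made for the example only.

```r
# Simulated data; names and effect sizes are invented for illustration.
set.seed(42)
n <- 200
x <- rnorm(n)

# Continuous response: linear model
y_cont <- 1 + 2 * x + rnorm(n)
fit_lm <- lm(y_cont ~ x)

# Binary response: binary logistic regression
y_bin <- rbinom(n, size = 1, prob = plogis(1 * x))
fit_logit <- glm(y_bin ~ x, family = binomial())

# Count response: Poisson regression
y_count <- rpois(n, lambda = exp(0.3 + 0.5 * x))
fit_pois <- glm(y_count ~ x, family = poisson())

# Continuous, right-skewed response: Gamma regression with a log link
y_skew <- rgamma(n, shape = 2, rate = 2 / exp(1 + 0.4 * x))
fit_gamma <- glm(y_skew ~ x, family = Gamma(link = "log"))

# Collect the estimated slopes for x from all four fits
fits <- list(lm = fit_lm, logit = fit_logit, poisson = fit_pois, gamma = fit_gamma)
slopes <- sapply(fits, function(m) coef(m)[["x"]])
slopes
```

Only the `family` argument changes between the `glm()` calls; the slopes are on different scales (identity, logit, and log links), so they are not directly comparable across models.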

Included packages for mixed models:

cplm: cpglmm()
coxme: coxme()
glmmTMB: glmmTMB()
lme4: lmer(), glmer(), glmer.nb()
MASS: glmmPQL()
MCMCglmm: MCMCglmm()
mixor: mixor()
ordbetareg: ordbetareg()
ordinal: clmm(), clmm2()
smicd: semLme()
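
The step from a “simple” model to its mixed-model counterpart can be sketched with `glmmPQL()` from MASS, which ships with standard R installations (together with nlme, which it uses internally). The data, the grouping variable `group`, and the effect sizes below are simulated assumptions; other functions from this list, such as `glmer()` or `glmmTMB()`, use a different formula syntax for random effects.

```r
library(MASS)  # provides glmmPQL(); the fit itself relies on nlme::lme()

# Simulated clustered binary data; all names are hypothetical.
set.seed(1)
n_groups <- 20
n_per_group <- 25
d <- data.frame(
  group = factor(rep(seq_len(n_groups), each = n_per_group)),
  x = rnorm(n_groups * n_per_group)
)
group_effect <- rnorm(n_groups, sd = 0.8)   # random intercept per group
eta <- -0.3 + 1 * d$x + group_effect[as.integer(d$group)]
d$y <- rbinom(nrow(d), size = 1, prob = plogis(eta))

# "Simple" model: ignores the grouping structure
fit_glm <- glm(y ~ x, family = binomial(), data = d)

# Mixed-model counterpart: random intercept for each group
fit_pql <- glmmPQL(y ~ x, random = ~ 1 | group,
                   family = binomial(), data = d, verbose = FALSE)

# Fixed-effect estimates from both fits
coef(fit_glm)
nlme::fixef(fit_pql)
```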

Included packages for Bayesian models (mixed and non-mixed):

brms: brm()

Handout

There is a handout in PDF-format.
