Measures of association for contingency tables

These functions calculate various measures of association for contingency tables and return the statistic and p-value. Supported measures are Cramer's V, Phi, Spearman's rho, Kendall's tau and Pearson's r.

cramers_v(tab, ...)

cramer(tab, ...)

# S3 method for class 'formula'
cramers_v(
  formula,
  data,
  ci.lvl = NULL,
  n = 1000,
  method = c("dist", "quantile"),
  ...
)

phi(tab, ...)

crosstable_statistics(
  data,
  x1 = NULL,
  x2 = NULL,
  statistics = c("auto", "cramer", "phi", "spearman", "kendall", "pearson", "fisher"),
  weights = NULL,
  ...
)

xtab_statistics(
  data,
  x1 = NULL,
  x2 = NULL,
  statistics = c("auto", "cramer", "phi", "spearman", "kendall", "pearson", "fisher"),
  weights = NULL,
  ...
)

Arguments

tab: A table() or ftable(). Tables of class xtabs() and others will be coerced to ftable objects.
...: Other arguments, passed down to the statistic functions chisq.test(), fisher.test() or cor.test().
formula: A formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor giving the corresponding groups.
data: A data frame or a table object. If a table object, x1 and x2 will be ignored. For Kendall's tau, Spearman's rho or Pearson's product moment correlation coefficient, data needs to be a data frame. If x1 and x2 are not specified, the first two columns of the data frame are used as variables to compute the crosstab.
ci.lvl: Scalar between 0 and 1. If not NULL, returns a data frame including lower and upper confidence intervals.
n: Number of bootstraps to be generated.
method: Character vector, indicating if confidence intervals should be based on bootstrap standard error, multiplied by the value of the quantile function of the t-distribution (default), or on sample quantiles of the bootstrapped values. See 'Details' in boot_ci(). May be abbreviated.
x1: Name of the first variable that should be used to compute the contingency table. If data is a table object, this argument will be ignored.
x2: Name of the second variable that should be used to compute the contingency table. If data is a table object, this argument will be ignored.
statistics: Name of measure of association that should be computed. May be one of "auto", "cramer", "phi", "spearman", "kendall", "pearson" or "fisher". See 'Details'.
weights: Name of the variable in data that contains the vector of weights applied to all observations. Default is NULL, so no weights are used.
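
The two options for method can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper v_stat() is hypothetical, mtcars stands in for real data, and both the centring of the "dist" interval on the sample estimate and its degrees of freedom are assumptions. See boot_ci() for the actual details.

```r
set.seed(123)  # make the resampling reproducible

# Hypothetical helper: textbook Cramer's V, without continuity correction
v_stat <- function(x, y) {
  tab <- table(x, y)
  chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
  sqrt(chi2 / (sum(tab) * min(dim(tab) - 1)))
}

x <- mtcars$cyl
y <- mtcars$gear
estimate <- v_stat(x, y)

# n bootstrap replicates: resample observations with replacement
n_boot <- 200
boot_vals <- replicate(n_boot, {
  idx <- sample(length(x), replace = TRUE)
  v_stat(x[idx], y[idx])
})

ci_lvl <- 0.95
alpha <- 1 - ci_lvl

# "dist"-style interval: t-quantile times the bootstrap standard error
# (centring on the estimate and df = n_boot - 1 are assumptions)
se <- sd(boot_vals)
ci_dist <- estimate + c(-1, 1) * qt(1 - alpha / 2, df = n_boot - 1) * se

# "quantile"-style interval: sample quantiles of the bootstrapped values
ci_quantile <- unname(quantile(boot_vals, c(alpha / 2, 1 - alpha / 2)))
```

A symmetric interval of the "dist" kind can extend below zero even though Cramer's V itself cannot be negative, while a quantile-based interval always stays within the range of the bootstrapped values.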

Value

For phi(), the table's Phi value. For cramers_v(), the table's Cramer's V.
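
For orientation, both values can be derived from the chi-squared statistic using the textbook definitions. The sketch below uses base R only; phi_sketch() and cramers_v_sketch() are hypothetical names, and the assumption that no continuity correction is applied may not match the package internals exactly.

```r
# Hypothetical helpers based on the textbook definitions:
#   Phi        = sqrt(chi^2 / n)
#   Cramer's V = sqrt(chi^2 / (n * min(rows - 1, cols - 1)))
chi2_uncorrected <- function(tab) {
  unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
}

phi_sketch <- function(tab) {
  sqrt(chi2_uncorrected(tab) / sum(tab))
}

cramers_v_sketch <- function(tab) {
  sqrt(chi2_uncorrected(tab) / (sum(tab) * min(dim(tab) - 1)))
}

tab <- matrix(c(10, 20, 30, 40), nrow = 2)
phi_sketch(tab)
cramers_v_sketch(tab)  # identical to Phi for a 2x2 table
```

For a 2x2 table, min(rows - 1, cols - 1) is 1, so both definitions give the same value.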

For crosstable_statistics(), a list with the following components:

estimate: the value of the estimated measure of association.
p.value: the p-value for the test.
statistic: the value of the test statistic.
stat.name: the name of the test statistic.
stat.html: if applicable, the name of the test statistic, in HTML-format.
df: the degrees of freedom for the contingency table.
method: character string indicating the name of the measure of association.
method.html: if applicable, the name of the measure of association, in HTML-format.
method.short: the short form of association measure, equals the statistics-argument.
fisher: logical, if Fisher's exact test was used to calculate the p-value.

Details

The p-values for Cramer's V and the Phi coefficient are based on chisq.test(). If any expected cell value is smaller than 5, or smaller than 10 when the df is 1, fisher.test() is used to compute the p-value instead. With statistics = "fisher", the use of fisher.test() is always forced. The test statistic is calculated with cramers_v() or phi(), respectively.
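
The rule described above can be sketched in base R as follows. The function p_value_sketch() and its force_fisher argument are hypothetical and only mirror the prose; the package's own code may differ in detail.

```r
# Hypothetical helper: choose between chisq.test() and fisher.test()
p_value_sketch <- function(tab, force_fisher = FALSE) {
  chi <- suppressWarnings(chisq.test(tab))
  df <- unname(chi$parameter)

  # any expected value < 5, or < 10 when the table has df = 1
  low_expected <- any(chi$expected < 5) ||
    (df == 1 && any(chi$expected < 10))

  use_fisher <- force_fisher || low_expected
  p <- if (use_fisher) fisher.test(tab)$p.value else chi$p.value

  list(p.value = p, fisher = use_fisher)
}

small <- matrix(c(3, 2, 4, 5), nrow = 2)          # low expected counts
large <- matrix(c(100, 120, 130, 110), nrow = 2)  # all expected counts >= 110

p_value_sketch(small)$fisher
p_value_sketch(large)$fisher
p_value_sketch(large, force_fisher = TRUE)$fisher
```

The fisher element plays the same role as the fisher component of the returned list: it records whether Fisher's exact test supplied the p-value.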

Both the test statistic and the p-value for Spearman's rho, Kendall's tau and Pearson's r are calculated with cor.test().
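
As an illustration of what cor.test() provides for each of these measures, the following base R sketch collects the statistic, estimate and p-value for two mtcars columns. The choice of variables is arbitrary, and exact = FALSE is set here only to request the asymptotic tests, as in the Spearman example below.

```r
x <- mtcars$cyl
y <- mtcars$gear

methods <- c("pearson", "kendall", "spearman")

results <- do.call(rbind, lapply(methods, function(m) {
  res <- suppressWarnings(cor.test(x, y, method = m, exact = FALSE))
  data.frame(
    method = m,
    stat.name = names(res$statistic),   # "t", "z" or "S"
    statistic = unname(res$statistic),
    estimate = unname(res$estimate),    # r, tau or rho
    p.value = res$p.value
  )
}))

results
```

Arguments such as exact or continuity can be given to crosstable_statistics() through ..., from where they are passed down to cor.test().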

When statistics = "auto", only Cramer's V or Phi is calculated, depending on the dimensions of the table (i.e. if the table has more than two rows or columns, Cramer's V is calculated, else Phi).
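
A minimal sketch of this selection rule, assuming it depends only on the table's dimensions; pick_measure() is a hypothetical name, not a package function.

```r
# Hypothetical helper: Cramer's V for anything larger than 2x2, else Phi
pick_measure <- function(tab) {
  if (any(dim(tab) > 2)) "cramer" else "phi"
}

pick_measure(table(mtcars$am, mtcars$vs))   # 2x2 table
pick_measure(table(mtcars$cyl, mtcars$am))  # 3x2 table
```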

References

Ben-Shachar, M.S., Patil, I., Thériault, R., Wiernik, B.M., Lüdecke, D. (2023). Phi, Fei, Fo, Fum: Effect Sizes for Categorical Data That Use the Chi‑Squared Statistic. Mathematics, 11, 1982. doi:10.3390/math11091982

Examples

# Phi coefficient for 2x2 tables
tab <- table(sample(1:2, 30, TRUE), sample(1:2, 30, TRUE))
phi(tab)
#> [1] 0.08183171

# Cramer's V for nominal variables with more than 2 categories
tab <- table(sample(1:2, 30, TRUE), sample(1:3, 30, TRUE))
cramer(tab)
#> [1] 0.1977963

# formula notation
data(efc)
cramer(e16sex ~ c161sex, data = efc)
#> [1] 0.05258249

# bootstrapped confidence intervals
cramer(e16sex ~ c161sex, data = efc, ci.lvl = .95, n = 100)
#>       cramer     conf.low conf.high
#> 1 0.05258249 -0.003469112 0.1140037

# 2x2 table, compute Phi automatically
crosstable_statistics(efc, e16sex, c161sex)
#> 
#> # Measure of Association for Contingency Tables
#> 
#>    Chi-squared: 2.2327
#>            Phi: 0.0526
#>             df: 1
#>        p-value: 0.135
#>   Observations: 900

# more dimensions than 2x2, compute Cramer's V automatically
crosstable_statistics(efc, c172code, c161sex)
#> 
#> # Measure of Association for Contingency Tables
#> 
#>    Chi-squared: 4.1085
#>     Cramer's V: 0.0699
#>             df: 2
#>        p-value: 0.128
#>   Observations: 841

# ordinal data, use Kendall's tau
crosstable_statistics(efc, e42dep, quol_5, statistics = "kendall")
#> 
#> # Measure of Association for Contingency Tables
#> 
#>               z: -9.5951
#>   Kendall's tau: -0.2496
#>              df: 75
#>         p-value: < .001***
#>    Observations: 896

# calculate Spearman's rho, with continuity correction
crosstable_statistics(efc,
  e42dep,
  quol_5,
  statistics = "spearman",
  exact = FALSE,
  continuity = TRUE
)
#> 
#> # Measure of Association for Contingency Tables
#> 
#>                S: 157974157.4198
#>   Spearman's rho: -0.3177
#>               df: 75
#>          p-value: < .001***
#>     Observations: 896