statistics

Chi-square (χ²) Test

The chi-square test compares observed counts to expected counts in categorical data. χ² = Σ(O−E)²/E. Used for goodness-of-fit and independence tests.

The chi-square ($\chi^2$) test is the standard tool for categorical data. The test statistic is:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ are the observed counts and $E_i$ are the counts expected under $H_0$.
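As a minimal sketch of the formula, the statistic can be computed directly from two lists of counts. The function name `chi_square_statistic` and the die-roll counts are made up for illustration and do not come from any real dataset.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, counts for faces 1..6.
observed = [8, 12, 9, 11, 10, 10]

# Under H0 (a fair die) each face is expected 60 / 6 = 10 times.
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)
print(f"chi-square statistic: {chi2:.3f}")
```

A small statistic such as this one (about 1.0) means the observed counts sit close to what $H_0$ predicts.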

Three common variants:

  • Goodness-of-fit: does the observed distribution match a theoretical one? (Is a die fair?) $df = k - 1$ for $k$ categories.
  • Independence: are two categorical variables independent? (Is gender independent of voting preference?) $df = (r-1)(c-1)$ for $r \times c$ contingency tables.
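  • Variance test: does the variance of a normal population equal a hypothesized value $\sigma_0^2$? The statistic $(n-1)s^2/\sigma_0^2$ has $df = n - 1$. Less common than the other two.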
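For the independence variant, the expected count in each cell is usually taken as (row total × column total) / grand total, which is what the cell counts would be if the two variables were independent. The sketch below assumes that convention. The helper name `independence_test` and the 2 × 2 table are hypothetical.

```python
def independence_test(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows are two groups, columns are two preferences.
table = [[30, 20],
         [20, 30]]

chi2, df = independence_test(table)
print(chi2, df)

# The tabulated 5% critical value for df = 1 is 3.841; a larger statistic
# leads to rejecting independence at that level.
reject_at_5_percent = chi2 > 3.841
```

Every expected count in this table is 25, so the large-count assumption is comfortably met.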

Assumption: expected counts must be sufficiently large (typically $\geq 5$ in each cell). For small samples, use Fisher's exact test instead.
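To illustrate the small-sample alternative, here is a rough sketch of a two-sided Fisher's exact test for a 2 × 2 table, using only the standard library. It assumes the common two-sided definition: sum the hypergeometric probabilities of all tables, with the same margins, that are no more likely than the observed one. The function name `fisher_exact_2x2` and the data are illustrative.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)

    # Small relative tolerance so ties are not lost to floating-point rounding.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical small sample: every expected count is 4 * 4 / 8 = 2, below 5.
table = [[3, 1],
         [1, 3]]

p_value = fisher_exact_2x2(table)
print(f"two-sided p-value: {p_value:.4f}")
```

Because the p-value is computed exactly from the hypergeometric distribution, it does not rely on the large-count approximation that the chi-square test needs.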

The chi-square distribution itself is the distribution of a sum of squared independent standard normal variables, with one degree of freedom per term. It is used to construct the critical values for the test.
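A quick simulation can make this definition concrete. The sketch below draws sums of three squared standard normals and checks two known properties of the chi-square distribution with 3 degrees of freedom: its mean is 3, and about 5% of its mass lies above the tabulated critical value 7.815. The seed and sample size are arbitrary choices.

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility only

k = 3            # degrees of freedom = number of squared normals summed
n_draws = 20000  # arbitrary simulation size

# Each sample is a sum of k squared independent standard normal draws.
samples = [
    sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))
    for _ in range(n_draws)
]

mean = sum(samples) / n_draws

# Tabulated 5% critical value of the chi-square distribution with df = 3.
critical_value = 7.815
tail_fraction = sum(s > critical_value for s in samples) / n_draws

print(f"sample mean: {mean:.3f} (theory: {k})")
print(f"fraction above {critical_value}: {tail_fraction:.4f} (theory: 0.05)")
```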