Descriptive statistics

Mean (population)

$\mu = \frac{1}{N}\sum_{i=1}^N x_i$

Average of all population values.

Mean (sample)

$\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$

Sample average.

Variance (population)

$\sigma^2 = \frac{1}{N}\sum (x_i - \mu)^2$

Mean squared deviation from $\mu$; divides by $N$.

Variance (sample)

$s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$

Bessel's correction: dividing by $n-1$ instead of $n$ makes $s^2$ an unbiased estimator of $\sigma^2$.

Standard deviation

$\sigma = \sqrt{\sigma^2}$, $s = \sqrt{s^2}$

Square root of the variance (population or sample), in the same units as the data.

Range

$R = x_{\max} - x_{\min}$

Simplest measure of spread; depends only on the two extreme values, so it is sensitive to outliers.
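As a quick check on the formulas above, the sketch below computes each quantity with Python's standard `statistics` module. The data values are made up for illustration and do not come from this sheet.

```python
import statistics

# Illustrative data (an assumption, not taken from the text).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.fmean(data)          # (1/n) * sum of x_i
pop_var = statistics.pvariance(data)   # divides by N
samp_var = statistics.variance(data)   # divides by n - 1 (Bessel's correction)
pop_sd = statistics.pstdev(data)       # square root of the population variance
data_range = max(data) - min(data)     # x_max - x_min

print(mean, pop_var, samp_var, pop_sd, data_range)
```

For this data the sum of squared deviations is 32, so the population variance is 32/8 = 4 and the sample variance is 32/7, which shows how the two divisors differ on a small sample.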

Probability rules

Addition rule

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Probability of A or B (inclusion-exclusion).

Multiplication rule

$P(A \cap B) = P(A) \cdot P(B \mid A)$

Probability of A and B; reduces to $P(A) P(B)$ when $A$ and $B$ are independent.

Conditional probability

$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$

Probability of $B$ given that $A$ has occurred; defined only when $P(A) > 0$.
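The addition, multiplication, and conditional-probability rules can be verified by brute-force enumeration. The sketch below assumes two fair six-sided dice; the events `in_a` and `in_b` are hypothetical choices made only for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice, 36 equally likely outcomes (assumed setup).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate on an outcome."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

def in_a(o):
    return o[0] == 6            # A: first die shows 6

def in_b(o):
    return o[0] + o[1] >= 10    # B: total is at least 10

p_a = prob(in_a)
p_b = prob(in_b)
p_a_and_b = prob(lambda o: in_a(o) and in_b(o))
p_a_or_b = prob(lambda o: in_a(o) or in_b(o))
p_b_given_a = p_a_and_b / p_a   # conditional probability P(B | A)

# Addition rule and multiplication rule hold exactly with Fractions.
assert p_a_or_b == p_a + p_b - p_a_and_b
assert p_a_and_b == p_a * p_b_given_a
```

Here $P(A) = P(B) = 1/6$ and $P(A \cap B) = 1/12$, so $P(B \mid A) = 1/2$. Because $P(A \cap B) \ne P(A) P(B)$, these two events are not independent.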

Bayes' theorem

$P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$

Reverses a conditional probability: obtains $P(A \mid B)$ from $P(B \mid A)$, provided $P(B) > 0$. Used in diagnostic testing and machine learning.
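A minimal sketch of Bayes' theorem for a hypothetical diagnostic test. The prevalence, sensitivity, and false-positive rate are assumed numbers chosen for illustration. $P(B)$ is expanded with the law of total probability, which this sheet does not list separately.

```python
# Assumed inputs for a hypothetical screening test.
prevalence = 0.01        # P(D): prior probability of disease
sensitivity = 0.95       # P(+ | D)
false_positive = 0.05    # P(+ | not D)

# Law of total probability: P(+) = P(+|D) P(D) + P(+|not D) P(not D)
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(round(p_disease_given_positive, 3))
```

With these assumed inputs the posterior is about 0.16: most positive results are false positives because the condition is rare.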

Independence

$P(A \cap B) = P(A) P(B)$

Holds iff $A$ and $B$ are independent.

Counting

Permutations

$P(n,r) = \frac{n!}{(n-r)!}$

Order matters: arrange $r$ items chosen from $n$.

Combinations

$C(n,r) = \binom{n}{r} = \frac{n!}{r!(n-r)!}$

Order doesn't matter: choose $r$ items from $n$.
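Both counts are available in Python's standard `math` module (version 3.8 or later). The sketch below compares them with the factorial formulas for the illustrative values $n = 5$, $r = 3$.

```python
from math import comb, factorial, perm

n, r = 5, 3  # illustrative values

permutations = perm(n, r)   # n! / (n - r)!         -> 60
combinations = comb(n, r)   # n! / (r! (n - r)!)    -> 10

# Cross-check against the factorial definitions.
assert permutations == factorial(n) // factorial(n - r)
assert combinations == factorial(n) // (factorial(r) * factorial(n - r))

# Each combination can be ordered in r! ways.
assert permutations == combinations * factorial(r)
```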

Discrete distributions

Binomial PMF

$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$

Probability of $k$ successes in $n$ independent trials, each with success probability $p$.

Binomial mean

$\mu = np$

Expected number of successes.

Binomial variance

$\sigma^2 = np(1-p)$

Variance of the number of successes; largest at $p = 0.5$.
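The binomial PMF, mean, and variance can be checked against each other numerically. The sketch below assumes $n = 10$ and $p = 0.3$ purely for illustration; `binomial_pmf` is a helper name introduced here.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # assumed parameters
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                            # should be 1
mean = sum(k * pk for k, pk in enumerate(pmf))              # should match n p
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf)) # should match n p (1 - p)

print(total, mean, var)
```

Up to floating-point rounding, the computed mean and variance agree with $np = 3$ and $np(1-p) = 2.1$.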

Poisson PMF

$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$

Count of events in a fixed interval when events occur independently at mean rate $\lambda$; both the mean and the variance equal $\lambda$.
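A short sketch of the Poisson PMF, assuming $\lambda = 2$ for illustration. The sum over $k$ is truncated at 50, which is an approximation that is negligible for this rate; `poisson_pmf` is a helper name introduced here.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

lam = 2.0  # assumed mean rate
pmf = [poisson_pmf(k, lam) for k in range(51)]  # truncated support

p_zero = pmf[0]                                  # e^{-lam}
total = sum(pmf)                                 # close to 1
mean = sum(k * pk for k, pk in enumerate(pmf))   # close to lam
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))  # close to lam

print(p_zero, total, mean, var)
```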

Normal distribution

PDF

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\bigl(-\tfrac{(x-\mu)^2}{2\sigma^2}\bigr)$

Bell curve with mean $\mu$ and standard deviation $\sigma$.

Z-score

$z = \frac{x - \mu}{\sigma}$

Number of standard deviations that $x$ lies from the mean; standardises values so they can be compared across distributions.

Standard normal

$Z \sim N(0, 1)$

Distribution of the z-score when $X \sim N(\mu, \sigma^2)$.

68-95-99.7 rule

$P(|X - \mu| < k\sigma) \approx 0.68,\ 0.95,\ 0.997$

For $k = 1, 2, 3$; holds only for normally distributed data.
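The z-score and the 68-95-99.7 rule can be reproduced with `statistics.NormalDist` from the standard library. The parameters $\mu = 100$, $\sigma = 15$ and the value $x = 130$ are assumptions chosen for illustration.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # assumed parameters

# Z-score of an illustrative observation.
x = 130
z = (x - dist.mean) / dist.stdev     # 2.0

# P(|X - mu| < k sigma) for k = 1, 2, 3 via the CDF.
coverage = {
    k: dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)
    for k in (1, 2, 3)
}

for k, p in coverage.items():
    print(k, round(p, 4))
```

The computed probabilities are roughly 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.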

Inferential statistics

Standard error of mean

$SE = \frac{s}{\sqrt{n}}$

Estimated standard deviation of $\bar{x}$ as an estimator of $\mu$; uses $s$ in place of the unknown $\sigma$.

Confidence interval (mean, known $\sigma$)

$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

$z_{\alpha/2} \approx 1.96$ for a 95% CI ($\alpha = 0.05$).
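A minimal sketch of the known-$\sigma$ confidence interval. The sample values and the assumed population standard deviation $\sigma = 0.3$ are invented for illustration. The critical value comes from the inverse CDF of the standard normal.

```python
from math import sqrt
from statistics import NormalDist, fmean

# Illustrative sample and an assumed known population sigma.
sample = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0, 10.2]
sigma = 0.3
alpha = 0.05  # 95% confidence level

n = len(sample)
x_bar = fmean(sample)

z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
margin = z_crit * sigma / sqrt(n)

lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 3), round(upper, 3))
```

When $\sigma$ is not known and $s$ is used instead, the critical value would come from a $t$-distribution, which the Python standard library does not provide.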

t-statistic (one sample)

$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

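Tests $H_0: \mu = \mu_0$ when $\sigma$ is unknown; under $H_0$ with normal data, $t$ follows a $t$-distribution with $n - 1$ degrees of freedom.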
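The sketch below computes the one-sample t-statistic, using the standard error $s/\sqrt{n}$ as its denominator. The sample and the hypothesised mean $\mu_0 = 5.0$ are assumptions for illustration. No p-value is computed, since that needs the $t$-distribution CDF, which is outside the standard library.

```python
from math import sqrt
from statistics import fmean, stdev

# Illustrative sample and hypothesised mean (assumptions).
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
mu_0 = 5.0

n = len(sample)
x_bar = fmean(sample)
s = stdev(sample)            # sample standard deviation (n - 1 divisor)
se = s / sqrt(n)             # standard error of the mean

t_stat = (x_bar - mu_0) / se
dof = n - 1                  # degrees of freedom for the reference t-distribution

print(round(t_stat, 3), dof)
```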

Chi-square statistic

$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$

Goodness-of-fit or independence test for categorical data; $O_i$ are the observed counts and $E_i$ the expected counts.
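A small goodness-of-fit sketch: 120 hypothetical die rolls tested against a fair die, where each face has an expected count of 20. The observed counts are invented for illustration.

```python
# Hypothetical observed counts for faces 1..6 (120 rolls in total).
observed = [18, 22, 20, 25, 15, 20]

total_rolls = sum(observed)
expected = [total_rolls / len(observed)] * len(observed)  # 20 per face if fair

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

dof = len(observed) - 1  # categories minus one for a goodness-of-fit test

print(chi_square, dof)
```

The statistic here is 58/20 = 2.9 with 5 degrees of freedom. Judging its significance requires the chi-square distribution, which is not in the standard library.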

Linear regression

Slope

$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

Best-fit slope (least squares).

Intercept

$b_0 = \bar{y} - b_1 \bar{x}$

Ensures the fitted line passes through $(\bar{x}, \bar{y})$.

Pearson correlation

$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$

Strength and direction of the linear relationship; $r \in [-1, 1]$.

Coefficient of determination

$R^2 = r^2$

Fraction of the variance in $y$ explained by $x$; the identity $R^2 = r^2$ holds for simple linear regression with an intercept.
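The regression formulas above can be sketched directly from their sums. The data points are invented for illustration. $R^2$ is computed independently as $1 - SS_{res}/SS_{tot}$ so that it can be compared with $r^2$.

```python
from math import sqrt

# Illustrative data (an assumption, not from the text).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

b1 = s_xy / s_xx                 # least-squares slope
b0 = y_bar - b1 * x_bar          # intercept
r = s_xy / sqrt(s_xx * s_yy)     # Pearson correlation

# R^2 from residuals: 1 - SS_res / SS_tot
ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - ss_res / s_yy

print(round(b1, 3), round(b0, 3), round(r, 4), round(r_squared, 4))
```

For this data the slope is 1.99 and the intercept is 0.05, and `r_squared` matches `r ** 2` up to floating-point rounding.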

Statistics Formulas