Hypothesis testing is the workhorse of statistical inference, used everywhere from clinical trials to A/B tests on websites. Yet it is also one of the most commonly misunderstood topics in statistics. This guide walks through the full pipeline once, step by step, so you understand what a p-value does and does not mean.

The five steps

State $H_0$ and $H_1$: the null hypothesis (status quo) and the alternative (the claim you want to support).
Pick a significance level $\alpha$: usually 0.05 or 0.01.
Compute the test statistic from your data ($z$, $t$, $\chi^2$, etc.).
Find the p-value: the probability, assuming $H_0$ is true, of seeing data at least as extreme as what you observed.
Decide: if $p < \alpha$, reject $H_0$; otherwise fail to reject.

Note: "fail to reject" ≠ "accept $H_0$". You merely don't have enough evidence against it.

One-sample z-test (worked example)

A factory claims its bulbs last 1000 hours on average (known population standard deviation $\sigma = 50$ hours). You test 25 bulbs and measure a sample mean of $\bar x = 980$ hours. Is the claim refuted at $\alpha = 0.05$?

$H_0: \mu = 1000$, $H_1: \mu \ne 1000$.
$\alpha = 0.05$, two-tailed.
Test statistic: $z = \frac{\bar x - \mu_0}{\sigma / \sqrt{n}} = \frac{980 - 1000}{50/\sqrt{25}} = \frac{-20}{10} = -2$.
p-value: $2 \cdot P(Z < -2) \approx 2 \cdot 0.0228 = 0.0456$.
Since $0.0456 < 0.05$, reject $H_0$. The mean lifetime is significantly different from 1000 hours.
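The same calculation can be sketched in a few lines of Python. This is a minimal illustration using only the standard library's `statistics.NormalDist`; the function name `one_sample_z_test` and its parameters are made up for this example and do not come from any particular package.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test with a known population sigma.

    Hypothetical helper written for this guide, not a library function.
    """
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-tailed p-value: probability of a |Z| at least this large under H0.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value, p_value < alpha


# Bulb example: claimed mean 1000 h, sigma = 50 h, n = 25, sample mean 980 h.
z, p_value, reject = one_sample_z_test(sample_mean=980, mu0=1000, sigma=50, n=25)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
# prints: z = -2.00, p = 0.0455, reject H0: True
```

The unrounded p-value is about 0.0455; the 0.0456 in the hand calculation comes from rounding $P(Z < -2)$ to 0.0228 before doubling. The decision is the same either way.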

Picking the right test

Situation	Test
One mean, $\sigma$ known	one-sample z-test
One mean, $\sigma$ unknown (any $n$; matters most when $n$ is small)	one-sample t-test
Two means, independent samples	two-sample t-test
Two paired means	paired t-test
Proportion(s)	z-test for proportion
Goodness of fit / contingency	chi-square

Type I vs Type II error

Type I: rejecting a true $H_0$. Probability = $\alpha$.
Type II: failing to reject a false $H_0$. Probability = $\beta$.
Power = $1 - \beta$: probability of correctly detecting a real effect.

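These quantities are linked: for a fixed sample size, shrinking $\alpha$ raises $\beta$ (and so lowers power); for a fixed $\alpha$, raising the sample size lowers $\beta$. A larger sample is therefore what lets you reduce both error rates at once.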
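To make the trade-off concrete, here is a rough sketch that computes the power of a two-tailed one-sample z-test for the bulb setup. It assumes, purely for illustration, that the true mean is 980 hours (in practice the true mean is unknown), and the helper name `two_sided_z_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_power(mu0, mu_true, sigma, n, alpha):
    """Power of a two-tailed one-sample z-test when the true mean is mu_true.

    Hypothetical helper written for this guide, not a library function.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the alternative, the z statistic is Normal(shift, 1).
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # H0 is rejected when z < -z_crit or z > z_crit.
    return std_normal.cdf(-z_crit - shift) + 1 - std_normal.cdf(z_crit - shift)


# Assumed true mean of 980 h is an illustrative choice, not a measured fact.
for n, alpha in [(25, 0.05), (25, 0.01), (100, 0.05), (100, 0.01)]:
    power = two_sided_z_power(mu0=1000, mu_true=980, sigma=50, n=n, alpha=alpha)
    print(f"n={n:>3}, alpha={alpha}: power={power:.3f}, beta={1 - power:.3f}")
```

Under these assumptions, tightening $\alpha$ from 0.05 to 0.01 at $n = 25$ lowers the power (raises $\beta$), while moving from $n = 25$ to $n = 100$ raises the power at either significance level.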

Common mistakes

"p-value = probability $H_0$ is true": false. The p-value is $P(\text{data at least this extreme} \mid H_0)$, not $P(H_0 \mid \text{data})$.
Multiple comparisons: running 20 independent tests at $\alpha = 0.05$ when every $H_0$ is true yields 1 false positive on average, and at least one with probability $1 - 0.95^{20} \approx 0.64$. Use a correction such as Bonferroni (test each at $\alpha / 20$).
Conflating significance with importance: a tiny effect with huge $n$ can be highly significant yet practically irrelevant.
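The multiple-comparisons arithmetic is easy to check numerically. The sketch below assumes the 20 tests are independent and that every null hypothesis is true; the function names are illustrative only. It also shows the Bonferroni correction, which is one common choice among several.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of false positives when every H0 is true."""
    return num_tests * alpha


def familywise_error_rate(num_tests, alpha):
    """P(at least one false positive), assuming independent tests and every H0 true."""
    return 1 - (1 - alpha) ** num_tests


num_tests, alpha = 20, 0.05
bonferroni_alpha = alpha / num_tests  # per-test threshold under Bonferroni

print(expected_false_positives(num_tests, alpha))                   # about 1
print(round(familywise_error_rate(num_tests, alpha), 3))            # about 0.64
print(round(familywise_error_rate(num_tests, bonferroni_alpha), 3)) # just under 0.05
```

With the corrected per-test threshold of 0.0025, the chance of at least one false positive across all 20 tests drops back below the intended 0.05.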

Try with the AI Hypothesis Test Solver

Use the Hypothesis Test Solver to plug in your data and get the test statistic, p-value, and decision.

Related references:

Z-Score Calculator — the building block of every z-test
Standard Deviation Calculator — the variability input
Normal Distribution Calculator — what z-tests assume

Hypothesis Testing Step-by-Step: From H0 to p-value

A practical guide to hypothesis testing — defining H0 and H1, picking the right test, computing the test statistic, and interpreting the p-value without misuse.