Hypothesis testing is the workhorse of statistical inference, used everywhere from clinical trials to A/B tests on websites. It is also one of the most misunderstood topics in statistics. This guide walks through the full pipeline once, step by step, so you understand what a p-value really means.
The five steps
- State $H_0$ and $H_1$: the null hypothesis $H_0$ (the status quo) and the alternative $H_1$ (the claim you want to support).
- Pick a significance level $\alpha$: usually 0.05 or 0.01.
- Compute the test statistic ($z$, $t$, $\chi^2$, etc.) from your data.
- Find the p-value: the probability of seeing data at least as extreme as yours if $H_0$ were true.
- Decide: if $p \le \alpha$, reject $H_0$; otherwise, fail to reject $H_0$.
Note: "fail to reject $H_0$" ≠ "accept $H_0$". It only means you lack enough evidence against $H_0$.
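Steps 4 and 5 can be sketched in Python for a $z$ statistic. This is a minimal sketch that assumes the statistic follows a standard normal distribution under $H_0$, and it uses only the standard library's `statistics.NormalDist`. The helper names `p_value_from_z` and `decide` are hypothetical and exist only for illustration.

```python
from statistics import NormalDist

# Standard normal N(0, 1): the assumed null distribution of the z statistic.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """Step 4: probability under H0 of a z statistic at least as extreme as `z`."""
    if alternative == "two-sided":
        return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))
    if alternative == "greater":
        return 1.0 - STANDARD_NORMAL.cdf(z)
    if alternative == "less":
        return STANDARD_NORMAL.cdf(z)
    raise ValueError(f"unknown alternative: {alternative!r}")


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Step 5: reject H0 when p <= alpha; otherwise fail to reject (never 'accept')."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


if __name__ == "__main__":
    for z in (0.5, 1.96, 3.0):
        p = p_value_from_z(z)
        print(f"z = {z:4.2f}  p = {p:.4f}  -> {decide(p)}")
```

The function returns a label rather than a boolean to reinforce the distinction between "fail to reject" and "accept".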
One-sample z-test (worked example)
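A factory claims its bulbs last 1000 hours on average ($\mu_0 = 1000$). You test $n = 25$ bulbs and measure their sample mean $\bar{x}$; the population standard deviation $\sigma$ is treated as known, as a z-test requires. Is the claim refuted at significance level $\alpha$?
- $H_0: \mu = 1000$, $H_1: \mu \neq 1000$.
- Significance level $\alpha$; the test is two-tailed because $H_1$ is $\mu \neq 1000$.
- Test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \dfrac{\bar{x} - 1000}{\sigma / 5}$, since $\sqrt{25} = 5$.
- p-value: $p = 2\,P(Z \ge |z|)$, where $Z \sim N(0, 1)$.
- Since $p < \alpha$, reject $H_0$. The mean lifetime is significantly different from 1000 hours.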
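The worked example can be wrapped in a small function. This is a sketch under the same assumptions as above (known $\sigma$, two-tailed alternative). The function name `one_sample_z_test` and the inputs in the usage block are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(x_bar: float, mu0: float, sigma: float, n: int, alpha: float = 0.05):
    """Two-tailed one-sample z-test of H0: mu = mu0 against H1: mu != mu0 (sigma known)."""
    standard_error = sigma / sqrt(n)             # sigma / sqrt(n)
    z = (x_bar - mu0) / standard_error           # test statistic
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))   # two-tailed p-value
    return z, p, p <= alpha                      # last value is True when H0 is rejected


if __name__ == "__main__":
    # Hypothetical inputs: sample mean 980 h, sigma 50 h, 25 bulbs, claimed mean 1000 h.
    z, p, reject = one_sample_z_test(x_bar=980.0, mu0=1000.0, sigma=50.0, n=25)
    print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")  # z = -2.00, p = 0.0455, reject H0: True
```

With these illustrative numbers the standard error is $50/5 = 10$, so $z = -20/10 = -2$ and $p \approx 0.0455 < 0.05$.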
Picking the right test
| Situation | Test |
|---|---|
| One mean, $\sigma$ known | one-sample z-test |
| One mean, $\sigma$ unknown (especially small $n$) | one-sample t-test |
| Two means, independent samples | two-sample t-test |
| Two means, paired samples | paired t-test |
| Proportion(s) | z-test for proportion |
| Goodness of fit / contingency | chi-square |
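Here is one row of the table, the z-test for a single proportion, as a sketch. It relies on the normal approximation and so assumes a reasonably large sample; a common rule of thumb asks for $np_0$ and $n(1-p_0)$ both to be at least about 10. The function name `one_proportion_z_test` and the counts in the usage block are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes: int, n: int, p0: float, alpha: float = 0.05):
    """Two-tailed z-test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n                         # sample proportion
    standard_error = sqrt(p0 * (1.0 - p0) / n)    # standard error computed under H0
    z = (p_hat - p0) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value <= alpha           # last value is True when H0 is rejected


if __name__ == "__main__":
    # Hypothetical data: 60 successes in 100 trials, testing H0: p = 0.5.
    z, p, reject = one_proportion_z_test(60, 100, 0.5)
    print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

The standard error uses $p_0$ rather than $\hat{p}$ because the test statistic is evaluated under the assumption that $H_0$ is true.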
Type I vs Type II error
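- Type I: rejecting a true $H_0$. Probability = $\alpha$.
- Type II: failing to reject a false $H_0$. Probability = $\beta$.
- Power = $1 - \beta$: the probability of correctly detecting a real effect.
These quantities trade off. For a fixed sample size, shrinking $\alpha$ raises $\beta$ and so lowers power. Increasing the sample size lowers $\beta$ at any fixed $\alpha$, which lets you reduce both error rates together.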
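The trade-off can be made concrete for the two-tailed one-sample z-test. Under $H_1$ the $z$ statistic is normal with mean $\delta\sqrt{n}/\sigma$ and variance 1, where $\delta$ is the true shift from $\mu_0$, so power is the probability that it lands beyond $\pm z_{\alpha/2}$. This is a sketch; the function name `z_test_power` and the effect size and $\sigma$ in the usage block are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def z_test_power(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-tailed one-sample z-test when the true mean is mu0 + delta."""
    z_crit = STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # reject H0 when |z| > z_crit
    shift = delta * sqrt(n) / sigma                       # mean of z under H1
    # Under H1, z ~ N(shift, 1); power = P(z > z_crit) + P(z < -z_crit).
    return STANDARD_NORMAL.cdf(shift - z_crit) + STANDARD_NORMAL.cdf(-z_crit - shift)


if __name__ == "__main__":
    delta, sigma = 10.0, 50.0  # hypothetical true effect and known standard deviation
    for alpha in (0.05, 0.01):
        for n in (25, 100, 400):
            power = z_test_power(delta, sigma, n, alpha)
            beta = 1.0 - power  # Type II error probability
            print(f"alpha={alpha:.2f} n={n:3d} power={power:.3f} beta={beta:.3f}")
```

Setting $\delta = 0$ gives power equal to $\alpha$, which is exactly the Type I error rate. That makes it a quick sanity check on the formula.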
Common mistakes
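- "p-value = probability $H_0$ is true": false. The p-value is $P(\text{data at least this extreme} \mid H_0)$, not $P(H_0 \mid \text{data})$.
- Multiple comparisons: running 20 tests at $\alpha = 0.05$ when every null is true yields about one false positive on average ($20 \times 0.05 = 1$). If the tests are independent, the chance of at least one false positive is $1 - 0.95^{20} \approx 64\%$. Use a correction such as Bonferroni (test each hypothesis at $\alpha / 20$).
- Conflating significance with importance: a tiny effect with a huge $n$ can be highly significant yet practically irrelevant.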
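Two of these pitfalls can be checked numerically. The sketch below computes the family-wise error rate for $m$ independent tests when every null is true, applies a Bonferroni correction, and shows how a negligible effect becomes "significant" at a very large $n$. The helper names are hypothetical. The two-sided p-value uses `math.erfc`, since $2\,(1 - \Phi(|z|)) = \operatorname{erfc}(|z|/\sqrt{2})$ avoids rounding to zero for large $|z|$.

```python
from math import erfc, sqrt


def family_wise_error_rate(m: int, alpha: float) -> float:
    """P(at least one false positive) across m independent tests when every H0 is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_reject(p_values, alpha: float = 0.05):
    """Bonferroni correction: reject H0_i only if p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value, computed as erfc(|z| / sqrt(2)) for numerical stability."""
    return erfc(abs(z) / sqrt(2.0))


if __name__ == "__main__":
    print(f"expected false positives: {20 * 0.05:.1f}")
    print(f"P(at least one): {family_wise_error_rate(20, 0.05):.3f}")
    print(bonferroni_reject([0.001, 0.01, 0.04]))
    # Hypothetical tiny effect: 0.01 standard deviations, observed with n = 1,000,000.
    z = 0.01 * sqrt(1_000_000)
    print(f"z = {z:.1f}, p = {two_sided_p(z):.2e}")
```

Bonferroni is only one of several corrections. It is simple and conservative, and it controls the family-wise error rate without requiring the tests to be independent.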