Standard deviation is the most misunderstood concept in introductory statistics. People know it "measures spread" but freeze when asked what the number actually means. This guide explains it three ways — geometric, computational, and intuitive — so the next time you see σ in a paper or report, you actually understand what's there.
Plain-English definition
Standard deviation answers: on average, how far does each data point sit from the mean?
Symbolically, for a population of N values x₁, x₂, …, x_N with mean μ:

σ = √( (1/N) · Σᵢ (xᵢ − μ)² )

Read aloud: "average squared deviation, then square root."
Why squared, then square-rooted?
A reasonable first attempt at "average distance from mean" might be (1/N) · Σᵢ |xᵢ − μ| — the mean absolute deviation. It works, and statisticians do use it sometimes (it's more robust to outliers).
But absolute value is mathematically awkward — it's not differentiable at zero, so you can't do calculus with it cleanly. Squaring sidesteps that, and the square root at the end brings the units back to the original scale (so σ is in dollars if the data is in dollars, not dollars²).
This is the same reason machine learning uses squared loss (mean squared error) — squaring is differentiable, plays nicely with calculus, and the resulting estimators are often optimal.
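To make the contrast concrete, here's a small sketch using Python's stdlib `statistics` module (the dataset is purely illustrative), computing both notions of "average distance" side by side:

```python
import statistics

def mean_abs_dev(data):
    """Mean absolute deviation: the plain 'average distance from the mean'."""
    m = statistics.fmean(data)
    return statistics.fmean(abs(x - m) for x in data)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values; mean is 5
print(mean_abs_dev(data))         # 1.5
print(statistics.pstdev(data))    # 2.0 -- larger: squaring weights big deviations more
```

Standard deviation is never smaller than the mean absolute deviation, because squaring amplifies the largest deviations before averaging.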
Population vs sample — the N vs n − 1 thing
Two formulas exist, and the difference matters:
- Population (you have all the data): divide by N. Symbol: σ.
- Sample (you have a sample, want to estimate the population): divide by n − 1. Symbol: s.
The sample formula's n − 1 is Bessel's correction. Why? Dividing by n would systematically underestimate the population variance, because the deviations are measured from the sample mean (which is by construction the best fit for the sample), squeezing them smaller than they would be against the true population mean. Dividing by n − 1 instead of n exactly compensates for the variance; the square-rooted s is still slightly biased, but far less so.
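A quick simulation illustrates the bias (a sketch in Python; the population is normal with σ = 10, so the true variance is 100):

```python
import random
import statistics

random.seed(42)
n, trials = 5, 50_000            # many small samples from gauss(0, 10)

biased, corrected = [], []
for _ in range(trials):
    sample = [random.gauss(0, 10) for _ in range(n)]
    m = statistics.fmean(sample)
    ss = sum((x - m) ** 2 for x in sample)
    biased.append(ss / n)           # divide by n
    corrected.append(ss / (n - 1))  # Bessel's correction

print(statistics.fmean(biased))     # ≈ 80: underestimates the true variance of 100
print(statistics.fmean(corrected))  # ≈ 100
```

With samples of size 5, the divide-by-n estimate averages about (n − 1)/n = 4/5 of the true variance, exactly the gap Bessel's correction removes.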
Most calculators and software default to the sample formula. Pay attention.
Worked example 1: small symmetric dataset
Data: 1, 3, 3, 5, 5, 7, 7, 9. (8 values, symmetric around the mean.)
- Mean: (1 + 3 + 3 + 5 + 5 + 7 + 7 + 9) / 8 = 40 / 8 = 5.
- Deviations from mean: −4, −2, −2, 0, 0, 2, 2, 4.
- Squared deviations: 16, 4, 4, 0, 0, 4, 4, 16.
- Sum: 48.
- Population (divide by N = 8): variance = 48 / 8 = 6, σ = √6 ≈ 2.449.
- Sample (divide by n − 1 = 7): variance = 48 / 7 ≈ 6.857, s ≈ 2.619.
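Python's built-in `statistics` module implements both formulas, so this kind of hand calculation is easy to verify (sketch on a small symmetric dataset):

```python
import statistics

data = [1, 3, 3, 5, 5, 7, 7, 9]

# Population formula: divide by N = 8
print(statistics.pvariance(data))  # 6.0
print(statistics.pstdev(data))     # ≈ 2.449

# Sample formula: divide by n - 1 = 7
print(statistics.variance(data))   # ≈ 6.857
print(statistics.stdev(data))      # ≈ 2.619
```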
The 68-95-99.7 rule (only for normal distributions)
If your data is approximately normal (bell-shaped):
- About 68% of values fall within ±1σ of the mean.
- About 95% within ±2σ.
- About 99.7% within ±3σ.
This is why "beyond 2σ" or "two sigma" is the default casual definition of "statistically unusual."
⚠️ Warning: this rule applies only to normal distributions. For skewed or heavy-tailed data (income, response time), ±1σ might cover 80% of data — or 50%. Always check the distribution shape (histogram, QQ plot) before quoting the 68-95-99.7 numbers.
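An empirical check makes the warning concrete (a sketch in Python; the skewed case uses an exponential distribution as a stand-in for response-time-like data):

```python
import random
import statistics

random.seed(7)

def coverage_within_1sd(data):
    """Fraction of points within one standard deviation of the mean."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return sum(m - s <= x <= m + s for x in data) / len(data)

normal = [random.gauss(100, 15) for _ in range(100_000)]
skewed = [random.expovariate(1.0) for _ in range(100_000)]  # heavy right tail

print(coverage_within_1sd(normal))  # ≈ 0.68, as the rule predicts
print(coverage_within_1sd(skewed))  # ≈ 0.86 -- the rule does not transfer
```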
Standard deviation vs variance
Variance is just σ² — the standard deviation squared. They contain identical information, so why have both?
- Standard deviation has the same units as the data — interpretable.
- Variance decomposes additively for independent variables (Var(X + Y) = Var(X) + Var(Y) when X and Y are independent), making it the algebraically convenient quantity for proofs, expectations, and ANOVA.
Use σ when reporting; use σ² when doing calculations.
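The additivity claim is easy to sanity-check numerically (a sketch with independent normals of σ = 3 and σ = 4):

```python
import random
import statistics

random.seed(1)
n = 100_000
x = [random.gauss(0, 3) for _ in range(n)]   # Var(X) = 9
y = [random.gauss(0, 4) for _ in range(n)]   # Var(Y) = 16

z = [a + b for a, b in zip(x, y)]
print(statistics.pvariance(z))  # ≈ 25 = 9 + 16: variances add
print(statistics.pstdev(z))     # ≈ 5, not 3 + 4 = 7: standard deviations don't
```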
Common mistakes
- Quoting σ without context. "σ = 4" means nothing if you don't know the mean. Always pair them: "mean 50, σ = 4."
- Mixing population and sample formulas. With small samples it makes a real difference. With large samples (say, n ≥ 100) the difference is negligible.
- Forgetting outlier sensitivity. One extreme value can balloon σ. For heavy-tailed data, also report the median absolute deviation (MAD) for robustness.
- Applying 68-95-99.7 to non-normal data. See above.
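For the robustness point, here's a minimal sketch of the median absolute deviation (one common MAD variant; the dataset is illustrative):

```python
import statistics

def median_abs_dev(data):
    """Median absolute deviation from the median -- robust to outliers."""
    med = statistics.median(data)
    return statistics.median(abs(x - med) for x in data)

clean   = [10, 11, 12, 13, 14]
spoiled = [10, 11, 12, 13, 140]  # one wild outlier

print(statistics.pstdev(clean), median_abs_dev(clean))      # ≈ 1.41 and 1
print(statistics.pstdev(spoiled), median_abs_dev(spoiled))  # ≈ 51.4 and 1 -- MAD barely moves
```

One extreme value multiplies σ by more than thirty here, while the MAD is unchanged.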
Try it yourself
Drop any dataset into our free Standard Deviation Calculator — choose population or sample, see step-by-step computation, and verify against this guide.