Standard deviation is one of the most misunderstood concepts in introductory statistics. People know it "measures spread" but freeze when asked what the number actually means. This guide explains it three ways (geometric, computational, and intuitive) so that the next time you see $\sigma$ in a paper or report, you understand what it is telling you.

Plain-English definition

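Standard deviation answers the question: roughly how far does a typical data point sit from the mean? (Strictly, it is the root-mean-square distance from the mean, which is never smaller than the plain average distance.)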

Symbolically, for a population of $N$ values $x_1, \ldots, x_N$ with mean $\mu$:

$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2}$

Read aloud: "average squared deviation, then square root."
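
As a minimal sketch in plain Python (only the standard `math` module; the helper name `population_sd` is ours, chosen for illustration), the formula translates almost word for word:

```python
import math

def population_sd(values):
    """Population standard deviation: divide by N, as in the sigma formula."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mu = sum(values) / n                       # the mean, mu
    sq_dev = [(x - mu) ** 2 for x in values]   # squared deviations (x_i - mu)^2
    variance = sum(sq_dev) / n                 # "average squared deviation..."
    return math.sqrt(variance)                 # "...then square root"

print(population_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```

Each line mirrors one piece of the formula: the mean, the squared deviations, the division by $N$, and the square root.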

Why squared, then square-rooted?

A reasonable first attempt at "average distance from mean" might be $\frac{1}{N}\sum |x_i - \mu|$ — the mean absolute deviation. It works, and statisticians do use it sometimes (it's more robust to outliers).
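
To make the comparison concrete, the sketch below computes both measures of spread for the 8-value dataset used in worked example 1. Both helper names are hypothetical. For any dataset the standard deviation is at least as large as the mean absolute deviation, because squaring gives extra weight to the larger deviations.

```python
import math

def mean_abs_dev(values):
    """Mean absolute deviation: the average of |x_i - mu|."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)

def population_sd(values):
    """Population standard deviation: root of the average squared deviation."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_abs_dev(data))   # 1.5
print(population_sd(data))  # 2.0
```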

But absolute value is mathematically awkward: it is not differentiable at zero (its slope jumps from $-1$ to $+1$ there), so calculus-based derivations get messy. Squaring sidesteps that, and the square root at the end brings the units back to the original scale (so $\sigma$ is in dollars if $x$ is in dollars, not dollars²).

Much the same reasoning explains why machine learning so often uses squared loss (mean squared error): squaring is differentiable, plays nicely with calculus, and yields estimators with closed-form solutions, ordinary least squares being the classic example.
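
One calculus-friendly property is easy to check. Setting the derivative of $\sum_i (x_i - c)^2$ with respect to $c$, namely $-2\sum_i (x_i - c)$, to zero gives $c = \bar{x}$: the mean is the center that minimizes the sum of squared deviations. The sketch below confirms this numerically with a brute-force grid search; the grid and the helper name `sum_sq` are arbitrary illustrative choices.

```python
def sum_sq(values, c):
    """Sum of squared deviations of values from a candidate center c."""
    return sum((x - c) ** 2 for x in values)

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Brute-force search over candidate centers 0.0, 0.1, ..., 10.0
candidates = [i / 10 for i in range(101)]
best_sq = min(candidates, key=lambda c: sum_sq(data, c))

print(best_sq)                # 5.0
print(sum(data) / len(data))  # 5.0, the mean of data
```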

Population vs sample — the $n-1$ vs $n$ thing

Two formulas exist, and the difference matters:

Population (you have all the data): divide by $N$. Symbol $\sigma$.
Sample (you have a sample and want to estimate the population value): divide by $n - 1$. Symbol $s$.

The sample formula's $n - 1$ is Bessel's correction. Why is it needed? Dividing by $n$ would systematically underestimate the population variance: the deviations are measured from the sample mean, which is by construction the best fit for the sample, so they come out smaller than they would against the true population mean. Dividing by $n - 1$ instead of $n$ exactly removes this bias from the variance. (Its square root $s$ is still slightly biased low as an estimate of $\sigma$, but that effect shrinks as $n$ grows.)
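
The bias can be made exact rather than simulated, under one simplifying assumption: samples of size 3 drawn with replacement from the 8-value dataset of worked example 1, treated as the whole population. The sketch enumerates all $8^3 = 512$ equally likely ordered samples using exact fractions, so the averages it prints are true expected values, not random noise. The sample size and variable names are illustrative choices.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(population)
mu = Fraction(sum(population), N)
# True population variance (divide by N); equals 4 for this data
sigma2 = sum((x - mu) ** 2 for x in population) / N

n = 3  # sample size (an arbitrary choice for illustration)
total_div_n = Fraction(0)
total_div_n_minus_1 = Fraction(0)
count = 0

# Every ordered sample of size n drawn with replacement is equally likely,
# so averaging over all 8**3 = 512 of them gives exact expected values.
for sample in product(population, repeat=n):
    xbar = Fraction(sum(sample), n)
    ss = sum((x - xbar) ** 2 for x in sample)  # squared deviations from the sample mean
    total_div_n += ss / n
    total_div_n_minus_1 += ss / (n - 1)
    count += 1

print(sigma2)                       # 4
print(total_div_n / count)          # 8/3: too small by a factor (n - 1)/n
print(total_div_n_minus_1 / count)  # 4: matches sigma2 exactly
```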

Defaults differ between tools, so pay attention: NumPy's np.std divides by $N$ unless you pass ddof=1, whereas pandas' Series.std and Excel's STDEV.S divide by $n - 1$.
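
Python's standard library sidesteps the ambiguity by offering the two formulas as separate functions in the `statistics` module: `pstdev` for the population version and `stdev` for the sample version. A quick check on the worked-example data:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# pstdev divides by N (population); stdev divides by n - 1 (sample)
print(statistics.pstdev(data))  # 2.0
print(statistics.stdev(data))   # about 2.138
```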

Worked example 1: small symmetric dataset

Data: $\{2, 4, 4, 4, 5, 5, 7, 9\}$ (8 values; a classic textbook example).

Mean: $\bar{x} = \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = 5$.
Deviations from mean: $-3, -1, -1, -1, 0, 0, 2, 4$.
Squared deviations: $9, 1, 1, 1, 0, 0, 4, 16$.
Sum: $32$.
Population ($N = 8$): variance $= 32/8 = 4$, $\sigma = 2$.
Sample ($n - 1 = 7$): variance $= 32/7 \approx 4.57$, $s \approx 2.14$.
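
The same arithmetic, step by step, in Python. This is a sketch for checking the hand computation, not a general-purpose routine; the variable names are illustrative.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n                     # 5.0
deviations = [x - mean for x in data]    # -3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0
squared = [d ** 2 for d in deviations]   # 9.0, 1.0, 1.0, 1.0, 0.0, 0.0, 4.0, 16.0
ss = sum(squared)                        # 32.0

pop_var = ss / n                         # divide by N = 8
sample_var = ss / (n - 1)                # divide by n - 1 = 7

print(pop_var, math.sqrt(pop_var))                            # 4.0 2.0
print(round(sample_var, 2), round(math.sqrt(sample_var), 2))  # 4.57 2.14
```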

The 68-95-99.7 rule (only for normal distributions)

If your data is approximately normal (bell-shaped):

$\approx 68\%$ of values fall within $1\sigma$ of the mean.
$\approx 95\%$ fall within $2\sigma$.
$\approx 99.7\%$ fall within $3\sigma$.

This is why "$\pm 2\sigma$" or "two sigma" is a common casual threshold for "statistically unusual": for normal data, only about 5% of values land outside it.
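
For a normal distribution these percentages follow from the error function: the probability of landing within $k\sigma$ of the mean is $\operatorname{erf}(k/\sqrt{2})$. A quick check with Python's standard `math.erf` (the helper name `normal_within` is illustrative):

```python
import math

def normal_within(k):
    """P(|X - mu| < k * sigma) for a normal distribution: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```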

⚠️ Warning: this rule applies only to normal distributions. For skewed or heavy-tailed data (income, response time), $1\sigma$ might cover 80% of the data, or 50%. Always check the distribution shape (histogram, QQ plot) before quoting the 68-95-99.7 numbers.
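
To see how far a skewed distribution can drift from 68-95-99.7, the sketch below uses the exponential distribution as a stand-in for a right-skewed quantity such as a response time. That choice is an illustrative assumption, not a claim about any real dataset. Because an exponential's mean and standard deviation are both $1/\lambda$, its coverage can be computed exactly, and the result does not depend on $\lambda$.

```python
import math

def exponential_within(k):
    """P(|X - mu| <= k * sigma) for an exponential distribution, assuming k >= 1.

    Mean = sd = 1/rate, so the band is [(1 - k)/rate, (1 + k)/rate].
    For k >= 1 the lower end is <= 0, so only the CDF at the upper end
    matters: 1 - exp(-(1 + k)). The rate cancels out.
    """
    if k < 1:
        raise ValueError("this shortcut assumes k >= 1")
    return 1 - math.exp(-(1 + k))

for k in (1, 2, 3):
    print(k, round(exponential_within(k), 4))
# 1 0.8647
# 2 0.9502
# 3 0.9817
```

About 86% of this distribution falls within $1\sigma$ rather than 68%, while the $3\sigma$ band misses about 1.8% rather than 0.3%.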

Standard deviation vs variance

Variance is just $\sigma^2$. The two carry identical information, so why have both?

Standard deviation has the same units as the data, so it is directly interpretable.
Variance adds up for independent variables ($\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y)$ when $X$ and $Y$ are independent), which makes it the algebraically convenient quantity for proofs, expectations, and ANOVA.

Use $\sigma$ when reporting; use $\sigma^2$ when doing calculations.
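
A small exact check of the additivity claim, using two independent fair dice as an illustrative example: the variances add, but the standard deviations do not. The helper `variance` takes a hypothetical `{value: probability}` dictionary.

```python
import math
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    return sum(p * (v - mean) ** 2 for v, p in dist.items())

one_die = {v: Fraction(1, 6) for v in faces}

# Distribution of X + Y for two independent fair dice (36 equally likely pairs)
two_dice = {}
for x, y in product(faces, repeat=2):
    two_dice[x + y] = two_dice.get(x + y, Fraction(0)) + Fraction(1, 36)

var_x = variance(one_die)
var_sum = variance(two_dice)
print(var_x, var_sum)  # 35/12 35/6, so Var(X+Y) = Var(X) + Var(Y)
# sd(X+Y) versus sd(X) + sd(Y): not equal
print(round(math.sqrt(var_sum), 3), round(2 * math.sqrt(var_x), 3))  # 2.415 3.416
```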

Common mistakes

Quoting $\sigma$ without context. "$\sigma = 5$" means nothing if you don't know the mean. Always pair them: "mean $= 100$, $\sigma = 5$."
Mixing population and sample formulas. With small samples it makes a real difference. With large samples the difference is negligible: at $n = 100$, $s$ is only about 0.5% larger than the divide-by-$n$ value.
Forgetting outlier sensitivity. One extreme value can balloon $\sigma$. For heavy-tailed data, also report the median absolute deviation (MAD) for robustness.
Applying 68-95-99.7 to non-normal data. See above.
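
The outlier point is easy to demonstrate with a hypothetical data-entry error: the 9 in the worked-example data mistyped as 90. The helper `median_abs_dev` is our own; it computes the raw, unscaled MAD (some libraries multiply it by about 1.4826 so that it estimates $\sigma$ for normal data).

```python
import statistics

def median_abs_dev(values):
    """Median absolute deviation: median of |x_i - median(x)| (unscaled)."""
    med = statistics.median(values)
    return statistics.median([abs(x - med) for x in values])

clean = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = [2, 4, 4, 4, 5, 5, 7, 90]  # last value mistyped: 90 instead of 9

for name, data in (("clean", clean), ("with outlier", with_outlier)):
    print(name, round(statistics.pstdev(data), 2), median_abs_dev(data))
```

The population standard deviation jumps from 2 to roughly 28, while the MAD stays at 0.5 in both cases.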

Try it yourself

Drop any dataset into our free Standard Deviation Calculator — choose population or sample, see step-by-step computation, and verify against this guide.

Related material:

Understanding Standard Deviation Without Tears

Standard deviation in plain English: what it really measures, the difference between population and sample, the 68-95-99.7 rule, and three worked examples you can verify.

Plain-English definition

Why squared, then square-rooted?

Population vs sample — the $n-1$ vs $n$ thing

Worked example 1: small symmetric dataset

The 68-95-99.7 rule (only for normal distributions)

Standard deviation vs variance

Common mistakes

Try it yourself
