Variance measures how far a dataset's values spread from the mean. For a population of $N$ values $x_1, \ldots, x_N$ with mean $\mu$ :

$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$

For a sample of $n$ values with sample mean $\bar{x}$ , divide by $n - 1$ instead of $n$ (Bessel's correction, an unbiased estimator).

A small variance means values cluster near the mean; a large variance means they are scattered. Variance is in squared units of the original data (kg² if data is in kg) — that's why we usually report standard deviation $\sigma = \sqrt{\sigma^2}$ , which has the same units as the data.

Variance underlies all of inferential statistics: confidence intervals, hypothesis tests, and regression all depend on estimating variance. The bias-variance tradeoff in machine learning is named for it.

Variance

Variance measures the spread of a dataset around its mean. It is the average of squared deviations. Standard deviation is the square root of variance.

相关资源