statistics

Correlation

Correlation measures the strength and direction of the linear relationship between two variables. The Pearson coefficient r is in [-1, 1]: 1 = perfect positive, -1 = perfect negative, 0 = no linear relationship.

Correlation measures the strength and direction of the linear relationship between two variables $X$ and $Y$. The Pearson correlation coefficient is:

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \in [-1, 1]$$
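
The formula can be computed directly from its definition. The following is a minimal sketch in plain Python, not a reference implementation: the function name `pearson_r` and the sample data (`hours`, `scores`) are illustrative assumptions, not part of the original entry.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Hypothetical helper written for illustration; it follows the
    formula term by term rather than aiming for numerical robustness.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Deviations from the means: (x_i - x_bar) and (y_i - y_bar).
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Numerator: sum of products of paired deviations.
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Denominator: square root of the product of the summed squared deviations.
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Made-up example data: study hours and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, scores), 3))  # close to 1: strong positive
```

Note that $r$ is undefined when either variable has zero variance, which is why the sketch raises an error in that case instead of returning a number.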

Interpretation:

  • $r = 1$: perfect positive linear relationship.
  • $r = -1$: perfect negative linear relationship.
  • $r = 0$: no linear relationship (but possibly a non-linear one!).
  • As a rough rule of thumb (conventions vary by field): $|r| > 0.7$: strong; $0.3 < |r| < 0.7$: moderate; $|r| < 0.3$: weak.
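
The interpretation bands above could be turned into a small labelling helper, sketched below. The function name `describe_strength` is hypothetical, and the treatment of the exact boundary values 0.3 and 0.7 (counted here as moderate) is an assumption, since the bands above leave those points unassigned.

```python
def describe_strength(r):
    """Label a Pearson coefficient using the rule-of-thumb bands.

    Hypothetical helper. Assumption: |r| exactly equal to 0.3 or 0.7
    is treated as "moderate"; the bands themselves do not say.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    if r == 0:
        return "no linear relationship"

    magnitude = abs(r)
    if magnitude > 0.7:
        strength = "strong"
    elif magnitude >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


for value in (0.85, -0.5, 0.1, 0.0):
    print(value, "->", describe_strength(value))
```

The labels are only descriptive shorthand; what counts as a strong correlation depends heavily on the field and the sample size.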

Crucial caveats:

  • Correlation is not causation. Ice cream sales correlate with drowning deaths, but neither causes the other: both are driven by hot weather.
  • Sensitive to outliers. A single extreme point can flip the sign of $r$.
  • Linear only. A perfect quadratic relationship $y = x^2$ has $r \approx 0$ when the $x$ values are spread symmetrically around zero.
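
The outlier and linearity caveats can be checked numerically. The sketch below uses made-up data and a compact, hypothetical `pearson_r` helper that mirrors the formula above; it is an illustration under those assumptions, not a general-purpose routine.

```python
import math


def pearson_r(xs, ys):
    """Compact Pearson r (hypothetical helper, no input validation)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Outlier sensitivity: a perfectly negative relationship...
xs = [1, 2, 3, 4, 5]
ys = [5, 4, 3, 2, 1]
r_clean = pearson_r(xs, ys)

# ...becomes strongly positive after adding one extreme point (20, 20).
r_outlier = pearson_r(xs + [20], ys + [20])

# Linear only: y = x^2 on x values symmetric about zero.
sym_x = [-3, -2, -1, 0, 1, 2, 3]
sym_y = [x ** 2 for x in sym_x]
r_quadratic = pearson_r(sym_x, sym_y)

print("without outlier:", round(r_clean, 3))
print("with outlier:   ", round(r_outlier, 3))
print("quadratic:      ", round(r_quadratic, 3))
```

In the quadratic case the positive and negative halves of the data cancel in the numerator, so $r$ comes out at zero even though $y$ is fully determined by $x$.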

For ranked / non-linear monotonic relationships, use Spearman's $\rho$. For categorical association, use chi-square or Cramér's V.
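
Spearman's $\rho$ can be understood as the Pearson coefficient applied to the ranks of the data rather than the raw values. The sketch below illustrates that idea on a monotonic but non-linear relationship ($y = e^x$); the helper names `pearson_r`, `average_ranks`, and `spearman_rho` are hypothetical, and ties are handled by assigning average ranks, which is one common convention.

```python
import math


def pearson_r(xs, ys):
    """Compact Pearson r (hypothetical helper, no input validation)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole group of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho computed as Pearson r on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Monotonic but strongly non-linear: y = exp(x).
xs = [1, 2, 3, 4, 5, 6]
ys = [math.exp(x) for x in xs]

print("Pearson r:   ", round(pearson_r(xs, ys), 3))   # noticeably below 1
print("Spearman rho:", round(spearman_rho(xs, ys), 3))  # 1.0: perfectly monotonic
```

Because ranking discards the spacing between values, $\rho$ reaches 1 for any strictly increasing relationship, while Pearson's $r$ reaches 1 only when the relationship is exactly linear.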