
Linear Regression

Linear regression fits a straight line to data: y = mx + b. The line minimises the sum of squared vertical distances to the points (least squares).


Linear regression finds the straight line y = mx + b that best fits a set of (x, y) data points. "Best" is defined by the least squares criterion: minimising the sum of squared vertical distances between the line and the points.

The slope and intercept have closed-form solutions:

m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}, \qquad b = \bar{y} - m\bar{x}
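The closed-form solution above translates directly into code. A minimal sketch (the `fit_line` function and the sample data are illustrative, not from the source):

```python
def fit_line(xs, ys):
    """Return (m, b) for y = mx + b via the closed-form least-squares formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: m = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: b = ȳ − m·x̄
    b = sum_y / n - m * (sum_x / n)
    return m, b

# Roughly linear sample data
m, b = fit_line([1, 2, 3, 4], [2.1, 4.2, 5.9, 8.1])
# m ≈ 1.97, b ≈ 0.15
```

Note the denominator n·Σx² − (Σx)² is zero only when all x values are equal, in which case no unique line exists.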

The coefficient of determination R^2 measures fit quality (between 0 and 1; closer to 1 = better fit).
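R^2 is computed as 1 − SS_res/SS_tot, comparing the model's residuals against the spread of y around its mean. A short sketch (function name and data are illustrative):

```python
def r_squared(xs, ys, m, b):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(ys) / len(ys)
    # Residual sum of squares: deviations from the fitted line
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: deviations from the mean of y
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# A perfect fit gives R^2 = 1 exactly
r2 = r_squared([1, 2, 3], [3, 5, 7], m=2, b=1)
# → 1.0
```

An R^2 of 0 means the line predicts no better than simply using the mean of y.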

Linear regression is among the simplest predictive models and the foundation of more sophisticated methods:

  • Multiple regression uses several inputs.
  • Logistic regression adapts the idea for binary outcomes.
  • Ridge / Lasso add regularisation.
  • Modern machine learning's "linear models" are direct descendants.

Despite its simplicity, linear regression remains heavily used in finance (CAPM), epidemiology, economics, and as a baseline against which fancier models must justify their complexity.