
Linear Regression

Linear regression fits a straight line to data: y = mx + b. The line minimises the sum of squared vertical distances to the points (least squares).


Linear regression finds the straight line y = mx + b that best fits a set of (x, y) data points. "Best" is defined by the least squares criterion: minimising the sum of squared vertical distances between the line and the points.

The slope and intercept have closed-form solutions:

m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}, \qquad b = \bar{y} - m\bar{x}
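The closed-form solution above translates directly into code. A minimal sketch (the `fit_line` function and the sample data are illustrative, not from the source):

```python
def fit_line(xs, ys):
    """Return (m, b) for y = mx + b via the closed-form least-squares formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: m = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: b = ȳ − m·x̄
    b = sum_y / n - m * (sum_x / n)
    return m, b

# Roughly linear sample data
m, b = fit_line([1, 2, 3, 4], [2.1, 4.2, 5.9, 8.1])
# m ≈ 1.97, b ≈ 0.15
```

Note the denominator n·Σx² − (Σx)² is zero only when all x values are equal, in which case no unique line exists.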

The coefficient of determination R^2 measures fit quality (between 0 and 1; closer to 1 = better fit).
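R^2 is computed as 1 − SS_res/SS_tot, comparing the model's residuals against the spread of y around its mean. A short sketch (function name and data are illustrative):

```python
def r_squared(xs, ys, m, b):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(ys) / len(ys)
    # Residual sum of squares: deviations from the fitted line
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: deviations from the mean of y
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# A perfect fit gives R^2 = 1 exactly
r2 = r_squared([1, 2, 3], [3, 5, 7], m=2, b=1)
# → 1.0
```

An R^2 of 0 means the line predicts no better than simply using the mean of y.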

Linear regression is among the simplest predictive models and the foundation of more sophisticated methods:

  • Multiple regression uses several inputs.
  • Logistic regression adapts the idea for binary outcomes.
  • Ridge / Lasso add regularisation.
  • Modern machine learning's "linear models" are direct descendants.

Despite its simplicity, linear regression remains heavily used in finance (CAPM), epidemiology, economics, and as a baseline against which fancier models must justify their complexity.