The gradient of $f(x_1, \ldots, x_n)$ is the vector of all partial derivatives: $\nabla f = (\partial f/\partial x_1, \ldots, \partial f/\partial x_n)$ .

Geometric interpretation: at any point, $\nabla f$ points in the direction of steepest ascent, with magnitude equal to the rate of change in that direction.

To find local maxima/minima, set $\nabla f = \vec{0}$ and check second-order conditions. To minimise (e.g. ML loss), walk in $-\nabla f$ direction — this is gradient descent, the backbone of modern machine learning. Variants (momentum, Adam, RMSprop) all build on this idea.

The gradient is perpendicular to level curves of the function. The directional derivative in direction $\vec{u}$ (unit vector) is $\nabla f \cdot \vec{u}$ .

Gradient

The gradient of a multivariable function f(x,y,...) is the vector of partial derivatives. It points in the direction of steepest ascent and is the foundation of gradient descent.