Gradient descent

The downhill-walking algorithm that trains models — nudge every parameter to shrink the error, then repeat.

Gradient descent is the basic algorithm that lets a model learn: the optimization engine underneath almost all modern machine learning. Training starts with a model that’s wrong, scores how wrong with a loss (a single number measuring error), and then asks, for every one of the model’s parameters: “which way should I nudge this, and how hard, to make the loss a little smaller?” That whole bundle of directions-and-magnitudes is the gradient, the slope of the loss with respect to each parameter. The gradient itself points uphill, so each update takes one small step the opposite way; the model gets slightly less wrong, and repeating this millions of times grinds the error toward a bottom. The size of each step is the learning rate: too large and the model overshoots and thrashes, too small and training crawls. In practice you rarely use the whole dataset for each step: stochastic gradient descent (SGD) estimates the slope from one small batch at a time, and smarter variants like Adam adapt the step size per parameter as they go. One thing worth keeping straight: gradient descent only ever optimizes the training loss; whether the model is genuinely learning rather than memorizing is what validation loss is there to catch. And the machinery doesn’t have to move a model’s weights at all: freeze the model and run the same descent over its input instead, and you can search a latent space for the point that produces what you want, which is how some GAN-based upscalers hunt for a face that matches a blurry photo.
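The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the model (a line y ≈ w*x + b), the mean-squared-error loss, the toy dataset, and the names `loss_and_gradient` and `train` are all assumptions chosen for the example. Passing `batch_size` switches from full-batch descent to the minibatch (SGD-style) estimate of the slope.

```python
import random


def loss_and_gradient(w, b, batch):
    """Mean squared error of y ~ w*x + b on a batch, plus its gradient."""
    n = len(batch)
    loss = grad_w = grad_b = 0.0
    for x, y in batch:
        err = (w * x + b) - y        # how wrong the model is on this point
        loss += err * err / n
        grad_w += 2 * err * x / n    # d(loss)/dw
        grad_b += 2 * err / n        # d(loss)/db
    return loss, grad_w, grad_b


def train(data, learning_rate=0.05, steps=2000, batch_size=None, seed=0):
    """Repeatedly step against the gradient, starting from w = b = 0."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Full dataset, or a random minibatch for the SGD-style estimate.
        batch = data if batch_size is None else rng.sample(data, batch_size)
        _, grad_w, grad_b = loss_and_gradient(w, b, batch)
        w -= learning_rate * grad_w  # the gradient points uphill, so subtract
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data generated from y = 3x + 1, with no noise.
data = [(x / 10, 3 * (x / 10) + 1) for x in range(-10, 11)]

initial_loss, _, _ = loss_and_gradient(0.0, 0.0, data)

w_full, b_full = train(data)               # every step uses all the data
w_sgd, b_sgd = train(data, batch_size=4)   # every step uses 4 random points

# A learning rate that is too large for this problem: each step overshoots.
w_big, b_big = train(data, learning_rate=1.1, steps=50)
diverged_loss, _, _ = loss_and_gradient(w_big, b_big, data)
```

On this toy problem both the full-batch and minibatch runs should land close to w = 3, b = 1, while the run with the oversized learning rate ends with a loss larger than where it started, which is the overshoot-and-thrash behaviour described above. The sketch tracks only training loss; it does not hold out validation data, and it uses a fixed step size rather than an adaptive rule like Adam.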