Deep Learning
Deep learning is how computers learn patterns too subtle to program by hand — turning pixels into objects, audio into words, and prompts into prose.
At the bottom it is almost embarrassingly simple: multiply inputs by weights, add a bias, pass the result through a nonlinearity, and repeat. With enough of those units, even a single hidden layer can approximate any continuous function on a bounded domain as closely as you like.
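Here is a minimal sketch of that recipe in plain Python. It assumes ReLU as the nonlinearity and a linear output layer; the helper names (`dense`, `mlp`) and the hand-set weights are illustrative, not taken from any particular library.

```python
def relu(z: float) -> float:
    """Rectified linear unit: one common choice of nonlinearity."""
    return max(0.0, z)


def identity(z: float) -> float:
    return z


def dense(inputs, weights, biases, activation=relu):
    """One layer: each unit multiplies the inputs by its weights,
    adds its bias, and passes the sum through the nonlinearity."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp(inputs, layers):
    """Repeat: each layer's outputs become the next layer's inputs.
    Hidden layers use ReLU; the last layer is left linear (an assumption)."""
    h = inputs
    for i, (weights, biases) in enumerate(layers):
        act = identity if i == len(layers) - 1 else relu
        h = dense(h, weights, biases, activation=act)
    return h


# Hypothetical hand-set weights: 2 inputs -> 1 hidden unit -> 1 output.
layers = [
    ([[1.0, -1.0]], [0.0]),  # hidden unit: relu(x1 - x2)
    ([[2.0]], [1.0]),        # output: 2 * hidden + 1
]
print(mlp([3.0, 1.0], layers))  # [5.0]
print(mlp([1.0, 3.0], layers))  # [1.0]
```

In a real network the weights are learned rather than set by hand, and libraries compute each layer as a single matrix multiply over a batch of inputs.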
Depth is the trick: stack simple layers and the early ones learn edges and textures, the later ones objects and meaning. Features are composed from features, and nobody has to design them by hand.
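A classic small illustration of features composed from features is XOR, which no single weighted sum followed by a threshold can compute. In this sketch the weights are set by hand (a trained network would have to discover them), and the names are illustrative: the first layer detects two simple features, and the second layer combines them.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def xor_net(x1: float, x2: float) -> float:
    # Layer 1: two simple features of the raw inputs (hand-set weights).
    any_on = relu(x1 + x2)          # positive if at least one input is on
    both_on = relu(x1 + x2 - 1.0)   # positive only if both inputs are on
    # Layer 2: a feature built from features, "at least one but not both".
    return any_on - 2.0 * both_on


for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(int(a), int(b), xor_net(a, b))
```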
Learning is just the chain rule run backwards. Backpropagation pushes the error from the output back through every layer, computing for each weight which way, and how strongly, to nudge it to reduce the loss; gradient descent then takes the step.
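The sketch below runs that loop by hand on a tiny network: one tanh hidden unit, one linear output, and a squared-error loss, all assumptions chosen for brevity. The backward pass applies the chain rule one layer at a time, a finite-difference check confirms the gradients, and gradient descent takes the steps. The function names, toy data, and learning rate are illustrative.

```python
import math


def loss_and_grads(x, y, params):
    """Forward pass, then backpropagate the error with the chain rule."""
    w1, b1, w2, b2 = params
    # Forward: input -> hidden (tanh) -> output (linear).
    h = math.tanh(w1 * x + b1)
    y_hat = w2 * h + b2
    loss = 0.5 * (y_hat - y) ** 2
    # Backward: start at the output error and walk back one layer at a time.
    d_yhat = y_hat - y                 # dL/dy_hat
    d_w2, d_b2 = d_yhat * h, d_yhat    # y_hat = w2 * h + b2
    d_z = d_yhat * w2 * (1.0 - h * h)  # through h = tanh(z), tanh' = 1 - tanh^2
    d_w1, d_b1 = d_z * x, d_z          # z = w1 * x + b1
    return loss, [d_w1, d_b1, d_w2, d_b2]


def numerical_grads(x, y, params, eps=1e-6):
    """Central finite differences, used only to check the backward pass."""
    grads = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += eps
        minus[i] -= eps
        diff = loss_and_grads(x, y, plus)[0] - loss_and_grads(x, y, minus)[0]
        grads.append(diff / (2 * eps))
    return grads


def mean_loss(data, params):
    return sum(loss_and_grads(x, y, params)[0] for x, y in data) / len(data)


def gradient_descent(data, params, lr=0.1, steps=200):
    """Each step moves every weight a little against its average gradient."""
    for _ in range(steps):
        total = [0.0] * len(params)
        for x, y in data:
            _, g = loss_and_grads(x, y, params)
            total = [t + gi for t, gi in zip(total, g)]
        params = [p - lr * t / len(data) for p, t in zip(params, total)]
    return params


data = [(x, 0.5 * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]  # toy target y = x / 2
start = [0.5, 0.0, 0.5, 0.0]
print(loss_and_grads(0.7, 0.2, start)[1])
print(numerical_grads(0.7, 0.2, start))
print(mean_loss(data, start), "->", mean_loss(data, gradient_descent(data, start)))
```

Real frameworks automate the backward pass (automatic differentiation), but the bookkeeping is the same: each layer multiplies the incoming error by its own local derivative.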
Inside each transformer block, attention lets every token draw on every other token; the mixed signal then passes through ordinary weighted units, and gradient descent tunes those weights, millions to billions of them, until the outputs fit the training data.
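Here is a minimal sketch of that attention-then-weighted-units step. It assumes a single attention head and leaves out the learned query/key/value projections, residual connections, and layer normalization that real transformer blocks use: the token vectors serve directly as queries, keys, and values, and the feed-forward weights are hypothetical hand-set numbers.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: each output is a weighted average of all
    value vectors, weighted by how well that token's query matches each key."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def feed_forward(x, weights, biases):
    """Ordinary weighted units with ReLU, applied to each token separately."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three 2-d token vectors
mixed = attention(tokens, tokens, tokens)      # every token draws on every other
W, b = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]   # hypothetical feed-forward weights
block_out = [feed_forward(t, W, b) for t in mixed]
for row in block_out:
    print([round(v, 3) for v in row])
```

Dividing by the square root of the vector size keeps the dot products from growing with dimension, which would otherwise push the softmax toward picking a single token.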
Architecture decides what the network finds easy. Convolutions bake in the structure of images, recurrence the order of sequences, attention the long-range links in language — each a different prior about the world.
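As one concrete instance of such a prior, the sketch below shows a 1-D convolution: the same small kernel is reused at every position, so it needs far fewer weights than a fully connected layer, and shifting the input simply shifts the output. The kernel is a hypothetical hand-set edge detector; in a convolutional network it would be learned. Like most deep-learning libraries, the function computes cross-correlation (no kernel flip).

```python
def conv1d(signal, kernel):
    """'Valid' 1-D convolution: slide one shared kernel over every position."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


edge = [-1.0, 1.0]              # hypothetical kernel: fires on a step up or down
signal = [0, 0, 0, 1, 1, 1, 0, 0]
shifted = [0] + signal[:-1]     # same pattern, moved one step right

out = conv1d(signal, edge)
out_shifted = conv1d(shifted, edge)
print(out)          # [0.0, 0.0, 1.0, 0.0, 0.0, -1.0, 0.0]
print(out_shifted)  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, -1.0]

# Translation equivariance: shifting the input shifts the output.
print(out_shifted[1:] == out[:-1])  # True

# Weight sharing: 2 weights here vs 8 * 7 = 56 for a dense layer of the same shape.
print(len(edge), len(signal) * len(out))
```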
Scale turned the recipe into a revolution. Fed more data, compute, and parameters, the same gradients and the same layers kept improving; they stopped being a curiosity and started writing, drawing, and reasoning. That steady return on scale is why deep learning now underpins most of modern AI.
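One common way empirical scaling studies summarize this steady improvement is a power law: test loss L falls smoothly as the parameter count N grows, with analogous fits for data and compute. In the sketch below, N_c and α_N are fitted constants, placeholders rather than values from this piece, and such fits describe only the ranges that were actually measured. The second expression shows what the form implies: every doubling of N cuts the loss by the same factor.

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad
\frac{L(2N)}{L(N)} = \left(\frac{N}{2N}\right)^{\alpha_N} = 2^{-\alpha_N}
```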