Nested Learning: The Illusion of Deep Learning Architectures
I need to eat my words.
I just explained that LLMs are static after training and can’t truly adapt to new information.
Google’s new paper “Nested Learning: The Illusion of Deep Learning Architectures” shows a path beyond that limitation.
The analogy.
The authors compare LLMs to patients with a form of amnesia: they remember their distant past (their training data) and can hold a conversation (the current context), but nothing new sticks.
The key shift.
We know that the ‘deep’ in deep learning comes from stacking layers.
The paper argues that depth really comes from nesting learning systems that update at different timescales, a bit like the human brain’s use of different wave frequencies.
Like how we process information - this sentence (milliseconds), this conversation (minutes), and key insights (days/years).
They propose 3 key innovations:
Optimizers as memory: We use optimizers to update weights during training. The paper shows that some optimizers already act as memory systems, and extends them into more capable memory modules.
Self-modifying networks: Today, weights are frozen after training. The paper proposes networks that learn their own update rules, so they can keep modifying themselves.
Continuum memory: Not just short- and long-term, but a spectrum of memory operating at different speeds. Some parts update every token, others far less frequently.
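To make the continuum-memory idea concrete, here is a toy sketch (my own, not the paper’s algorithm, and all names are invented): several “memory levels” see the same token stream, each learning a simple associative map, but each level flushes its accumulated update at a different frequency. The per-token associative update also hints at the “optimizers as memory” point, since the accumulated gradient itself is a slow memory.

```python
import numpy as np

class MemoryLevel:
    """One level of a toy 'continuum memory': a linear associative map
    whose weights are updated once every `period` tokens.
    (Illustrative only; not the paper's actual formulation.)"""
    def __init__(self, dim, period, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(dim, dim))
        self.period = period                  # update frequency in tokens
        self.lr = lr
        self._grad_accum = np.zeros_like(self.W)
        self._steps = 0

    def step(self, key, value):
        """Accumulate an associative-memory gradient (push W @ key toward
        value), but apply it only every `period` tokens."""
        err = value - self.W @ key            # prediction error on this token
        self._grad_accum += np.outer(err, key)
        self._steps += 1
        if self._steps % self.period == 0:
            self.W += self.lr * self._grad_accum / self.period
            self._grad_accum[:] = 0.0         # flush after applying

def run_stream(levels, tokens):
    # Every level sees every token; fast levels adapt each step,
    # slow levels integrate over longer windows before updating.
    for key, value in tokens:
        for lvl in levels:
            lvl.step(key, value)

dim = 4
rng = np.random.default_rng(1)
A = rng.normal(size=(dim, dim))               # a fixed association to learn
tokens = [(k, A @ k) for k in rng.normal(size=(16, dim))]

fast = MemoryLevel(dim, period=1)             # updates every token
slow = MemoryLevel(dim, period=8)             # updates every 8 tokens
run_stream([fast, slow], tokens)
```

The point of the sketch is only the structure: one stream, many learners, each with its own clock.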
Looks interesting. Probably need a few more reads to fully understand its approach and significance.
My updated take.
Current models are static after training. But that’s a design choice, not a fundamental limit.
Could this be the next “Attention is All You Need”?
#AI #Attention #DeepLearning


