Elena Burger | AI + a16z Summary

1. Key Themes

The Frozen Model Problem: AI's Version of Memento

Current AI models are trained once and then frozen: they cannot update from experience after deployment. The entire ecosystem of RAG, context windows, agent scaffolding, and system prompts is a set of workarounds for this fundamental limitation, not a solution to it. Malika draws the analogy to the film Memento: "AI models, honestly, it kind of maps one-to-one to how AI models work today. So we have the training phase where we basically encompass all of the world knowledge... And so the question is, honestly, the model is basically frozen. But the new experiences, new knowledge still persists." [00:04:53:600]

In-Context Learning Works — But Has a Ceiling

The honest starting point for any continual learning argument is that in-context learning genuinely works. Tools like OpenClaw (Claude) demonstrate real value through context orchestration. But the ceiling becomes apparent in adversarial security and versioned software libraries, where weight-level knowledge can't be overridden by context. "The question is not whether in-context learning works. The question is whether that's kind of the ceiling." [00:10:04:440]

The Three-Bucket Framework: Context → Modules → Weights

The piece introduces a compaction framework: non-parametric (context/RAG), semi-parametric (module/cache updates like KV caches), and fully parametric (weight updates). Most market activity is in the first bucket today. The real frontier — and investment opportunity — is in parametric, weight-level continual learning, which is still early. "The field is honestly still kind of in early states. So we have a bunch of companies that are doing kind of the RL, data and systems. And some that are basically saying that transformer architectures are the bottlenecks and we need the novel architectures." [00:12:57:020]
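The three buckets can be written down as a small lookup table. This is only an illustrative sketch: the `Bucket` enum, the `TECHNIQUES` mapping, and both helper functions are hypothetical names, and the technique labels paraphrase the examples given in the talk rather than any formal taxonomy.

```python
from enum import Enum


class Bucket(Enum):
    """Where new knowledge is stored, from least to most parametric."""
    NON_PARAMETRIC = "context"   # RAG, agent harnesses, memory scaffolding
    SEMI_PARAMETRIC = "modules"  # module/cache updates such as KV caches
    PARAMETRIC = "weights"       # direct updates to the model weights


# Illustrative mapping only; labels paraphrase examples named in the talk.
TECHNIQUES = {
    "retrieval-augmented generation": Bucket.NON_PARAMETRIC,
    "agent harness": Bucket.NON_PARAMETRIC,
    "memory scaffolding": Bucket.NON_PARAMETRIC,
    "kv cache update": Bucket.SEMI_PARAMETRIC,
    "weight update": Bucket.PARAMETRIC,
}


def touches_weights(technique: str) -> bool:
    """Return True only when a technique changes the base model's weights."""
    return TECHNIQUES[technique.lower()] is Bucket.PARAMETRIC


def group_by_bucket() -> dict:
    """Group technique names under the bucket they belong to."""
    grouped = {bucket: [] for bucket in Bucket}
    for name, bucket in TECHNIQUES.items():
        grouped[bucket].append(name)
    return grouped
```

Counting the entries per bucket mirrors the observation above: most of the named examples sit in the non-parametric bucket.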

2. Contrarian Perspectives

The "Naive but Janky" Wins — And That's the Real Threat to Continual Learning

Most people assume more sophistication always wins. But the skeptic argument is that simple in-context orchestration is so good, it may keep winning indefinitely. "The skeptics would say, like, why complicate things, right? Like these naive but janky interfaces really tend to work, and they will continue to win because they're so fundamental." [00:07:13:080] This is a genuine headwind to the continual learning thesis.

System Prompts Are Actually Tattoos — And That's a Problem for Security

Most practitioners treat system prompts as a security layer. Malika argues the opposite: attackers have access to the same context as any user, making system prompt-based defenses structurally weak. "Imagine you try to update your system prompt to say, like, don't do this. Like, it's not going to work, right? Because all of the parameters in the model have learned to be helpful to the users. So you really have to encompass that kind of knowledge in the weights." [00:09:05:200] Security is therefore a forcing function for weight-level learning — a non-obvious driver of the entire field.

True Discovery Cannot Be Derived From Pretraining Data Alone

The consensus view is that more data and better retrieval get you to better outputs. But Malika argues that genuine novel discovery requires creating new techniques that did not exist in any prior training corpus, as Andrew Wiles did in proving Fermat's Last Theorem. "The learning here is like a true, genuine discovery that you really could not have even learned from all of the information and whatever pre-training data that humans kind of had before." [00:08:35:900] This directly challenges the sufficiency of scaling plus retrieval.

Even the Labs Don't Bet on One Approach

The narrative around AI labs is that they each have a singular architecture or learning paradigm. Malika contradicts this: "In all of the labs that we talk to, even the labs don't just tackle one approach. They actually have multiple teams that tackle continual learning through the different kind of paradigms." [00:13:23:000] This suggests the field is far more unsettled at the frontier than public positioning implies.

3. Companies Identified

OpenClaw (Claude/Anthropic-based product)
Description: An AI coding/assistant tool that orchestrates context from a user's file system, creates memory, and has bash access.
Why Mentioned: Cited as a best-in-class example of in-context learning done well, showing that context orchestration, not just model capability, creates differentiated value.
Quote: "The underlying model was available to anyone, but what's really made it a special magical moment is this kind of like orchestration of the context... OpenClaw really utilizes your file system. It creates kind of memories, right? And it even has like a special bash access." [00:07:12:920]

Cursor
Description: AI-powered code editor.
Why Mentioned: Called out as an effective example of in-context learning in practice.
Quote: Referenced alongside OpenClaw as companies "doing in-context learning things, things in that modality that seem to be pretty effective." [00:06:41:840]

Pinecone
Description: Vector database company enabling retrieval-augmented generation (RAG).
Why Mentioned: Cited as a leading non-parametric/context-layer company in the continual learning stack.
Quote: "That's exactly what you think about RAG, companies like Pinecone, companies that build agent harnesses." [00:11:58:040]

Letta
Description: Memory scaffolding / agent harness company.
Why Mentioned: Cited as a representative company in the non-parametric continual learning bucket.
Quote: "Memory scaffolding like Letta, [Mem0]." [00:11:58:040]

Mem0 (transcribed as "Menzero")
Description: Memory scaffolding company for AI agents.
Why Mentioned: Cited alongside Letta as building in the non-parametric continual learning layer.
Quote: "Memory scaffolding like Letta, [Mem0]." [00:11:58:040]

4. People Identified

Malika Aubakirova
Description: Partner on the AI Infrastructure team at a16z.
Why Mentioned: Author of the piece "Why We Need Continual Learning," who synthesized insights from top AI researchers, founders, and PhD students across a series of organized dinners to produce the framework discussed.
Quote: "We organized continual learning dinners. And so honestly, this piece was shaped largely by their insights and learnings." [00:02:40:780]

Andrej Karpathy
Description: Former OpenAI/Tesla AI research leader, now independent.
Why Mentioned: His "auto-research" project is cited as a real-world example of in-context learning working effectively.
Quote: "We see that with examples like Karpathy's auto-research project." [00:07:12:920]

Andrew Wiles
Description: British mathematician who proved Fermat's Last Theorem in 1995.
Why Mentioned: Used as the canonical example of genuine discovery that cannot be derived from existing knowledge, and as a benchmark for what true continual learning in AI should aspire to.
Quote: "What Andrew Wiles did, he basically went into near isolation for seven years and had to invent new techniques to bridge basically two fields of branches of mathematics, elliptic curves and modular forms." [00:08:07:800]

Yu Sun (Stanford; transcribed as "Yu San" and "USAN")
Description: Researcher at Stanford, also associated with the "Discover" test-time training paper.
Why Mentioned: Provided the Fermat's Last Theorem analogy used in the piece, and is cited for test-time training research showing out-of-distribution learning.
Quote: "The example that I really like is given by [Yu Sun], who is currently at Stanford." [00:07:39:580] And: "The test time training done by [Yu Sun] with the Discover paper that kind of makes some of the novel inventions." [00:15:57:180]

Ilya Sutskever
Description: Co-founder of OpenAI and SSI, one of the most influential figures in deep learning.
Why Mentioned: Referenced for a recent comment reframing AGI: humans aren't AGI but learn on the job, and this on-the-job learning is what matters most.
Quote: "I come back to just what Ilya talked about just recently. And what he said was basically, like, with AGI, we almost overshot the target... Humans are not AGI, but we still learn on the job. We learn from experience. And that's what makes kind of humans kind of unique." [00:14:16:100]

5. Operating Insights

Use Breaking Software Versions as a Stress Test for Your AI Deployment

If your AI product relies on code generation or technical assistance, test it against recently deprecated APIs and breaking version changes. This is a concrete, reproducible failure mode today. "Imagine your favorite JavaScript library. Like, let's say React, right? You learn through all of your pre-training data that there is a function called X. But at some point, a new version of React comes out and turns out that it's a breaking change... No matter how much you say it in the context, you cannot just override what's the most intuitive throughout all of the model parameters." [00:10:04:440] Operators should build regression tests around these failure cases to understand exactly where their context scaffolding breaks down.
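One way to make such a regression test concrete is to scan generated code for calls that a newer library version has removed. The sketch below is a minimal illustration, not a complete linter: `DeprecatedApi`, `RULES`, and `find_deprecated_calls` are hypothetical names, and the two sample strings stand in for model output that a real test would fetch from the deployed system. The rules use `ReactDOM.render` and `ReactDOM.hydrate`, which React 19 removed in favor of `createRoot` and `hydrateRoot`.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DeprecatedApi:
    pattern: str      # regex that matches the outdated call
    replacement: str  # what the current library version expects instead


# Example rules only; extend with the breaking changes that matter to you.
RULES = [
    DeprecatedApi(r"\bReactDOM\.render\s*\(", "createRoot(...).render(...)"),
    DeprecatedApi(r"\bReactDOM\.hydrate\s*\(", "hydrateRoot(...)"),
]


def find_deprecated_calls(generated_code: str, rules=RULES) -> list:
    """Return (line_number, matched_text, replacement) for each outdated call."""
    findings = []
    for line_number, line in enumerate(generated_code.splitlines(), start=1):
        for rule in rules:
            match = re.search(rule.pattern, line)
            if match:
                findings.append((line_number, match.group(0), rule.replacement))
    return findings


# Stand-ins for model output; a real test would call the deployed model here.
OLD_STYLE = "import ReactDOM from 'react-dom';\nReactDOM.render(<App />, root);"
NEW_STYLE = (
    "import { createRoot } from 'react-dom/client';\n"
    "createRoot(root).render(<App />);"
)

print(find_deprecated_calls(OLD_STYLE))
print(find_deprecated_calls(NEW_STYLE))
```

Running the same prompts with and without the upgrade notes in context, and comparing the number of findings, gives a rough measure of how often context fails to override what the weights learned.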

Security Teams Should Stop Relying on System Prompts as the Defense Layer

For any AI product with adversarial surface area, system prompt-based defenses are structurally insufficient: attackers can reach the context the same way any ordinary user can. "The attackers have access to your context just like any other user. And so you have to use something else, like the weights, to really tackle it." [00:09:34:960] Operators should be pressure-testing their security architecture at the weight level, not just the prompt level.

6. Overlooked Insights

The "Cartridges" Stanford Paper May Signal the Most Actionable Near-Term Architecture Shift

While the conversation focused heavily on the context vs. weights binary, Malika briefly mentioned a Stanford paper called "Cartridges" that describes a middle path: updating KV caches rather than full model weights. This semi-parametric approach could be far more commercially deployable in the near term than full weight retraining: lower cost, faster iteration, and no retraining from scratch. It received almost no discussion time, but could represent the most practical on-ramp to continual learning for enterprises. "There is a great Stanford paper on this called Cartridges that kind of explains how you can update KV caches." [00:12:28:680] Investors and builders should watch this paper closely as a potential architectural foundation for the next generation of adaptive AI products.
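To see what storing knowledge in a KV cache rather than in weights can mean mechanically, consider a toy attention step. This is a sketch under strong assumptions and not the method from the Cartridges paper: the vectors are made up, nothing is trained, and `attend`, `cartridge_keys`, and `cartridge_values` are hypothetical names. It only shows that prepending extra key/value pairs changes the attention output while the attention function itself stays untouched.

```python
import math


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(score - peak) for score in scores]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * value[i] for w, value in zip(weights, values)) / total
        for i in range(dim)
    ]


# Key/value pairs computed from the live prompt (made-up numbers).
prompt_keys = [[1.0, 0.0], [0.0, 1.0]]
prompt_values = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical cartridge: extra key/value pairs prepared offline for a corpus.
cartridge_keys = [[4.0, 4.0]]
cartridge_values = [[5.0, 5.0]]

query = [1.0, 1.0]

without_cartridge = attend(query, prompt_keys, prompt_values)
with_cartridge = attend(
    query, cartridge_keys + prompt_keys, cartridge_values + prompt_values
)

print(without_cartridge)
print(with_cartridge)
```

Because the cartridge is just data handed to the attention step, it can in principle be swapped per customer or per corpus without touching the shared model, which is the deployment advantage described above.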

Benchmarks for Continual Learning Don't Yet Exist — Whoever Defines Them Shapes the Market

Malika briefly noted that Berkeley researchers and lab teams are actively working on defining benchmarks for continual learning. This is easy to overlook, but benchmark-setting is historically one of the most powerful acts of market shaping in AI — whoever defines what "good" looks like controls the narrative, the evaluation stack, and often the procurement criteria. "There are researchers from Berkeley and some of the other labs that are actually working on benchmarks that will hopefully help us define what is continual learning in a better form." [00:14:44:140] The company or institution that lands the canonical continual learning benchmark could have outsized influence over which approaches — and which vendors — win.