Andrej Karpathy: From Vibe Coding to Agentic Engineering
- 01 The December 2024 Inflection Point: When AI Coding Actually Worked
- 02 Software 3.0: LLMs as a New Computing Paradigm, Not Just Faster Software
- 03 Verifiability as the Hidden Architecture of AI Capability
1. Key Themes
The December 2024 Inflection Point: When AI Coding Actually Worked
Karpathy identifies a specific moment when agentic AI coding crossed a threshold from "sometimes helpful" to genuinely reliable, and argues that most people missed it because they were still using ChatGPT as a search engine substitute.
"I just started to notice that with the latest models, the chunks just came out fine. And then I kept asking for more and it just came out fine. And then I can't remember the last time I corrected it." 00:01:29
"I think a lot of people experienced AI last year as a ChatGPT-adjacent thing. But you really had to look again and you had to look as of December because things have changed fundamentally." 00:01:58
Software 3.0: LLMs as a New Computing Paradigm, Not Just Faster Software
Karpathy argues we are not in an era of accelerated software development but in a categorically different computing paradigm, where prompts replace code and neural nets replace explicit logic. The MenuGen example is his clearest illustration: his entire app became unnecessary when a single prompt to Gemini accomplished the same thing.
"Software 3.0 is kind of about, you know, your programming now turns to prompting. And what's in the context window is your lever over the interpreter that is the LLM." 00:03:24
"All of my MenuGen is spurious. It's working in the old paradigm. That app shouldn't exist." 00:06:11
"It's not just about programming and programming becoming faster. This is more general information processing that is automatable now." 00:06:40
Verifiability as the Hidden Architecture of AI Capability
The jaggedness of AI, which can refactor a 100k-line codebase yet tell you to walk to a car wash, is not random. It maps directly to whether a domain had verifiable reward signals during training. This is a structural insight about where AI will and won't work out of the box.
"LLMs can easily automate what you can verify... they are given verification rewards. And then because of the way that these models are trained, they end up basically progressing and creating these like jagged entities that really peak in capability in kind of like verifiable domains like math and code." 00:10:19
"If you're in the circuits that were part of the RL, you fly. And if you're in the circuits that are out of the data distribution, you're going to struggle." 00:13:05
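To make "verifiable" concrete, here is a minimal sketch of the kind of reward a checker can compute automatically. It is a toy illustration, not any lab's training setup: the function name `verifiable_reward` and the sample outputs are hypothetical, and the task (exact arithmetic) is chosen only because a program can grade it without human judgment.

```python
from fractions import Fraction


def verifiable_reward(candidate: str, expected: Fraction) -> float:
    """Return 1.0 if the candidate parses as a number equal to `expected`, else 0.0."""
    try:
        return 1.0 if Fraction(candidate.strip()) == expected else 0.0
    except (ValueError, ZeroDivisionError):
        # Unparseable text or a malformed fraction earns no reward.
        return 0.0


# Hypothetical sampled answers to the prompt "What is 3/4 + 1/4?"
samples = ["1", "1.0", "4/4", "0.9", "about one"]
rewards = [verifiable_reward(s, Fraction(1)) for s in samples]
print(rewards)
```

The contrast with the car wash question is that no comparably cheap checker exists there: nothing like `verifiable_reward` can be written for "should I walk or drive?", so under Karpathy's argument that kind of judgment gets far less direct training signal.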
2. Contrarian Perspectives
Most AI Apps Being Built Today Shouldn't Exist
While the world is rushing to build AI-powered applications, Karpathy argues many of these apps represent old-paradigm thinking: unnecessary middleware between a user and a neural net that could just handle it directly.
"A lot of this code shouldn't exist and it's just neural networks doing most of the work... The Software 3.0 paradigm is a lot more kind of raw. It's just your neural network is doing more and more of the work and your prompt or context is just the image and the output is an image and there's no need to have any of the app in between." 00:08:01
The CPU Is Becoming the Co-Processor, Not the Host
Most people assume neural networks run on top of CPUs. Karpathy flips this: the endgame is neural nets as the primary compute substrate, with CPUs as a legacy co-processor for deterministic tasks.
"A lot of this will flip and that the neural net becomes kind of like the host process. And the CPUs become kind of like the co-processor... what's really running the show is these neural nets that are networked in a certain way." 00:09:25
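One way to picture the host/co-processor inversion is a dispatch loop in which the model decides what to do and ordinary code is called only for the exact, deterministic step. The sketch below is an assumption-laden illustration: `stub_host` is a hard-coded placeholder standing in for a neural net, and `COPROCESSOR` and `run` are hypothetical names, not anything described in the talk.

```python
import operator
from typing import Callable, Dict, Tuple

# Deterministic "co-processor" routines: exact and cheap to check.
COPROCESSOR: Dict[str, Callable[[int, int], int]] = {
    "add": operator.add,
    "mul": operator.mul,
}


def stub_host(request: str) -> Tuple[str, int, int]:
    """Placeholder for the neural host: picks a tool and its arguments.

    A real system would have a model produce this plan; here it is plain
    string parsing of requests shaped like "12 * 12".
    """
    left, op, right = request.split()
    tool = {"+": "add", "*": "mul"}[op]
    return tool, int(left), int(right)


def run(request: str) -> int:
    tool, a, b = stub_host(request)   # the host decides what to do
    return COPROCESSOR[tool](a, b)    # the CPU executes the deterministic step


print(run("12 * 12"))
```

In this framing the control flow lives in the host, and the CPU-side table is just a set of callable utilities, which is the reverse of an application that occasionally calls a model.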
AI Capability Is Shaped by Lab Priorities, Not Just Science, Which Makes It Opaque and Exploitable
Karpathy argues that capability peaks are partly editorial decisions by labs about what data to include — not purely emergent from scale. This means there are meaningful capability gaps in verifiable domains that labs simply haven't prioritized yet, and those gaps represent real white space.
"From GPT 3.5 to GPT 4, people noticed that chess improved a lot... a huge amount of like data of chess made it into the pre-training set. And just because in the data distribution, basically the model improved a lot more than it would just by default. So someone at OpenAI decided to add this data." 00:12:35
"There are some very valuable reinforcement learning environments that people could think of that I think are not part of the... I don't want to give away the answer." 00:14:36
"You Can Outsource Your Thinking, But You Can't Outsource Your Understanding"
Against the popular narrative that AI makes deep learning obsolete, Karpathy argues understanding remains the fundamental bottleneck — because you cannot direct agents toward worthwhile goals without it.
"You can outsource your thinking, but you can't outsource your understanding... I feel like I'm becoming a bottleneck of just even knowing what are we trying to build? Why is it worth doing? How do I direct my agents?" 00:28:10
3. Companies Identified
Vercel
Cloud deployment platform. Mentioned as a friction point: Karpathy deployed MenuGen on Vercel but found it painful to wire up the various services by hand, click through their settings menus, and configure DNS. The implication is that Vercel and similar platforms are not yet agent-native, which represents either an opportunity or a vulnerability for them.
"Deploying it in Vercel because I had to work with all these different services and I had to string them up and I had to go to their settings and the menus and, you know, configure my DNS. And it was just so annoying." 00:26:47
Anthropic (Claude / Opus)
Referenced as the state-of-the-art coding model, simultaneously demonstrating extraordinary capability and jarring failure modes.
"How is it possible that state of the art Opus 4.7 will simultaneously refactor a hundred thousand line code base or find zero day vulnerabilities and yet tells me to walk to this car wash?" 00:11:40
4. People Identified
Sam Altman
CEO of OpenAI. Referenced for an observation about generational differences in AI usage that contextualizes why many experienced professionals underestimated the December 2024 shift.
"People of different generations use ChatGPT differently. So if you're in your 30s, you use it as a Google search replacement. But if you're in your teens, ChatGPT is your gateway to the internet." 00:17:28
5. Operating Insights
Hiring for Agentic Engineering Requires a Completely Different Interview Format
The standard technical interview (puzzles, leetcode-style problems) is obsolete for evaluating the skill that actually matters now. Karpathy proposes a project-scale evaluation that includes adversarial AI stress-testing as the verification mechanism.
"I would say that hiring has to look like give me a really big project and see someone implement that big project... I'm going to use 10 codex 5.4 x high to try to break your website that you deployed. And they're going to try to basically break it and they should not be able to break it." 00:19:17
Spec-First Development: Design Deeply Before Delegating to Agents
Karpathy's key lesson from agent failures (like the Stripe/Google email mismatch bug) is that agents fail at the design level, not just execution. The human's job is now thorough specification, not code review.
"You have to work with your agent to design a spec that is very detailed. And maybe it's basically the docs and then get the agents to write them. And you're in charge of the oversight and the top level categories, but the agents are doing a lot of the under the hood." 00:21:01
"You're in charge of the taste, the engineering, the design, and that it makes sense and that you're asking for the right things." 00:22:20
Build Personal LLM Knowledge Bases as an Organizational Intelligence Layer
Rather than using AI reactively for queries, Karpathy uses LLMs to continuously recompile ingested information into a personal wiki — treating it as a way to force genuine understanding and not just retrieval.
"I have my, you know, my wiki that's being built up from these articles. And I love asking questions about things... these are tools to enhance understanding in a certain way." 00:29:02
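The "recompile into a wiki" idea can be sketched as a loop that regroups everything ingested so far and rebuilds one page per topic. This is a guess at the shape of such a pipeline, not a description of Karpathy's actual tooling: the `(topic, text)` article format, `compile_wiki`, and `naive_summarize` (a stand-in for an LLM call) are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Article = Tuple[str, str]  # (topic, text); a hypothetical ingestion format


def compile_wiki(
    articles: Iterable[Article],
    summarize: Callable[[List[str]], str],
) -> Dict[str, str]:
    """Group ingested articles by topic and rebuild one wiki page per topic."""
    by_topic: Dict[str, List[str]] = defaultdict(list)
    for topic, text in articles:
        by_topic[topic].append(text)
    # Pages are rebuilt from scratch on each run, in sorted topic order.
    return {topic: summarize(texts) for topic, texts in sorted(by_topic.items())}


def naive_summarize(texts: List[str]) -> str:
    """Stand-in for an LLM call: keeps only the first sentence of each article."""
    return " ".join(t.split(".")[0].strip() + "." for t in texts)


articles = [
    ("rl", "Verifiable rewards drive progress. More detail follows."),
    ("infra", "Docs are written for humans. Agents need something else."),
    ("rl", "Environments matter a great deal. Further notes here."),
]
wiki = compile_wiki(articles, naive_summarize)
print(wiki["rl"])
```

Passing `summarize` in as a parameter keeps the grouping logic testable without a model; swapping in a real LLM call would change only that one argument.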
6. Overlooked Insights
There Is a Specific, Unnamed High-Value Verifiable Domain That Karpathy Deliberately Declined to Share
This is the most signal-rich moment in the entire conversation. Karpathy was mid-sentence identifying a specific high-value RL environment that is not yet being targeted by labs — and he stopped himself.
"I do think there are some very valuable reinforcement learning environments that people could think of that I think are not part of the, yeah, I don't want to give away the answer. But there is one domain that I think is very, oh, okay, sorry." 00:15:04
This is not a throwaway comment. Karpathy is describing an investment and company-building thesis: a verifiable domain where (a) the reward signal is clear, (b) the labs haven't prioritized it, and (c) fine-tuning could produce a dramatically superior specialized model. The competitive moat would be the proprietary RL environment itself, not the model weights. Any investor or founder who can identify what domain Karpathy had in mind here has a meaningful head start.
All Infrastructure Is Still Built for Humans — This Is the Largest Near-Term Rebuild Opportunity
Karpathy raises this almost as a complaint, but it contains a significant infrastructure investment thesis: every piece of developer tooling, deployment platform, documentation, and service API will need to be rebuilt as agent-native. The friction he experienced deploying MenuGen is universal and systemic.
"I still use most of the time when I use different frameworks or libraries or things like that, they still have docs that are fundamentally written for humans... What is the thing I should copy paste to my agent? Like, so it's just every time I'm told, you know, go to this URL or something like that. It's just like, oh, you know." 00:25:56
"I would hope that MenuGen, that I could give a prompt to an LLM, build MenuGen and that I didn't have to touch anything and it's deployed in that same way on the internet. I think that would be a good kind of a test for whether or not a lot of our infrastructure is becoming more and more agent native." 00:27:14
The companies that rebuild their infrastructure, APIs, and documentation to be agent-first — rather than bolting on AI as a feature — will structurally displace incumbents who don't. This applies across deployment, DNS, payments, authentication, and any multi-step configuration workflow.