Behnam Neyshabur

1. Key Themes

Self-Accelerating AI as a Distinct Paradigm Beyond Current LLM Scaling

The core thesis of Mirandil is that the next frontier isn't simply bigger models but systems that can improve themselves — what they call "self-accelerating AI." This is defined not as sci-fi recursive self-modification but as an ecosystem of models and humans that develops the next generation of AI as part of its operation.

"Our version of self-accelerating AI is can these systems do the work of an AI researcher or an AI engineer? Like everything from the low-level kernels to works to conducting research itself in a very high throughput way." [00:07:22.040]

The Business Model Misalignment at Frontier Labs Is Structural, Not Cultural

Behnam makes a pointed structural argument: the incentive model of large AI labs (train a big model, charge for access) is fundamentally incompatible with broadly distributing self-accelerating AI capabilities. This isn't fixable by leadership — it's baked into the economics.

"If the business model of the company is I train a big model and charge people for using it, how is this company incentivized to share this technology with everyone else? Because that's directly letting everyone else train a model which reduces their dependency to the company. So these are so fundamental that it doesn't matter who runs the company." [00:15:52.780]

AI-to-AI Research Assistance Is Already Causing Self-Reinforcing Progress Loops

The hosts and guests converge on the observation that self-acceleration is already quietly happening — every generation of AI coding tools speeds up the creation of the next generation of models, creating a compounding dynamic unlike any prior general-purpose technology.

"It has picked up by kind of like every generation of kind of like new models, and it improves everyone's productivity by itself. But that's also kind of like improves the time to the next breakthrough, next model." [00:34:18.220]

"What you guys are doing is almost the next level of cheat code because like the actual experiments are sort of... when the internet gets faster, you don't like recursively get faster internet." [00:34:45.480]

The Real Bottleneck Is System Scaling, Not Individual Agent Performance

Behnam identifies a profound unsolved problem: companies don't scale productively even with humans, and AI agent swarms have the same problem. The question of how to get favorable scaling — where doubling agents actually halves time-to-goal — is the central technical and organizational challenge of the AI era.
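
One way to make "favorable scaling" concrete is a back-of-the-envelope sketch. It assumes, purely for illustration, that system productivity follows a power law in system size (productivity proportional to size^a); that functional form is not from the conversation. The function names are hypothetical. The inputs are the figures used in this section: a 10x larger company with roughly 1.2x the productivity, and the ideal case where doubling agents halves time-to-goal.

```python
import math


def scaling_exponent(size_factor: float, productivity_factor: float) -> float:
    """Exponent a in the assumed power law: productivity ~ size ** a."""
    return math.log(productivity_factor) / math.log(size_factor)


def time_to_goal_factor(size_factor: float, exponent: float) -> float:
    """Relative time-to-goal after growing the system by size_factor.

    Assumes time-to-goal is inversely proportional to productivity.
    """
    return size_factor ** (-exponent)


# Figures from this section: 10x the size, about 1.2x the productivity.
observed = scaling_exponent(10, 1.2)

# Ideal case from this section: doubling agents doubles productivity.
ideal = scaling_exponent(2, 2)

if __name__ == "__main__":
    print(f"observed exponent: {observed:.3f}")
    print(f"ideal exponent:    {ideal:.3f}")
    print(f"time factor when doubling (observed): {time_to_goal_factor(2, observed):.3f}")
    print(f"time factor when doubling (ideal):    {time_to_goal_factor(2, ideal):.3f}")
```

Under this assumed model, the quoted company figures correspond to an exponent near 0.08, so doubling the system would trim time-to-goal by only about 5 percent, whereas an exponent of 1 would halve it. The gap between those two exponents is a rough way to read what "solving the scaling problem" means here.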

"Companies scale and their productivity go down with scale. So people don't have a favorable scaling. So you use your company size goes 10x and your productivity is like 1.2. So that's not great. And today agents are also not that amazing in terms of like being able to actually scale them in a way that you can see that productivity of the entire system grows." [00:31:09.660]

"A lot of these companies are willing to pay 10x more compute to just get to something like one month faster than others. And I think that's, you have to solve the scaling problem." [00:31:34.320]

Mirandil Is Building to Return AI Sovereignty to Individual Businesses and Labs

Rather than creating further dependency on frontier labs, Mirandil's product philosophy is to let any organization own its own AI stack, trained on its own data and running on its own infra, analogous to how Claude Code democratized coding.

"The way we are thinking about the product and how we are forming relationship with the rest of the world is how can we enable businesses to start owning more pieces, to have their own infra, their own AI... what we are noticing today is that as a result of creating more dependency to these big AI labs, they're gradually losing control and losing bigger parts of the business." [00:22:18.910]

"The future we see is the same happening to AI where everyone can go from their wish to have AI do something for them to seeing that happen for like their own AI." [00:24:26.470]

Oversight Reduction Is the Key Leverage Point for Agentic Productivity

Harsh makes the observation that even marginal reductions in required human oversight unlock enormous amounts of autonomous token spend and productive output. The improvement from Sonnet 3.5 through 4.5 has been material precisely in this dimension.

"The jump from Sonnet 3.5, 4.0, 4.5 has been materially better in terms of like where does it get stuck? Where does it need oversight? Even the tiniest reduction in oversight can lead to like large amounts of token spent and just effective outcomes being generated." [00:11:08.820]

Scientific Grand Challenges Require Superhuman, Not Just Human-Level, AI

The founders argue that framing AI as an "AI scientist" that replaces one human scientist fundamentally misunderstands what's needed. Problems like Alzheimer's disease require speed and throughput categorically beyond what even the best AI-as-scientist approach can provide at current trajectories.

"These are superhuman problems. So what does it take to solve these problems?... We think it's going to be just a matter of time before they want to lean into this angle more... like Alzheimer's disease, for example, it has so much structure in terms of data that you want, you have to use, that you cannot even see how in 10 years existing models would be able to kind of move things much faster." [00:08:19.260] ... [00:39:14.360]

The Smallest Possible AI Team Is the Most Powerful Research Signal

Mirandil is explicitly running itself as a proof-of-concept scaling experiment: can you produce frontier-quality AI research with 10x fewer people? Their early results suggest yes, and this is itself the product demonstration.

"We've been able to do a lot with a lot fewer resources and people. We've been at the frontier labs for a long time and we know how much work it is to do something. And we've been able to do it in maybe 10 times less people and less resources. That was surprising to me." [00:26:16.320]

2. Contrarian Perspectives

Restricting AI Research Assistance Is Driven by Competitive Incentives, Not Safety

Behnam makes the subtle but damning point that when Anthropic (and other labs) restrict AI from helping with "AI training," it's nearly impossible to separate genuine safety concern from competitive self-interest. The labs themselves are the ones most threatened by democratizing this capability.

"It's very hard to separate incentives from reasons. And it's just get melted. It's just hard to separate it because you're also deeply incentivized to keep your distance with others." [00:19:24.360]

Pre-Training Scaling Is Just One Tool, Not the Path to Scientific Discovery

Rather than treating continued pre-training scaling as the default strategy, Behnam reframes it as one item in a toolbox — and argues it's the wrong tool for generating genuinely new scientific knowledge.

"Sometimes I think of pre-training, post-training, and all the paradigms that we have of using compute effectively as tools in my toolbox. And at certain points in time, some tool is effective and we go for, for the fixed amount of compute that we have, we go for the best tool." [00:38:16.680]

The Narrative of AI Automating Jobs Away Is a Choice, Not an Inevitability

Behnam argues that the current dominant narrative around AI — job displacement — is not inherent to the technology but reflects where builders are choosing to direct it. Directing it at scientific grand challenges is both more positive and more honest about AI's potential.

"The current picture that's being painted about the impact of AI is not that positive. Like automating people's job away... that doesn't seem like an exciting future... accelerating science is pure good for humanity. So that's where you could decide where to put it at work. And I think we should feel like we are in charge and we can build the future we want." [00:36:00.680]

Agent Systems Will Face the Same Org-Chart and Incentive Problems as Human Companies

Most people assume that scaling AI agents is just a compute/capability problem. Behnam argues it's an organizational design and incentive problem just as complex as building a high-performing human company — resource allocation, prioritization, and inter-agent politics will all emerge.

"The question is really, how do you get a favorable scaling of a system? Like companies scale and their productivity go down with scale... So a lot of technical problems to solve there that are very interesting... at the end of the day, with agents and with people, incentives are very important." [00:31:09.660] ... [00:33:01.820]

An AI Model as Good as the Best Scientist Still Won't Solve Alzheimer's

Even reaching human-level scientific AI is insufficient for the hardest problems. The bottleneck isn't intelligence quality — it's throughput and speed across the full problem structure, which requires a qualitatively different approach.

"Our best hope is that we're going to have models that are going to be as good as our best scientists, or maybe become better than the best scientists, but how's that going to solve Alzheimer's disease for us? We don't even know if it's possible. We don't even know what are the limits that exist. So we really need to move at much higher speed." [00:38:44.360]

3. Companies Identified

Mirandil

Early-stage AI lab focused on building self-accelerating AI systems to accelerate scientific discovery. Founded by ex-Anthropic research scientists. Building specialized models and products for AI researchers and engineers (low-level kernels, PyTorch, RL frameworks, pre-training frameworks), with the explicit goal of letting any organization own and train its own AI.

"We are thinking about the product and how we are forming relationship with the rest of the world is how can we enable businesses to start owning more pieces, to have their own infra, their own AI." [00:22:18.910]

Anthropic

Frontier AI lab; previous employer of both founders. Mentioned as the place where the self-accelerating AI research began (Harsh initiated the automated pre-training project; Behnam co-led the science team). Also cited as an example of structural business model misalignment for this type of technology.

"I was co-leading the science team in Anthropic. Harsh was kind of initiated at the automated pre-training project in Anthropic." [00:14:53.980]

Google / DeepMind (Blue Shift Team)

Google's research arm, including the Blue Shift Team where Behnam worked when scaling laws were first being demonstrated at OpenAI. The team built math-specialized models and reasoning models that became precursors to Mirandil's work.

"We first built the math specialized models at Gemini and then worked on the reasoning models. And then some of the primitives kind of like started to come together when we joined Anthropic." [00:12:36.320]

OpenAI

Mentioned as the origin point for the scaling law discoveries that convinced Behnam a revolution was coming.

"A lot of reasons behind building the company comes back from when scaling law was happening at OpenAI. Back then, I was at Blue Shift Team, and that was a moment when I realized we are on the verge of a revolution." [00:02:18.400]

AlphaGo (DeepMind)

Cited as an early canonical example of self-accelerating AI via self-play loops — validating that the concept has existed in narrow forms for years.

"AlphaGo, where we can create all these self-play loops, is already self-accelerating AI." [00:06:27.240]

4. People Identified

Behnam Neyshabur

Co-founder of Mirandil; previously co-led the science team at Anthropic and was a founding member of the Blue Shift Team at Google. Identified as a key architect of the self-accelerating AI thesis and the scientific grand challenges framing.

"I was co-leading the science team in Anthropic... and we were both talking about what does it take to apply this to science and accelerate science." [00:14:53.980]

Harsh Mehta

Co-founder of Mirandil; previously initiated the automated pre-training project at Anthropic. Known for starting work on AI-driven AI engineering before the models were capable enough for others to believe in it.

"When we joined, the state of the art model was Sonnet 3.7. And I think I was too excited in some sense to jump in and attack this problem." [00:13:33.460]

Dario Amodei (Anthropic CEO)

Mentioned twice: for his framing of AI progress as a "smooth exponential," and for his essay "Machines of Loving Grace," which was Harsh's original inspiration for joining Anthropic.

"Dario had this essay, Machines of Loving Grace, which was very inspiring. And it really felt like the place where this kind of ambitious work can be done." [00:14:01.160]

"Dario calls it like a smooth exponential, and scale has been very helpful, both in terms of the underlying models being better, but also conducting the inference time research and engineering at a very high throughput grade." [00:13:05.800]

5. Operating Insights

Optimize the Entire Company End-to-End for the Core Goal — Including Data Practices

Mirandil appears to be feeding its own researchers' work traces and model interactions directly back into training — treating the company's own operations as a training signal. This is a non-obvious organizational design choice that most companies building AI products don't do.

"All I can say is that when you start a company with a certain goal, you optimize the entire company end to end for what you're going for." [00:25:21.610]

Use Your Own Small Team as the Scaling Experiment Before Scaling

Behnam explicitly frames Mirandil's small team not as a limitation but as a controlled experiment in favorable agent/human system scaling: the team is stress-testing the methodology before deploying it at scale. Any company building AI-augmented workflows should run itself as the first experiment.

"I think of ourselves as the experiment. You're running an experiment at a small scale, thinking about how can we get favorable scaling and then scale up the system." [00:32:03.940]

Treat Model Version Numbers as the Primary Time Reference for AI-Era Product Development

Matt Bornstein surfaces a practical insight that teams building on top of AI should take seriously: capability jumps are better tracked by model versions than calendar dates, since capability is what actually changes your product's potential.

"It's almost easier to use model version numbers as date, as like time reference points now. I can actually understand what you're indicating when you say Sonnet 3.7 versus like 2022." [00:14:01.160]

Hire for Belief in the Multiplier — Skeptical Candidates Are a Red Flag in This Era

When candidates asked how Mirandil could compete with large labs at 20 people, Harsh treated that as a disqualifying signal — the candidate didn't believe in the multiplicative effect of AI-assisted research. For AI-native companies, this is a high-signal hiring filter.

"I talk to candidates and they get surprised that, how do you think you can compete with big places with only having 20 people. And I'm like, it looks like you're not a believer in this. Have you been using? More should say that. We have this thing now that can help us." [00:26:16.680]

6. Overlooked Insights

The ARC-AGI Benchmark Is Quietly Pointing at the Right Architecture for Unknown Domains

Matt Bornstein briefly name-checks ARC-AGI as an example of a system dropped into an unknown environment that must learn the rules before optimizing — and both founders quickly affirm this as the correct mental model for self-accelerating AI in enterprise contexts. This suggests ARC-AGI-style "in-context rule discovery" is not just a benchmark curiosity but a blueprint for real-world agentic deployment in novel domains (new companies, new scientific fields, new enterprise workflows). Investors should watch whoever is leading on ARC-AGI-type generalization.

"This is sort of like the ARC-AGI benchmark, for instance, where it sort of dropped into an environment and has to like understand the rules before optimizing them. Correct. And more like inside like enterprises or general knowledge work, another example could be like a system which ends inside a company. If an employee enters a company, then they have to gain all this context to function well. And can an AI system do something like that?" [00:06:56.200]

Scientific Compute Access — Not AI Intelligence — Will Be the Next Bottleneck

Buried in Behnam's framing of the "periodic lab" model is a striking implication: once AI removes the intelligence bottleneck from scientific research, the remaining bottlenecks are physical — data access, lab infrastructure, compute, and regulatory access. This means the next wave of value creation in AI-for-science will accrue not to model builders but to whoever controls the physical and data infrastructure that scientific AI systems need to run experiments. This is an underappreciated investment thesis.

"Each of these problems need a periodic lab, which is a bunch of experts who understand the domain and are excited about attacking this problem directly... the thing that's difficult to do for a periodic lab is assemble the best frontier AI team and iterate over the AI solution. What we want to do is take that part and minimize it, make it small... Because each of these problems have a lot of other constraints to solve, like physical elements, like getting access to data and all the other bottlenecks that exist. So we just want to remove the AI bottleneck." [00:08:48.520]