Import AI 455: AI… | Jack Clark from Import AI Summary

Summary for Investors & Entrepreneurs

1. Key Themes

Theme 1: The Coding Singularity — AI Has Already Automated Core Software Engineering

AI's ability to write, test, and debug code has gone from novelty to near-complete benchmark saturation in under three years, crossing a threshold that makes AI a genuine substitute for large portions of human engineering work.

"When SWE-Bench launched in late 2023 the best score at the time was Claude 2 which had an overall success rate of ~2%. Claude Mythos Preview gets 93.9%, effectively saturating the benchmark."

"The vast majority of people I meet at frontier labs and around Silicon Valley now code entirely through AI systems. Increasingly, they use AI systems to write the tests and check the code as well."

Theme 2: Exponential Expansion of Autonomous Task Duration — The METR Time Horizon

The length of task AI systems can complete independently, measured by how long the work would take a skilled human, has grown roughly 1,400-fold in about four years (from ~30 seconds in 2022 to ~12 hours in 2026), with a clear trajectory toward full-workday and multi-day autonomy.
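As a rough check on the growth rate, the time-horizon figures Clark quotes in this theme (about 30 seconds in 2022, rising to about 12 hours in 2026) can be turned into a growth factor and an implied doubling time. This is a minimal sketch, not METR's methodology: the names `HORIZONS_MIN`, `growth_factor`, and `doubling_time_months` are illustrative, and it assumes each figure is a snapshot exactly one year after the previous one, so the doubling time is coarse.

```python
import math

# Time horizons as quoted in the essay, converted to minutes.
# Assumption: each value is treated as a snapshot exactly one year apart.
HORIZONS_MIN = {
    2022: 0.5,    # GPT 3.5, ~30 seconds
    2023: 4.0,    # GPT-4
    2024: 40.0,   # o1
    2025: 360.0,  # GPT 5.2 (High), ~6 hours
    2026: 720.0,  # Opus 4.6, ~12 hours
}


def growth_factor(series):
    """Ratio of the latest value to the earliest value."""
    years = sorted(series)
    return series[years[-1]] / series[years[0]]


def doubling_time_months(series):
    """Average doubling time implied by the first and last data points."""
    years = sorted(series)
    span_months = (years[-1] - years[0]) * 12
    return span_months / math.log2(growth_factor(series))


print(f"growth: {growth_factor(HORIZONS_MIN):.0f}x")  # growth: 1440x
print(f"implied doubling time: {doubling_time_months(HORIZONS_MIN):.1f} months")  # 4.6 months
```

On these endpoint figures the horizon grew about 1,440-fold, which works out to a doubling roughly every four to five months.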

"In 2022, GPT 3.5 could do tasks that might take a person about ~30 seconds. In 2023, this rose to 4 minutes with GPT-4. In 2024, this rose to 40 minutes (o1). In 2025, it reached ~6 hours (GPT 5.2 (High)). In 2026, it has already risen to ~12 hours (Opus 4.6)."

"Ajeya Cotra, a longtime AI forecaster who works at METR, thinks it isn't unreasonable to expect AI systems to do tasks that take ~100 hours by the end of 2026."

Theme 3: AI Is Beginning to Automate AI Research Itself

Across multiple distinct sub-tasks of AI R&D — fine-tuning models, optimizing training code, reproducing papers, and designing GPU kernels — AI systems are rapidly approaching and in some cases exceeding human-level performance.

"Claude Opus 4 achieved a 2.9× mean speedup in May 2025; this rose to 16.5× with Opus 4.5 in November 2025, 30× with Opus 4.6 in February 2026, and 52× with Claude Mythos Preview in April 2026. To calibrate on what these numbers mean, it is expected to take a human researcher 4 to 8 hours of work to achieve a 4x speedup on this task."

"An Anthropic researcher primes a team of individual AI agents with a research direction, then they autonomously go and try to get a better score than a human baseline on an AI safety research problem (specifically, scalable oversight). The approach works, with the AI agents coming up with techniques that beat the Anthropic-designed baseline."

Theme 4: The Emergence of a Capital-Heavy, Human-Light "Machine Economy"

As AI capability compounds, the structural economics of industries will shift away from labor toward compute ownership and AI service spending, potentially forming an autonomous commercial ecosystem.

"We should expect for an increasing chunk of the economy to get colonized by a new generation of companies which are either capital-heavy (because they own a lot of computers), or opex-heavy (because they spend a lot of money on AI services which they build value on top of), and relatively light on labor compared to today's corporations."

"Eventually, it may be possible to see the emergence of fully autonomous corporations that are run by AI systems themselves, which would exacerbate all of the above issues, while also posing many novel governance challenges."

Theme 5: Recursive Self-Improvement Has a Credible Near-Term Timeline

Clark assigns a 60%+ probability to fully automated AI R&D — where a frontier model trains its own successor without human involvement — by end of 2028, representing perhaps the most consequential near-term technological threshold in history.

"I reluctantly come to the view that there's a likely chance (60%+) that no-human-involved AI R&D — an AI system powerful enough that it could plausibly autonomously build its own successor — happens by the end of 2028."

"I think we could see an example of a 'model end-to-end trains its successor' within a year or two — certainly a proof-of-concept at the non-frontier model stage."

2. Contrarian Perspectives

Perspective 1: AI Doesn't Need to Be "Creative" to Trigger Recursive Self-Improvement

The popular assumption is that AI must achieve transformative creative breakthroughs (like discovering the transformer architecture) to automate its own development. Clark argues the opposite — most AI progress is unglamorous engineering iteration, and AI is already excellent at exactly that.

"Very little of this requires extremely out-of-leftfield insights and a lot of it seems more like unglamorous 'meat and potatoes' engineering work... Even if AI systems are relatively uncreative, it feels safe to bet they can push themselves forward."

Clark invokes Edison: "Genius is 1% inspiration and 99% perspiration." The implication is that the 99% is already automatable, and that alone may be sufficient to trigger self-improving cycles.

Perspective 2: Market Incentives Alone Will Not Optimize AI's Societal Upside

Against the standard market-optimist view that capital allocation will naturally direct AI to its highest-value uses, Clark argues that compute scarcity combined with default commercial incentives will likely produce suboptimal social outcomes.

"Assuming that demand for AI continues to outstrip compute supply, we'll have to figure out where to allocate AI to maximize a social upside. By default, I am skeptical that market incentives guarantee us the best societal upside from limited AI compute."

Perspective 3: Alignment Risk Compounds Geometrically, Not Linearly

Most public discourse treats alignment as a solvable engineering problem. Clark highlights that even a 99.9%-accurate alignment approach degrades badly under recursive improvement, because the small per-generation error compounds multiplicatively with each successive model generation rather than adding up linearly.
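The 99.9% example is plain compounding, and a few lines of arithmetic make the decay concrete. The sketch below assumes the simplest reading of Clark's example: each generation independently preserves alignment with the same probability, so the retained accuracy after n generations is that probability raised to the power n. The function names are illustrative.

```python
import math


def retained_accuracy(per_generation: float, generations: int) -> float:
    """Accuracy left after compounding a per-generation accuracy n times."""
    return per_generation ** generations


def generations_until_at_or_below(per_generation: float, threshold: float) -> int:
    """Smallest number of generations at which accuracy is at or below threshold."""
    return math.ceil(math.log(threshold) / math.log(per_generation))


for n in (1, 50, 500):
    print(n, f"{retained_accuracy(0.999, n):.2%}")

print(generations_until_at_or_below(0.999, 0.5))  # 693
```

Under this assumption the technique retains roughly 95% accuracy after 50 generations and about 61% after 500, close to the figures in Clark's example, and it falls to a coin flip after roughly 700 generations.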

"Unless your alignment approach is '100% accurate' and has a theoretical basis for continuing to be accurate with smarter systems, then things can go wrong quite quickly. For example, your technique is 99.9% accurate, then that becomes 95.12% accurate after 50 generations, and 60.5% accurate after 500 generations. Uh oh!"

3. Companies Identified

Anthropic

Description: Frontier AI lab, maker of the Claude model family
Why mentioned: Multiple benchmark results cited (SWE-Bench saturation with Claude Mythos Preview; LM training optimization showing 52x speedup; automated alignment research proof-of-concept); also publishing automated AI R&D research
Quote: "Anthropic is publishing work on building automated alignment researchers."

OpenAI

Description: Frontier AI lab, maker of GPT and o-series models
Why mentioned: MLE-Bench creator; GPT 3.5, GPT-4, o1, and GPT 5.2 (High) results cited in METR time horizon data; stated corporate goal of building an automated AI research intern
Quote: "OpenAI wants to build an 'automated AI research intern by September of 2026.'"

Google DeepMind

Description: Frontier AI lab within Alphabet
Why mentioned: Gemini models cited in Erdős problem-solving and centaur math discovery; cited as endorsing (cautiously) automation of alignment research
Quote: "DeepMind appears to be the most circumspect of the big three, but still says 'automation of alignment research should be done when feasible.'"

DeepSeek

Description: Chinese AI lab known for open-weight models
Why mentioned: Models used to build better GPU kernels as an example of AI-driven kernel design R&D
Quote: "Using DeepSeek's models to try to build better GPU kernels."

Meta

Description: Technology conglomerate with major AI research division
Why mentioned: Cited for using LLMs to automate generation of optimized Triton kernels within its own infrastructure
Quote: "Meta using LLMs to automate the generation of optimized Triton kernels for use within its infrastructure."

Recursive Superintelligence

Description: AI startup focused on automating AI research
Why mentioned: Raised $500M specifically targeting the automated AI R&D goal — a direct market signal of capital concentration in this thesis
Quote: "Recursive Superintelligence just raised $500m with the goal of automating AI research."

Mirendil

Description: Neolab / early-stage AI startup
Why mentioned: Cited as another new entrant with the explicit mission of building systems that excel at AI R&D
Quote: "Another neolab, Mirendil, has the goal of 'building systems that excel at AI R&D.'"

METR

Description: AI safety and evaluation organization
Why mentioned: Produces the "time horizons" benchmark tracking how long AI systems can independently complete tasks — one of the article's two primary evidentiary pillars
Quote: "METR makes a plot that tells us about the complexity of tasks AIs can complete, measured by how many hours a skilled human would take to do them."

4. People Identified

Jack Clark

Description: Co-founder of Anthropic; author of Import AI newsletter
Why mentioned: Author of this essay; provides the central thesis and probabilistic forecast
Quote: "I reluctantly come to the view that there's a likely chance (60%+) that no-human-involved AI R&D... happens by the end of 2028."

Ajeya Cotra

Description: Longtime AI forecaster; works at METR
Why mentioned: Cited for her view that it is not unreasonable to expect AI systems to handle tasks of ~100 hours by the end of 2026, a key near-term milestone
Quote: "Ajeya Cotra, a longtime AI forecaster who works at METR, thinks it isn't unreasonable to expect AI systems to do tasks that take ~100 hours by the end of 2026."

5. Operating Insights

Insight 1: Delegate Increasingly Complex Work to AI Agents — The Time Horizon Is Your Guide

The METR time-horizon data gives operators a practical framework: if a task would take a skilled human less time than the current AI time horizon (~12 hours as of early 2026, with ~100 hours seen as plausible by the end of 2026), it is a strong candidate for AI delegation today. Operators should audit their workflows against this expanding threshold on a rolling basis.
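One way to run that audit is a simple triage pass over a task inventory. The sketch below is an illustration, not a method from the essay: `Task`, `triage`, the example tasks, and the `safety_margin` parameter are all hypothetical. The margin is included because the essay does not say how reliably systems succeed right at the threshold, so tasks near it are routed to human review rather than full delegation.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    human_hours: float  # estimated time for a skilled human


def triage(tasks, horizon_hours=12.0, safety_margin=0.5):
    """Split tasks into delegate / delegate-with-review / keep-with-humans."""
    cutoff = horizon_hours * safety_margin
    delegate = [t for t in tasks if t.human_hours <= cutoff]
    review = [t for t in tasks if cutoff < t.human_hours <= horizon_hours]
    keep = [t for t in tasks if t.human_hours > horizon_hours]
    return delegate, review, keep


# Hypothetical workflow inventory, for illustration only.
tasks = [
    Task("clean dataset", 3),
    Task("launch experiment sweep", 8),
    Task("design new evaluation suite", 40),
]
delegate, review, keep = triage(tasks)
print([t.name for t in delegate])  # ['clean dataset']
print([t.name for t in review])    # ['launch experiment sweep']
print([t.name for t in keep])      # ['design new evaluation suite']
```

Re-running the same pass with a larger `horizon_hours` shows which tasks move into scope as the threshold expands.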

"If you look closely at the work of many AI researchers, a lot of their tasks boil down into things that might take a person a few hours to do — cleaning data, reading data, launching experiments, etc. All of this kind of work now sits inside the time horizon scope of modern systems."

Insight 2: Treat AI-Managed Multi-Agent Teams as a Scalable Organizational Unit

The emergence of AI systems that can manage other AI systems — acting as directors, critics, and editors — means entrepreneurs can now stand up synthetic team structures for complex, parallelizable work without proportional headcount growth.
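The manager-and-workers pattern can be sketched as a fan-out followed by a review step. This is a minimal illustration under stated assumptions, not how any lab's system works: `run_worker` and `passes_review` are hypothetical stand-ins that do no real AI calls, and the parallelism comes from Python's standard `concurrent.futures.ThreadPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor


def run_worker(specialism: str, subtask: str) -> str:
    """Hypothetical stand-in for a sub-agent; a real one would call an AI system."""
    return f"[{specialism}] done: {subtask}"


def passes_review(result: str) -> bool:
    """Hypothetical manager-side check; here it only rejects empty output."""
    return bool(result.strip())


def manager(project: dict[str, str]) -> list[str]:
    """Fan subtasks out to parallel workers, then keep results that pass review."""
    if not project:
        return []
    with ThreadPoolExecutor(max_workers=len(project)) as pool:
        futures = [pool.submit(run_worker, s, t) for s, t in project.items()]
        results = [f.result() for f in futures]  # collected in submission order
    return [r for r in results if passes_review(r)]


if __name__ == "__main__":
    for line in manager({"data": "clean logs", "eval": "run benchmark"}):
        print(line)
```

The design point is that headcount does not appear anywhere in the structure: adding a specialism means adding an entry to `project`, not a person.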

"AI systems are also learning to manage other AI systems... a single agent can end up supervising multiple sub-agents. This allows AI systems to work on large-scale projects that require multiple individual 'workers' each with different specialisms that work in parallel, typically under the direction of a single AI manager."

Insight 3: Watch Kernel Optimization as a Leading Indicator of AI Infrastructure Leverage

Companies that deploy AI to optimize their own compute infrastructure (GPU kernels, training pipelines) are compounding their cost and speed advantages over those that don't. This is already happening at Meta and frontier labs, and the tooling is becoming accessible.

"Kernel optimization is core to AI development because it defines the efficiency of both training and inference — how much compute you can effectively utilize to develop an AI system, and once you've trained a model, how efficiently you can convert that compute into inference."

6. Overlooked Insights

Insight 1: "Amdahl's Law for the Economy" — Physical-World Bottlenecks Will Create Asymmetric Winners and Losers

Clark briefly introduces an analogy to Amdahl's Law (the principle that a system's overall speedup is capped by the portion of the work that cannot be accelerated) applied to AI-driven economic acceleration. Sectors where digital speed meets physical-world constraints, such as drug trials, supply chains, and infrastructure permitting, will become the rate-limiting step in AI-driven productivity, creating concentrated, durable competitive moats for whoever solves those bottlenecks first.
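Amdahl's Law itself is a one-line formula: if a fraction p of a process is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The sketch below applies it to a hypothetical process; the 80% digital share and the 100x factor are made-up numbers for illustration, not figures from the essay.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only part of a process is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# Hypothetical: 80% of a process is digital and AI makes that part 100x faster.
print(f"{amdahl_speedup(0.8, 100):.2f}x")  # 4.81x
# Even with an enormous speedup of the digital part, the total stays under 5x.
print(f"{amdahl_speedup(0.8, 1e12):.2f}x")  # 5.00x
```

However fast the digital share becomes, the remaining 20% of physical-world work caps the total gain at 5x in this example, which is why the bottleneck is where the leverage sits.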

"We'll discover places where things break or slow under the increased volume, and we'll need to figure out how to fix those weak links in the chain. This may be especially pronounced in areas where you have to reconcile the fast-moving digital world with the slow-moving physical world, like drug trials for new medical therapies."

Insight 2: PostTrainBench — AI Is Already Halfway to Replacing Human Post-Training Researchers

Buried in the technical evidence section is a striking and underreported data point: AI systems are already achieving roughly half the performance uplift of expert human researchers on model fine-tuning tasks, as of March 2026. This is a direct, quantified threat to one of the most specialized and high-value job categories in AI — and a potential unlock for labs or startups that can't afford top-tier post-training talent.

"As of March 2026, AI systems are able to post-train models to get about half as much of the uplift as ones trained by humans... The top-scoring systems as of April get 25%-28% (Opus 4.6, and GPT 5.4), compared to a human score of 51%. This is already quite meaningful."