Adarsh Hiremath (Mercor CTO) — Felicis Fireside
- 01 From Zero to $500M ARR With 30–40 People: The AI-Native Efficiency Thesis
- 02 General Intelligence vs. Practical Intelligence: The Missing Layer in Enterprise AI
- 03 The Discover-Deploy-Improve Framework for Breaking Enterprise AI Deadlock
- 04 Evals Are the X Factor: Why Subjective Enterprise Outputs Are Hard to Automate
- 05 AI-Assisted Cyberattacks Are an Underappreciated Systemic Threat
- 06 Human Steering Is the Durable Competitive Moat Inside AI Deployments
1. Key Themes
From Zero to $500M ARR With 30–40 People: The AI-Native Efficiency Thesis
Mercor reached a $500M revenue run rate with only 30–40 employees (the host's figure), a ratio that would have been unthinkable for prior generations of software companies. This was made possible by the founders being the first customers of their own enterprise agent platform and deploying AI agents across every internal function before selling externally.
"You famously were one of the fastest growing companies of all time and gone from zero to 500 million of revenue with just 30, 40 people. That's an incredible accomplishment. That was probably unthinkable for people in my generation to even contemplate bringing companies that fast and at that efficiency." 00:13:02
General Intelligence vs. Practical Intelligence: The Missing Layer in Enterprise AI
The core reason enterprise AI deployments fail is not model capability — it's the absence of organizational context. A frontier model reviewing a resume will fail without knowing the company's talent philosophy, what "good" looks like for a specific role, and how the hiring manager actually screens candidates.
"There's a huge, huge difference between general intelligence and practical intelligence, especially in an enterprise context... if you take a frontier model, put it in a recruiting department to review a resume, that deployment will fail unless that model or LLM knows what does good look like in the context of that company." 00:05:36
The Discover-Deploy-Improve Framework for Breaking Enterprise AI Deadlock
Enterprises are stuck in a loop of failed deployments because they skip a critical prerequisite: systematically discovering and digitizing tribal knowledge before deploying agents. Mercor's three-step framework — Discover, Deploy, Improve — addresses this sequentially.
"To break out of that loop, you need three things to come together... Discover means a very, very precise understanding of what is the actual job that needs to be done... Thing two is actually deploy, where oftentimes the best deployment solution needs to be model agnostic because different models will actually leapfrog in capabilities over time... And then the third thing is improvement, which is where the evals come in and the continuous learning loop comes in." 00:10:52
Evals Are the X Factor: Why Subjective Enterprise Outputs Are Hard to Automate
Unlike code, which can be verified with unit tests, most enterprise knowledge work outputs — slide decks, memos, legal briefs — are subjective. Evals that can grade outputs against both objective and stylistic human preferences are what close the loop and enable continuous improvement of agents.
"Unlike coding where you might need to pass a unit test, there are infinitely many different ways to permute content on a slide or even format a slide. So the X factor in all of this is evals, where the benchmarks like APEX measure the full set of things." 00:08:57
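A minimal Python sketch of an eval that grades one agent output on both axes: hard objective checks act like unit tests (any failure scores zero), and stylistic criteria are combined as a weighted average. This is an illustration under assumptions, not how APEX works: the `Slide` structure, the specific checks, and the criterion names and weights are all hypothetical, and the style ratings stand in for judgments a human reviewer (or a grader calibrated to one) would supply.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Slide:
    """Hypothetical representation of one agent-produced slide."""
    title: str
    bullets: List[str]
    # Stylistic ratings in [0, 1], assumed to come from a human reviewer.
    style_ratings: Dict[str, float] = field(default_factory=dict)


def objective_checks(slide: Slide) -> Dict[str, bool]:
    """Verifiable constraints, the closest analogue to unit tests (hypothetical rules)."""
    return {
        "has_title": bool(slide.title.strip()),
        "at_most_five_bullets": len(slide.bullets) <= 5,
        "bullets_under_20_words": all(len(b.split()) < 20 for b in slide.bullets),
    }


def grade(slide: Slide, style_weights: Dict[str, float]) -> float:
    """Score in [0, 1]: 0 if any objective check fails, else the weighted style score."""
    if not all(objective_checks(slide).values()):
        return 0.0
    total_weight = sum(style_weights.values())
    if total_weight == 0:
        return 1.0  # no stylistic criteria defined; objective checks alone decide
    weighted = sum(
        weight * slide.style_ratings.get(criterion, 0.0)  # unrated criteria count as 0
        for criterion, weight in style_weights.items()
    )
    return weighted / total_weight


# Example: a team that weights "answer first" twice as heavily as template fidelity.
weights = {"answer_first": 2.0, "firm_template": 1.0}
slide = Slide(
    title="Market entry: recommend go",
    bullets=["Demand is growing", "Margins exceed target"],
    style_ratings={"answer_first": 1.0, "firm_template": 0.5},
)
print(round(grade(slide, weights), 3))  # 0.833
```

Gating on the objective checks keeps a stylistically pleasing but structurally broken output from scoring well; a real eval would likely use far more criteria and calibrate the weights to what a specific team or partner considers good.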
AI-Assisted Cyberattacks Are an Underappreciated Systemic Threat
The LiteLLM hack that hit Mercor (along with Anthropic and others) previewed a qualitatively new threat: AI models connected to arbitrary environments that can programmatically identify, iterate through, and exploit vulnerabilities at machine speed. This is not just a Mercor issue but an industry-wide challenge that model builders have not yet adequately addressed.
"You have these AI-assisted attacks where you can connect one of these frontier models to any arbitrary environment. You blink and then you open your eyes and it has identified vulnerabilities programmatically, can iterate through all of those vulnerabilities, and then exploit them. And I think in some ways that is a really, really scary future." 00:17:12
Human Steering Is the Durable Competitive Moat Inside AI Deployments
Rather than framing AI as a human replacement, Hiremath argues that humans are required to encode the idiosyncratic organizational standards that make an agent actually useful, and that this role becomes more important, not less, as models get more capable.
"It's not just can the model make slides. It's can the model make slides that align with the organizational standards of slide making in a specific consulting firm, that align with what good looks like in a specific team, and what a partner thinks good is like." 00:19:25
Consulting as a Proxy Domain for All Complex Knowledge Work
Hiremath uses management consulting as a benchmark domain because it requires reasoning over ambiguous problems across any domain and producing subjective outputs — making it an ideal stress test for agents that must generalize across enterprise knowledge work at large.
"At its core you take an ambiguous problem that could be in any domain and then you're responsible for actually reasoning over that problem and then producing a work output... that underlying problem I think is applicable to many, many different types of knowledge work." 00:08:57
2. Contrarian Perspectives
The Enterprise AI Moment Has NOT Happened Yet — And the Reason Is Structural, Not Technological
While most of the market treats model capability as the bottleneck, Hiremath argues the models are already capable enough. The real gap, in his view, is the absence of systems to extract, digitize, and make actionable the tribal knowledge that lives in people's heads, and no provider or company has solved this yet.
"No provider and no company has done that today... the reason that these enterprise deployments is so, so difficult is because you need to actually take that enterprise context, which can often be tribal knowledge within a company, and put that into the specification of the agent." 00:05:36
Executive Frustration With AI Deployments Is Justified, Not Irrational
Conventional wisdom blames enterprise executives for being slow adopters. Hiremath flips this: the executives are right to be disappointed because the deployment methodology itself is broken — it is guesswork, not engineering.
"I actually don't blame these executive stakeholders for being really, really disappointed in these deployments... there's a long running history of these deployments not working, everything from manual processes to task mining to RPA, all of these different things." 00:10:52
Being a 19-Year-Old Dropout With No Work Experience Was a Feature, Not a Bug in Early Hiring
Mercor's earliest hires had to be a "different type of crazy" — people willing to bet on founders with no professional track record. Hiremath treats this as a talent segmentation insight: different stages of company maturity require fundamentally different risk profiles in candidates, and conflating them is a systematic hiring error.
"You have to be a different type of crazy to join three 19-year-olds who have never worked a job before, who are working out of their apartment... You legitimately have to be some level of crazy and have a huge risk appetite. Like that is not so true anymore with the progress of the company." 00:13:37
Model Agnosticism Is a Strategic Requirement, Not a Nice-to-Have
Many enterprise AI vendors build around a single model provider. Hiremath argues this is the wrong architecture: models leapfrog each other in capability over time, so locking into one model today means being locked out of better models tomorrow.
"The best deployment solution needs to be model agnostic because different models will actually leapfrog in capabilities over time, and you want a solution that is actually agnostic to all those things and allows you to pick the best model for the best use case." 00:10:52
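One way to make a deployment model agnostic is to hide every provider behind the same call signature and route each use case to whichever model currently scores best on your own evals. The Python sketch below assumes that design; the model names, adapter lambdas, and scores are hypothetical placeholders, not real provider APIs or benchmark results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A provider adapter takes a prompt and returns text. Real adapters would wrap each
# vendor's SDK behind this one signature; the lambdas below are hypothetical stubs.
ModelFn = Callable[[str], str]


@dataclass
class ModelEntry:
    name: str
    call: ModelFn
    # Eval score per use case, e.g. from your own benchmark runs (assumed data).
    scores: Dict[str, float]


class ModelRouter:
    """Routes each use case to whichever registered model currently scores best."""

    def __init__(self) -> None:
        self._models: Dict[str, ModelEntry] = {}

    def register(self, entry: ModelEntry) -> None:
        # Re-registering a name replaces it, e.g. after re-running evals.
        self._models[entry.name] = entry

    def best_for(self, use_case: str) -> ModelEntry:
        candidates: List[ModelEntry] = [
            m for m in self._models.values() if use_case in m.scores
        ]
        if not candidates:
            raise LookupError(f"no model has been evaluated on {use_case!r}")
        return max(candidates, key=lambda m: m.scores[use_case])

    def run(self, use_case: str, prompt: str) -> str:
        return self.best_for(use_case).call(prompt)


router = ModelRouter()
router.register(ModelEntry("model_a", lambda p: f"A: {p}", {"resume_review": 0.71, "slides": 0.64}))
router.register(ModelEntry("model_b", lambda p: f"B: {p}", {"resume_review": 0.66, "slides": 0.78}))
print(router.run("resume_review", "Screen this resume"))  # A: Screen this resume
print(router.run("slides", "Draft the summary slide"))    # B: Draft the summary slide

# A newer model leapfrogs on resume review: only the registry changes, not the callers.
router.register(ModelEntry("model_c", lambda p: f"C: {p}", {"resume_review": 0.83}))
print(router.run("resume_review", "Screen this resume"))  # C: Screen this resume
```

Because callers only name a use case, adopting a stronger model becomes a registry update backed by fresh eval scores rather than a rewrite, which is the property the quote argues for.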
3. Companies Identified
Mercor
AI platform for training, deploying, and evaluating models and AI agents; serves both AI labs and Fortune 500 enterprises. Mentioned as the central company of the discussion: reached a $500M revenue run rate with ~30–40 employees (per the host); built the APEX benchmark for evaluating agent performance on management consulting tasks; an AI agent resolves about 90% of customer support tickets end to end, with higher CSAT than human agents; a two-person IT team, backed by an agent, supports the 400-person company.
"If you actually message support at mercor.com, there's a 90% chance that an AI agent will resolve that entire ticket with a higher CSAT than a human." 00:15:13
Anthropic (Claude / Claude Code)
AI lab; creator of the Claude family of models and Claude Code. Mentioned as a frontier model provider whose Claude Code release (used alongside Codex) gave Hiremath a personal "aha moment," and separately as one of the companies breached in the LiteLLM hack.
"I remember in early February when I was using Claude Code and Codex, the most recent releases at the time, I just had this moment of, like, wow... with a single prompt, like a barely grammatically correct sentence, I'm seeing this model in front of me just do the whole thing that I would have spent days doing." 00:03:31
OpenAI (GPT / Codex)
AI lab; creator of GPT and Codex. Mentioned alongside Claude and Gemini as one of the capable frontier models that enterprises are nonetheless failing to deploy effectively.
"We know the models are so, so capable. GPT, Claude, Claude Code, Codex, all of these different models, the Gemini models. But despite their capability, we just see humans in an enterprise doing manual work that they should not be doing." 00:03:31
Google DeepMind (Gemini)
AI lab; creator of the Gemini model family. Mentioned alongside GPT and Claude as one of the frontier models whose capabilities already exceed what enterprises are actually leveraging.
"GPT, Claude, Claude Code, Codex, all of these different models, the Gemini models. But despite their capability, we just see humans in an enterprise doing manual work that they should not be doing." 00:03:31
Bridgewater Associates
Macro hedge fund. Mentioned as Hiremath's dream internship, which he reneged on to found Mercor; notable as a signal of the founders' caliber before they started the company.
"After sophomore year, I reneged on my summer internship at the time, which was Bridgewater, and that was my dream internship because I was a debater, and all the best debaters went to Bridgewater for internships and to work full-time." 00:00:47
4. People Identified
Adarsh Hiremath
Co-founder and CTO (recently also co-CEO) of Mercor. Highlighted throughout as an exceptionally high-signal founder: dropped out of Harvard after sophomore year, co-founded Mercor at 19, scaled the company to a $500M revenue run rate with ~30–40 people, personally uses the enterprise agent product as his first daily touchpoint, and is driving Mercor's expansion into Fortune 500 enterprise deployments.
"I make all our agents on the Mercor Enterprise platform. My first thing I do when I wake up is I talk to my personal agent. When I go to sleep, I also talk to my personal agent. In between those two times when I'm doing a task that I shouldn't be doing, I go tell my agent to go do the task." 00:15:13
Brendan (Mercor Co-founder)
Co-founder of Mercor, mentioned by the host as sharing Hiremath's philosophy of keeping humans central to AI deployment rather than displacing them.
"I always see you and Brendan always talking about putting humans in the middle of AI, as opposed to putting AI in the middle of humans, you know, like most other AI leaders are talking about." 00:18:48
5. Operating Insights
Context Engineering Is the New Prompt Engineering — and It's a Full-Time Job
Mercor's customer support team spends the bulk of its time not answering tickets but doing "context engineering": updating the agent's knowledge base as new features ship so the agent can handle new ticket types without human escalation. This reframes what an AI-augmented support team actually does.
"The team actually working on that spends the bulk of their time context engineering the model as new features come out and you need new information to actually address that ticket." 00:15:13
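The workflow the quote describes can be sketched in Python: the agent answers only from a curated knowledge base, escalates anything it has no context for, and the human team's job becomes adding context rather than answering tickets. Everything here is hypothetical, including the class names, the keyword matching (a stand-in for whatever retrieval a real system would use), and the placeholder guidance text; it does not describe Mercor's actual system.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

ESCALATE = "ESCALATE_TO_HUMAN"


@dataclass
class KnowledgeBase:
    """Hypothetical store of support context the agent may answer from."""
    entries: Dict[str, str] = field(default_factory=dict)  # topic keyword -> guidance

    def add(self, topic: str, guidance: str) -> None:
        # The context-engineering step: a human adds guidance when a feature ships.
        self.entries[topic.lower()] = guidance

    def lookup(self, ticket: str) -> Optional[str]:
        # Naive keyword match; a production system would more likely use retrieval.
        text = ticket.lower()
        for topic, guidance in self.entries.items():
            if topic in text:
                return guidance
        return None


def handle_ticket(kb: KnowledgeBase, ticket: str) -> str:
    guidance = kb.lookup(ticket)
    if guidance is None:
        return ESCALATE  # no context covers this ticket yet, so a human takes it
    # A real agent would pass `guidance` to a model as context; we return it directly.
    return guidance


kb = KnowledgeBase()
kb.add("billing", "Guidance for billing questions (placeholder).")
print(handle_ticket(kb, "I have a billing question"))
print(handle_ticket(kb, "How does the new feature X work?"))  # ESCALATE_TO_HUMAN

# Feature X ships; the support team adds context instead of answering each ticket.
kb.add("feature x", "Guidance for feature X (placeholder).")
print(handle_ticket(kb, "How does the new feature X work?"))
```

The escalation path is what makes the human work visible: each escalated ticket type is a signal that the knowledge base needs a new entry.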
A Two-Person IT Team Can Support a 400-Person Company With Agent Infrastructure
Mercor runs its entire IT function for a 400-person organization with two humans and an agent. This is a concrete operational benchmark that a scaling startup can use to pressure-test its own IT headcount assumptions.
"Our IT team, for example, it's a two-person team to handle the IT requirements of a 400-person company. We have an agent that handles all of that." 00:15:13
Hire for Stage-Appropriate Risk Tolerance — and Recalibrate Your Hiring Criteria at Every Inflection Point
In the earliest days, extreme risk appetite and belief in unproven founders were effectively preconditions for joining, so candidates self-selected for them. As the company matures, that self-selection fades, and the hiring process has to decide explicitly which attributes each stage and cohort actually needs. Failing to recalibrate at each inflection point produces systematic hiring mismatches.
"Because of the fast growth of the company, we've had to constantly adjust these processes at a faster cadence than we normally would have." 00:13:37
6. Overlooked Insights
The APEX Benchmark Is a Quietly Significant Industry Infrastructure Play
The APEX benchmark came up only in passing. Mercor built it to measure agent performance on management consulting tasks, and, as the host summarized it, a regular model given such a task without much context takes about eight attempts and is wrong roughly 50% of the time. This is not just a marketing tool. A company whose benchmark becomes the standard for evaluating enterprise agents controls the definition of "good" for the entire industry, a standard-setting position that historically creates durable leverage (think S&P indices, FICO scores). No one in the conversation paused on the strategic significance of owning this benchmark.
"You guys have an incredible benchmark that you put out there, Apex benchmark, where you show that the task of a management consultant, you give it to a regular model without a whole lot of context, it takes eight times for it to repeat the same task, and about 50% of the time it's wrong." 00:07:54
Mercor's Lab-Side Business Is a Structural Intelligence Moat Over Every Enterprise AI Competitor
Mercor helps the frontier AI labs themselves train, deploy, and evaluate models and agents. This likely means Mercor sees failure modes, capability edges, and model behaviors earlier and in greater depth than a pure-play enterprise AI vendor. When Hiremath describes "adopting some of this technology internally," he is describing a feedback loop that a competitor serving only the enterprise side would struggle to replicate. This upstream position is mentioned only briefly but is arguably the most defensible part of the entire business model.
"When we work with the labs to help train, deploy, and evaluate these agents... what ended up happening at the beginning of the year is enterprises started coming up and saying, hey, we want your help training, deploying, and evaluating agents... the evals come in and the continuous learning loop comes in... we've developed deep expertise in from serving the labs and also adopting some of this technology internally." 00:06:45