Baseten CEO Tuhin Srivastava on the AI…

1. Key Themes

The Custom Model Revolution Is Already Here — And Most People Don't Know It

The dominant narrative is that enterprises are still early in AI adoption. But Baseten's workload data shows that something more advanced is already happening at the frontier: roughly 95% of the inference it serves runs on customized models, not vanilla open-source weights.

"It is all custom. It's basically... 95%, and I think that's really cool, to be honest... almost all of them, the customers are making some modifications to the model with their own data specialized for the use case. And I think what's even more important is they might be compiling it in different ways. No one is just running the vanilla open-source weights." - Tuhin Srivastava 00:13:20

Inference Is the Terminal Market — Even Post-AGI

Tuhin makes a sweeping claim about the permanence of inference as a market category, arguing that it cannot be displaced by any future development, including AGI itself.

"I think we're kind of in a world that is, you know, it is the last market, right? Like even if there's AGI, all that's left is inference." - Tuhin Srivastava 00:40:12

Tuhin supports this with the Jevons Paradox dynamics he sees in his customer base: as inference costs fall, usage expands rather than saturating.

"The more we drive down the costs, what they realize is, more intelligence just means better... better answers, better experiences, more actions, more dollars, even more revenue." - Tuhin Srivastava 00:40:11

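A constant-elasticity demand curve is one simple way to see why falling prices can raise total spending rather than cap it. The sketch below is a hypothetical model, not something from the interview: it assumes demand Q = k * P^(-e) with made-up values for k and the elasticity e. When e is greater than 1, every price cut increases total spend.

```python
def total_spend(price: float, k: float = 1000.0, elasticity: float = 1.5) -> float:
    """Total spend P * Q under a constant-elasticity demand curve Q = k * P**(-elasticity).

    k and elasticity are hypothetical illustration values, not Baseten data.
    """
    quantity = k * price ** (-elasticity)  # demand rises as price falls
    return price * quantity


# Hypothetical prices per million tokens, falling over time.
for price in (10.0, 5.0, 1.0):
    print(f"price {price:4.1f} -> spend {total_spend(price):7.1f}")
```

With elasticity above 1, spend grows as the price drops; with elasticity below 1 (for example `elasticity=0.5`), spend shrinks as prices fall, which is the saturation case Tuhin says he is not seeing.
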
The Supply Crunch Is Far Worse Than Reported — And Quality of Supply Is a Hidden Crisis

Baseten operates 90 clusters across 18 clouds and still runs at "uncomfortably high" utilization, in the mid-90s percent. A second, largely unreported layer of the problem is supply quality: many new suppliers have never run data centers and lack the operational discipline that inference SLAs demand.

"There are also a lot of suppliers right now that it's kind of grifty... they haven't run data centers before. They don't understand SLAs, especially for inference... there's probably like a dozen good clouds. And I'd probably put like three or four of them in the gold tier." - Tuhin Srivastava 00:20:40

Securing large GPU clusters now requires multi-year commitments with heavy prepayments, quoted as a share of total contract value (TCV):

"If you wanted a thousand, 1024 B200s from a good cloud right now, you're not getting that less than a three to five year contract right now with probably a 20 to 30% TCV prepay." - Tuhin Srivastava 00:22:45

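To make the prepay figure concrete, here is a back-of-the-envelope calculation. The per-GPU-hour rate is a hypothetical placeholder, not a price from the interview; only the 1024-GPU cluster size, the 3 to 5 year terms, and the 20 to 30% prepay come from the quote. It also assumes the cluster is billed around the clock for the full term.

```python
HOURS_PER_YEAR = 24 * 365


def contract_terms(num_gpus: int, usd_per_gpu_hour: float, years: int,
                   prepay_fraction: float) -> tuple[float, float]:
    """Return (TCV, upfront prepay) for a reserved GPU cluster billed per GPU-hour."""
    tcv = num_gpus * usd_per_gpu_hour * HOURS_PER_YEAR * years
    return tcv, tcv * prepay_fraction


# 1024 GPUs at a HYPOTHETICAL $3.00 per GPU-hour (not a quoted B200 price).
for years, prepay in ((3, 0.20), (5, 0.30)):
    tcv, upfront = contract_terms(1024, 3.00, years, prepay)
    print(f"{years}-year term: TCV ${tcv:,.0f}, prepay ${upfront:,.0f}")
# 3-year term: TCV $80,732,160, prepay $16,146,432
# 5-year term: TCV $134,553,600, prepay $40,366,080
```

Even at this placeholder rate, the upfront cash runs to tens of millions of dollars before a single token is served.
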
2. Contrarian Perspectives

The Application Layer Will Survive Because of Workflow Depth, Not Model Quality

The conventional fear is that frontier labs will commoditize or absorb all application-layer value. Tuhin argues the opposite — companies that embed deeply into workflows own something labs fundamentally cannot replicate.

"The thing that happens inside the EMR three steps down... that becomes a workflow that only... it's very, very hard for a frontier model company to go to [a hospital] because they just don't have access to that user signal." - Tuhin Srivastava 00:02:55

The key insight: value is stored in workflows, not model weights. Companies that accumulate proprietary reward signals from those workflows can post-train specialized models that no lab can replicate without the same access.

Chinese Open-Source Models Are Inadvertently Subsidizing U.S. Innovation

While most commentators treat Chinese AI models as a threat, Elad Gil offers a contrarian geopolitical read that Tuhin endorses:

"It looks like effectively the Chinese government is subsidizing at least a large subset of these models, and that subsidy or surplus is effectively just being passed on to U.S. enterprises who are adopting these models. In other words, it's a way for the Chinese government to effectively subsidize U.S. enterprise in an indirect manner, and I think that's a little bit lost right now." - Elad Gil 00:11:28

Tuhin further quantifies this: DeepSeek can be run in production at roughly 20% of the cost of OpenAI or Anthropic models, with comparable or better latency and probably better reliability.

"You could run DeepSeek probably 20% of the cost of running OpenAI/Anthropic models in production with comparable, better latency, probably better reliability." - Tuhin Srivastava 00:12:26

GPUs-as-a-Service Is Commodity; Inference Software Creates Extreme Stickiness

The prevailing assumption is that compute is the moat. Tuhin inverts this: raw compute is a commodity, but the inference software layer on top of it creates extraordinary retention.

"GPUs as a service is not sticky. I think that's been seen. Like customers generally just see that as commodity. Inference with the software layer included is incredibly sticky... none of our top 30 customers have ever churned. We're talking like 400% annual NDR around our business." - Tuhin Srivastava 00:24:47

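Net dollar retention (NDR) compares what a fixed cohort of customers pays at the end of a period with what the same cohort paid at the start. The sketch below uses one common definition (companies vary in the details) and hypothetical cohort numbers, to show what 400% NDR means in practice: with no churn, existing customers quadrupling their spend in a year.

```python
def net_dollar_retention(start_arr: float, expansion: float,
                         contraction: float, churned: float) -> float:
    """NDR for a fixed customer cohort: the cohort's ending ARR divided by its starting ARR.

    New customers acquired during the period are excluded by construction.
    """
    return (start_arr + expansion - contraction - churned) / start_arr


# Hypothetical cohort: $10M starting ARR, $30M of expansion, no downgrades, no churn.
ndr = net_dollar_retention(10e6, 30e6, 0.0, 0.0)
print(f"NDR: {ndr:.0%}")  # NDR: 400%
```

Because churn sits in the numerator, a base where no top customer has ever left can only push NDR up through expansion.
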
Post-Training and Inference Are the Same Problem — Not Separate Disciplines

The market treats training/post-training and inference as distinct concerns handled by different vendors. Baseten's move into post-training research, including its acquisition of Paused, showed that the two are deeply coupled.

"It's interesting as we've started to do a lot more research on the post-training side, you start to see how linked inference and post-training are... even when you think about stuff like quantization and when you should do that and like, how you train the model affects how you need to quantize for inference. And how paired these problems are has become very apparent." - Tuhin Srivastava 00:16:35

Don't Post-Train Until You Have Proven Product-Market Fit

Against the hype of custom model development, Tuhin has a sobering corrective for founders:

"No post-training pre-product market fit is what I'd say... go prove to yourself with the best in class model that you have something worth optimizing." - Tuhin Srivastava 00:17:42

3. Companies Identified

Abridge: Ambient AI scribe for physicians, deeply embedded in hospital and EMR workflows across the U.S. Cited as the exemplar of application-layer defensibility through workflow depth and proprietary clinical data.

"They've basically got this very, very deep integration into hospitals, into clinician workflows... it's very, very hard for a frontier model company to go to [a hospital] because they just don't have access to that user signal." - Tuhin Srivastava 00:03:13

Decagon: AI-native customer support company. Cited as an example of a company building specialized workflow intelligence from multi-step support task sequences, which creates proprietary training signal.

"A support task isn't one-shotted. Usually at a company like Baseten, when a ticket comes in, there's like what, like 1, 2, 10, 20 actions that get taken. And that is where someone can develop a specialized model." - Tuhin Srivastava 00:04:01

Open Evidence: AI platform for clinical evidence and medical information. Cited as a frontier AI application company reaching enterprise healthcare at scale via Baseten's infrastructure.

"The fastest-growing AI companies in the world... Abridge, Open Evidence, Decagon... we don't serve enterprises en masse. Our customers serve enterprises." - Tuhin Srivastava 00:07:17

Braintrust: AI evals platform. Named as the kind of best-in-class evals company Baseten plans to integrate tightly into its inference-to-post-training loop.

"We're going to work with the best evals company in the world to make sure that's very well integrated, like Braintrust, into and around Baseten." - Tuhin Srivastava 00:29:37

Cursor: AI coding assistant. Cited as an example of a scaled AI-native application company.

"The fastest-growing AI companies in the world... like the Abridge, Cursor, Open Evidence of the world." - Elad Gil 00:05:57

Paused (acquired by Baseten): Post-training research company and former Baseten customer. Acquired to bring post-training expertise in-house, reflecting the view that inference and post-training are coupled problems.

"Paused was a company that was a Baseten customer. They were post-training models and running them on Baseten... what we realized was, hey, we really needed that expertise because it represents a way for us to get closer to the customer earlier." - Tuhin Srivastava 00:14:53

4. People Identified

Danny, Samir, and Stephen Day: Senior leadership hires across technical and go-to-market functions at Baseten during its 30x growth period. Cited as examples of the leadership-layer investment that enabled rapid scaling.

"You've brought in a lot of really amazing talent, like Danny and Samir and Stephen Day, folks on both the technical and the go-to-market side." - Elad Gil 00:33:49

Amir (Co-founder, Baseten): Deeply embedded in the company's operational, on-call culture. Cited as embodying Baseten's infrastructure-operations mindset, to the point that his seven-year-old knows what a P0 is.

"Amir, my co-founder, when his pager goes off, his seven-year-old said, 'Is that a P0?'" - Tuhin Srivastava 00:37:37

5. Operating Insights

Give Leaders Whole Problems — Micromanagement Is a Hiring Failure, Not a Management Style

Tuhin reframes the urge to be involved in everything not as diligence but as a symptom of not having the right people.

"If you feel like you are micromanaging, if you feel like you need to be involved in everything, I think that's a bit of a cop out as a founder. Because you're just like, I just need to be involved in everything. It's like, no, you probably don't have the right people." - Tuhin Srivastava 00:35:19

Define Your Hiring Rubric in Specific, Non-Generic Terms

Vague hiring criteria like "smart and hardworking" are operationally useless. Tuhin's advice is to spell out exactly what you are optimizing for, covering values as well as skills.

"Be very, very clear what you're optimizing for... with us, what we cared about was, hey, actually, we don't care about a lot of people who have done this before. We care about first principles people... work has to be a high priority, but they also have to be very kind and nice... we don't have a hero culture... very low ego." - Tuhin Srivastava 00:35:49

Build for Your Most Demanding Customers — Enterprise Requirements Translate Upstream

Rather than trying to serve enterprises directly, Baseten serves AI-native companies that themselves sell to enterprises. Enterprise-grade requirements pass upstream to Baseten through those customers, without Baseten having to run an enterprise sales motion.

"By serving companies like Abridge and Open Evidence, we're probably pretty well suited to go serve the healthcare system, given that they are selling to them... we actually get a full translation of what the enterprise was required." - Tuhin Srivastava 00:07:45

6. Overlooked Insights

The Cost of Capital Is Becoming a Core Competitive Moat in Inference Infrastructure

This was mentioned almost in passing during a discussion about IPO timing, but it is structurally significant. As GPU cluster contracts shift to 3-5 year terms with 20-30% TCV prepays, the ability to finance capacity is no longer just an operational concern — it is a competitive differentiator that will determine which inference players can even compete at scale.

"What becomes important when acquiring capacity is you need to have enough demand to supply it, but then you also need like a low cost of capital, which is actually changing the dynamic pretty significantly... I think you'd go [public] sooner... our business has very interesting working capital requirements." - Tuhin Srivastava 00:23:14

This implies that inference infrastructure is quietly converging toward a capital markets game, not just a technology game. Operators and investors should expect public markets or structured debt to become a meaningful strategic weapon — and that well-capitalized players (or those with access to cheap debt) will have an asymmetric advantage in locking up supply.

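A rough way to see why cost of capital matters: cash locked up in a prepay has an opportunity cost at whatever rate the operator pays for capital. The sketch below is a simplification under hypothetical assumptions: the full prepay stays tied up for the whole term, compounding is annual, and both the $16M prepay and the 6% and 15% rates are illustrative figures, not numbers from the interview.

```python
def financing_cost(prepay: float, cost_of_capital: float, years: int) -> float:
    """Opportunity cost of cash tied up in a prepay, compounded annually over the term.

    Simplification: treats the full prepay as locked up for the whole term,
    which overstates the cost if the prepay is drawn down during the contract.
    """
    return prepay * ((1 + cost_of_capital) ** years - 1)


PREPAY = 16_000_000  # hypothetical prepay, in USD
for rate in (0.06, 0.15):  # hypothetical cheap vs expensive cost of capital
    cost = financing_cost(PREPAY, rate, 3)
    print(f"{rate:.0%} cost of capital over 3 years: ${cost:,.0f}")
```

Under these assumptions, the higher-cost player pays well over twice as much to carry the same prepay, before any difference in technology comes into play.
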
Exclusive Chip Supply Deals May Be Actively Destroying the Viability of NVIDIA Competitors

Tuhin briefly surfaced a self-reinforcing dynamic that explains why alternative chip providers may never achieve ecosystem scale — not because of technical inferiority, but because their go-to-market strategy structurally prevents the developer ecosystem from forming around them.

"What you need to be able to compete here is the ecosystem to form around you. And if you tie up all your supply with one buyer, which a bunch of the other chip providers have done, it's actually hard for that ecosystem to form... if you're a big lab and you have a proprietary deal with one chip type where you get 90% of the supply, it's actually in your best interest to make sure you get 95% of the supply and everything that's built for you, no one else could ever use it." - Tuhin Srivastava 00:27:48

This is a non-obvious competitive moat analysis: exclusive supply deals that appear to be wins for alternative chip vendors may actually be strategic traps — concentrating adoption so narrowly that the broader ecosystem (the thing that made CUDA unassailable) never materializes.