Etched - Building AI Hardware to Make Inference Faster and Cheaper - [Invest Like the Best, EP.480]
- 01 The Entire AI Hardware Stack Was Built Before ChatGPT
- 02 Inference Will Become the Largest Market in the World
- 03 The Two Core Technical Bets: Low-Voltage Inference and Cluster-Scale Memory
- 04 Production Is the Product
- 05 Parallelization and Prefetching as an Organizational Operating System
- 06 Bimodal Talent Strategy: Legends Paired With Naïve Geniuses
1. Key Themes
The Entire AI Hardware Stack Was Built Before ChatGPT — and That's the Opportunity
Etched's core insight is that every existing AI chip was designed for a pre-ChatGPT world and is merely retrofitted for modern inference workloads. This isn't a minor gap — it's the foundational thesis for the entire company.
"It seemed like the hardware was all designed before ChatGPT. Every GPU, every TPU, every AI chip that was serving these models were just fundamentally built before this and are retrofit to serve these modern models. There's going to be an entire new wave of architectures that came out." — Gavin Uberti 00:22:51
Inference Will Become the Largest Market in the World
Both founders repeatedly frame Etched not as a chip company but as a bet that producing tokens at massive scale will define the most valuable companies of the next decade.
"We are on a decade march for inference to become the biggest market in the world. So when you think about that, ten years from now, there's going to be these giant projects where everything in that data center fundamentally hasn't been designed today." — Gavin Uberti 00:23:49
The Two Core Technical Bets: Low-Voltage Inference and Cluster-Scale Memory
Etched has identified two architectural primitives that, in its telling, no AI chip company had seriously explored. On the prefill side (processing the input prompt), thermal throttling caused by high operating voltage is the hidden governor on performance. On the decode side (generating output tokens), the question is not chip-level memory bandwidth but cluster-level memory bandwidth.
"We were able to create a new mechanism of running at much lower voltages, a new type of power delivery that we call low-voltage inference. And we think all AI chips in the future are going to be low-voltage chips." — Gavin Uberti 00:12:07
"People often ask, how much memory bandwidth is on your chip? You should be asking how much memory bandwidth is on your full scale-up cluster... On Blackwell chips, it can be about 4,000 nanoseconds to go point-to-point... We can go ahead and cut this by more than a factor of 5x." — Rob Locken 00:13:04
Production Is the Product — Vertical Integration as Competitive Moat
Etched is the only AI chip startup simultaneously building its own chip and its own rack. The thesis is that vertical integration is not overhead — it is the source of both performance advantage and production velocity.
"The best vendor is no vendor. As much as possible, we want to vertically integrate the entire product. Both because we get more performance, but we can move way faster. So everything from the chips to the boards to the cold plates to the interconnects to even the production. We want to do all of it as in-house as possible." — Gavin Uberti 00:28:04
"There is another very famous AI chip company that took 10 months to go from getting their silicon back to having them running inference in a rack... We were able to do it in 40 days." — Gavin Uberti 00:40:09
Parallelization and Prefetching as an Organizational Operating System
Etched has institutionalized the practice of running every non-dependent workstream in parallel, spending money aggressively to collapse timelines. They shipped racks to customer data centers without their chip in them, ran the full-reticle chip design on a cluster of more than 700 FPGAs, and built cold plates against a thermal mock chip, all before silicon returned from the fab.
"We took over 700 FPGAs and put the entire full reticle chip on an FPGA cluster and ran a dozen different models with our full inference stack on them before the chips came back. It means that we built a thermal chip to mock the thermal profile of our chip and built cold plates based on that before the chip came back." — Gavin Uberti 00:39:11
Bimodal Talent Strategy: Legends Paired With Naïve Geniuses
Etched's hiring philosophy is explicit and unusual. They seek two archetypes — "legends" who have done the zero-to-one on the hardest technical problems at scale, and extraordinarily driven young engineers who don't know what's supposed to be impossible. The key is not just having both but having them work together.
"We have a pretty bimodal talent philosophy. It starts with what we call the legends... We created this system we call project-based recruiting, where we map out all of the hardest technical problems across all industries that anyone has ever had to solve. We look at temporality, so who are the people who did the zero to one?" — Gavin Uberti 00:29:15
"One is that possible without the other? Because you need the extremely driven people that just keep asking why and don't know where the bodies are buried to like take tons of aggressive risks. And then you need the people who've seen scale and still have the startup scrappy mentality to help them along the way." — Gavin Uberti 00:32:39
The AI Chips Built by Hyperscalers and Labs Are Structurally Disadvantaged
Because chips like Google's TPU, Meta's MTIA, Microsoft's Maia, and OpenAI's Jalapeño are not existential to their parent companies, they will never get the same intensity of focus, risk-taking, or talent concentration as a pure-play chip company.
"It fundamentally is not existential for my company for this product to win. For Google with TPUs, their revenue comes from search. Google won't fail if TPUs fail. Meta won't fail if MTIA fails. Microsoft won't fail if Maia fails. And OpenAI won't fail if Jalapeño fails." — Gavin Uberti 00:55:44
"Look at the raw flop stats. You compare any of the chips built by the labs or by the hyperscalers. The flop density for FP8 times FP8 is lower than the Blackwell B300. And that makes sense because they don't have to go take the risk." — Rob Locken 00:56:17
Wall Clock Time Compression Is the Underrated Consequence of Better Inference
The most important second-order effect of faster inference isn't cheaper chatbots. It is compressing the wall clock time of long-horizon AI agent tasks, which will dramatically accelerate scientific discovery.
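The examples Gavin gives in this section (a year to a month, a month to three days, three days to seven hours, and a week-long browser build to under an hour) imply specific speedup factors. The short Python sketch below works them out. It assumes a 365-day year and a 30-day month, and treats "under an hour" as exactly one hour, so the last ratio is a lower bound; the names are illustrative.

```python
HOURS_PER_DAY = 24

# (before_hours, after_hours) for each example in this section.
# Assumptions: 365-day year, 30-day month, "under an hour" taken as 1 hour.
EXAMPLES = {
    "year to month": (365 * HOURS_PER_DAY, 30 * HOURS_PER_DAY),
    "month to three days": (30 * HOURS_PER_DAY, 3 * HOURS_PER_DAY),
    "three days to seven hours": (3 * HOURS_PER_DAY, 7),
    "week to under an hour": (7 * HOURS_PER_DAY, 1),
}


def speedup(before_hours: float, after_hours: float) -> float:
    """Ratio of the old wall clock time to the new one."""
    if after_hours <= 0:
        raise ValueError("after_hours must be positive")
    return before_hours / after_hours


if __name__ == "__main__":
    for label, (before, after) in EXAMPLES.items():
        print(f"{label}: about {speedup(before, after):.1f}x")
```

The first three examples all land at roughly 10x to 12x, while the browser example implies at least 168x, so the two claims describe different degrees of compression.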
"A year long compute build would now take a month and a month long compute build will now take three days and that three day compute build will now take seven hours and so forth... I thought it was really cool months ago when Cursor published that they had a bunch of coding agents build an entire browser from scratch in a week. Totally nuts. And that will soon happen in under an hour." — Gavin Uberti 00:47:42
Future Models Will Be Designed Around the Economics of Compute, Not Memory
Rob Locken articulates a non-obvious architectural thesis: because math gets cheaper faster than memory does, next-generation models will be redesigned to use vastly more compute and vastly less data movement — the opposite of current architectures.
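A roofline-style estimate is one way to see why that trade could pay off. The Python sketch below is illustrative only: the chip figures and both workloads are hypothetical numbers picked to make the arithmetic easy, and the model (step time is the larger of compute time and data-movement time) is a standard first-order approximation, not something described in the episode.

```python
# Hypothetical chip: these are placeholder figures, not vendor specifications.
PEAK_FLOPS_PER_S = 1e15        # assumed peak math throughput
BANDWIDTH_BYTES_PER_S = 1e12   # assumed memory bandwidth


def step_time(flops, bytes_moved, peak_flops_per_s, bandwidth_bytes_per_s):
    """Return (seconds, limiting_resource) under a simple roofline model."""
    compute_s = flops / peak_flops_per_s
    memory_s = bytes_moved / bandwidth_bytes_per_s
    if compute_s >= memory_s:
        return compute_s, "compute"
    return memory_s, "memory"


if __name__ == "__main__":
    # Workload A: less math, more data movement.
    print(step_time(2e12, 1e10, PEAK_FLOPS_PER_S, BANDWIDTH_BYTES_PER_S))
    # Workload B: four times the math, one fifth of the data movement.
    print(step_time(8e12, 2e9, PEAK_FLOPS_PER_S, BANDWIDTH_BYTES_PER_S))
```

With these made-up numbers, the second workload does four times the math but moves a fifth of the data, and still finishes sooner because the first one spends its time waiting on memory.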
"For people storing data and loading memory is very cheap for neurons and doing math is relatively expensive and it is the exact opposite for chips. Generally data is very expensive and doing math is very cheap. And as time goes on you'll end up finding that math gets cheaper at a rate that is faster than memory gets cheaper due to this fundamental limit on any kind of DRAM device." — Rob Locken 00:17:35
2. Contrarian Perspectives
Inference, Not Training, Will Be the Biggest Market — and It's Already Obvious
At founding in 2023, the conventional wisdom was that training was the dominant AI compute market. Etched bet entirely on inference, at a time when most investors passed because "everything's going to be training."
"It seems like inference is going to be really important. And it feels like we are on a decade march for inference to become the biggest market in the world." — Gavin Uberti 00:23:49
The conviction was strengthened when Gavin realized that every software product he wanted to build would have its cost of goods sold (COGS) dominated by inference, a structural change to software economics that most people had not yet internalized.
"All the products I want to build are going to cost tens of millions of dollars a year in inference. This is not going to be tenable. Like the cost structure of every software company and the COGS is not going to be like zero anymore for an incremental user." — Gavin Uberti 00:23:49
You Don't Need a General-Purpose Compiler — Specialized Kernels Beat Graph Compilers
The industry orthodoxy was to build graph compilers for broad model compatibility. Etched explicitly rejected this, betting that fewer than 100 models would matter and that kernels-first programming, long considered impractical without constant human tuning, would win as AI coding models matured.
"We envisioned a world where there was going to be under 100 models that actually mattered and they were all going to look very similar from the underlying mathematical perspective... not having to build a compiler has saved us a tremendous amount of time and has allowed us to actually get much more performance." — Gavin Uberti 00:52:07
"Funnily enough, when we started, a lot of people dismissed this idea and the only people that took us seriously were in high-frequency trading. They all hate compilers too. They all write their own kernels." — Gavin Uberti 00:52:37
The Biggest Constraint on AI Chip Performance Is Thermal Throttling, Not Raw Flops
Industry conversations focus on peak advertised flops. The real bottleneck is that high voltage causes chips to thermal throttle before reaching peak utilization — making headline flop numbers misleading. Adding more flops to existing architectures doesn't help.
"If I just add more flops to a GPU today or another AI chip, I'm not actually going to get more performance because it's just going to thermal throttle... Bitcoin miners run at under a quarter of the voltage of GPUs. So, this is obviously physically possible." — Gavin Uberti 00:11:39
In practice, GPU model FLOPs utilization (MFU) is 20–50% of advertised peak, meaning that half or more of the peak compute being paid for goes unused.
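Two pieces of first-order arithmetic sit behind this section, sketched below in Python. The first uses the standard CMOS dynamic-power relation (power scales with voltage squared times frequency), which is textbook physics rather than a description of Etched's power-delivery design. The second computes MFU and the implied cost multiplier per useful FLOP. The function names are hypothetical, and the only figures taken from the text are the quarter-voltage comparison and the 20–50% MFU range.

```python
def dynamic_power_ratio(voltage_ratio: float, frequency_ratio: float = 1.0) -> float:
    """Relative dynamic power under the first-order model P ~ C * V^2 * f.

    Capacitance and switching activity are assumed unchanged; leakage is ignored.
    """
    return voltage_ratio ** 2 * frequency_ratio


def mfu(achieved_flops_per_s: float, peak_flops_per_s: float) -> float:
    """Model FLOPs utilization: achieved throughput over advertised peak."""
    if peak_flops_per_s <= 0:
        raise ValueError("peak_flops_per_s must be positive")
    return achieved_flops_per_s / peak_flops_per_s


def cost_multiplier_per_useful_flop(utilization: float) -> float:
    """Advertised FLOPs paid for per FLOP actually delivered."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 / utilization


if __name__ == "__main__":
    # Quarter voltage at the same clock: one sixteenth of the dynamic power.
    print(dynamic_power_ratio(0.25))
    for utilization in (0.2, 0.5):
        print(utilization, cost_multiplier_per_useful_flop(utilization))
```

Under this model a chip at one quarter of the voltage draws roughly one sixteenth of the dynamic power at the same clock. Real designs cannot lower voltage freely, since achievable clock speed also falls with voltage, so the sketch shows the direction of the trade-off rather than a prediction.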
TSMC's True Moat Is Customer Service, Not Just Process Technology
Everyone credits TSMC's lead to process technology. Etched argues the real and underappreciated competitive advantage is the quality of their collaborative services — to the point that TSMC will run yield-improvement experiments on their own dime based on customer recommendations.
"TSMC customer service is way, way better than I have seen at any other company in any other industry... If you say, hey, you can improve your yield by making this change, you can go make them a recommendation and they will go run an experiment on their own dime in our case to see if they could actually get the higher yield." — Rob Locken 00:41:49
Traditional Semiconductor Experts Are the Worst Judges of What's Now Possible
Experienced chip investors and engineers missed all the major AI chip companies. Their constraints are real but based on a world of EDA tools, validation techniques, and FPGA capabilities that no longer exists.
"A lot of the traditional semiconductor funds missed the entire AI chip companies and like all the coding experts missed all the coding companies. I think it's very hard to realize the constraints have changed... You totally forget that EDA tools are way better and that FPGAs exist today in a way that they didn't exist before." — Gavin Uberti 00:49:48
3. Companies Identified
Etched: AI inference chip company building a full rack-scale inference system, including a custom chip, boards, cold plates, interconnects, and a production line. Founded 2023. First hardware company founded post-ChatGPT to tape out a working chip on the first attempt. Over $1 billion in customer demand and $800 million raised.
"They already have more than a billion dollars of customer demand for their first product and have raised $800 million to build it." — Patrick O'Shaughnessy 00:02:27
TSMC: Taiwan Semiconductor Manufacturing Company, the world's leading chip foundry. Identified not just for process technology leadership but for exceptional collaborative service quality.
"TSMC customer service is way, way better than I have seen at any other company in any other industry... They will go run an experiment on their own dime in our case to see if they could actually get the higher yield. And when we found that we were right and the change worked, they moved over the rest of the line." — Rob Locken 00:41:49
Synopsys: Electronic design automation (EDA) tools company. Supported Etched before its Series A by providing emulators on extremely favorable multi-year deferred payment terms.
"Synopsys actually went ahead and let us get some of their emulators on extremely favorable terms where we pay over many years. Basically a big loan. It takes a lot of belief from your partners to go do this." — Rob Locken 00:06:35
Cursor / Anysphere: AI coding tool (Cursor) built by Anysphere. One of the earliest companies incubated through Gavin's startup incubator, Prod. Cited as an example of agents accelerating large-scale software projects.
"I thought it was really cool months ago when Cursor published that they had a bunch of coding agents build an entire browser from scratch in a week. Totally nuts. And that will soon happen in under an hour." — Gavin Uberti 00:47:42
NVIDIA: Dominant AI chip company. Its HGX and DGX rack systems represent ~80% of revenue. Referenced repeatedly as the competitive benchmark and as proof that pure-play chip companies build the best chips.
"It is completely unsurprising that the best chip in the world is built by a company that only builds that chip. It's NVIDIA." — Gavin Uberti 00:55:44
Cypress Semiconductor: Chip company that sold for $9 billion. Mark Ross served as CTO there before becoming Etched's CTO.
"Mark was a very prestigious semiconductor expert. He used to be CTO at Cypress Semi that sold for $9 billion." — Rob Locken 00:06:36
Xnor: Company where Rob Locken did his first kernels development work at age 17. Acquired by Apple for $200 million.
"My first job ever was at a company called Xnor, where I did kernels development. I was 17... Xnor got bought by Apple for $200 million." — Rob Locken 00:24:32
Octo (OctoML, later renamed OctoAI): Company where Rob Locken worked after Xnor, also doing kernels work. Acquired by NVIDIA for hundreds of millions of dollars.
"Did the same thing at Octo that got bought by NVIDIA for hundreds of millions of dollars." — Rob Locken 00:24:58
Colossus (xAI): Referenced as an example of large-scale inference cluster pricing dynamics. Described as the only place to buy 20,000 Blackwells at once, which lets it command $12/hour pricing.
"Why is Colossus charging $12 an hour for Blackwells? It's because they're the only place you can buy 20,000 of them at once." — Gavin Uberti 00:42:58
OpenAI: AI lab. Its chip project "Jalapeño" is cited as an example of a non-existential internal chip effort.
"OpenAI won't fail if Jalapeño fails." — Gavin Uberti 00:55:44
Google (TPUs): Hyperscaler chip effort cited as structurally disadvantaged because it is non-existential relative to core search revenue.
"For Google with TPUs, their revenue comes from search. Google won't fail if TPUs fail." — Gavin Uberti 00:55:44
Meta (MTIA): Meta's internal AI chip. Cited alongside other hyperscaler chip projects as non-existential.
"Meta won't fail if MTIA fails." — Gavin Uberti 00:55:44
Microsoft (Maia): Microsoft's internal AI chip project. Cited as non-existential.
"Microsoft won't fail if Maia fails." — Gavin Uberti 00:55:44
Ramp: Finance automation company. Mentioned as a WorkOS customer and a Vanta customer in sponsor segments.
Anthropic: AI lab. Mentioned as a WorkOS customer.
Perplexity: AI search company. Mentioned as a WorkOS customer.
Vercel: Developer platform. Mentioned as a WorkOS customer.
Arm: Chip architecture company. Its former CEO attended the dinner where Rob connected with the TSMC VP.
Ridgeline: Investment management software platform. Sponsor.
WorkOS: Enterprise authentication API company. Sponsor.
4. People Identified
Gavin Uberti: Co-founder and CEO of Etched. Harvard dropout. Survived Stage 4 bone cancer at 16. Previously ran the startup incubator Prod, which incubated Cursor. Architect of Etched's core chip design philosophy. The moment that catalyzed the company was using GPT-4V to identify, in seconds, what had taken doctors six months to find.
"I go to show my parents and I got this notification being like, you're all out of image credits today. You need to get a pro plan. And I was like, holy crap, this is going to change everything. And we clearly don't have the infrastructure to serve it." — Gavin Uberti 00:22:51
Rob Locken: Co-founder of Etched. Harvard dropout. First job at age 17 doing kernels development. Lived in Bangalore for 4.5 months to rescue a critical vendor relationship and prevent a year-long delay. World robotics champion in high school.
"Every morning, we'd go ahead and walk across the crazy busy Bangalore streets into the office. We'd be the first ones in... And then at 1 a.m., we'd go walk back through the now empty Bangalore streets." — Rob Locken 00:35:27
Mark Ross: Former CTO of Cypress Semiconductor ($9B sale). Now full-time CTO of Etched. Initially skeptical, he challenged the founders to build a simulation, then became progressively more involved as milestones were proven. Described as a "legend" hire.
"Mark was a very prestigious semiconductor expert. He used to be CTO at Cypress Semi that sold for $9 billion... He became an advisor, a half-time advisor, and eventually a full-time CTO, as he saw more and more of the development progress." — Rob Locken 00:06:36
Brian (full name not given): Founded the HGX and DGX team at NVIDIA, which accounted for the majority of NVIDIA's revenue (tens of billions of dollars per quarter). Was planning to retire after one more NVIDIA generation before being convinced to join Etched. Described as their "legend" for rack-scale systems.
"Brian started the HGX and DGX team at NVIDIA, which was a majority of NVIDIA's revenue, tens of billions of dollars a quarter... There's so many times where we'd talk to Brian and he'd just point to us and be like, that's a billion-dollar lesson I learned, a billion-dollar lesson I learned." — Gavin Uberti 00:30:48
Sanford: Rob Locken's high school robotics partner; together they held the world record score in FTC Robotics and were ranked third in the world by OPR for software. Was finishing his senior year of college when recruited to Etched. Built a functional cold plate prototype in one week when asked, something professional engineers said would take months. Described as a "chips on shoulders" hire.
"Sanford and Gavin in high school were world robotics champions... We say chips on shoulders, but chips in data centers... We called him up a couple of years ago... Can you build a cold plate this week?" — Gavin Uberti 00:31:39
Patrick O'Shaughnessy: Host of Invest Like the Best and CEO of Positive Sum. Disclosed as a significant Etched investor across five or more rounds, with his first check being the largest first check he had written at the time.
"At the time, it was the largest by a lot first check that I had written." — Patrick O'Shaughnessy 00:08:27
5. Operating Insights
Prefetching the Schedule: Spend Money to Eliminate Sequential Dependencies
Etched's most powerful operational insight is treating time the way a chip treats memory: prefetch everything you will need the moment you know you will need it, even at significant capital cost. The payoff was going from silicon back to inference running in a rack in 40 days, a step that Gavin says took another well-known AI chip company 10 months.
"We know our chip is going to come back on a certain date. We want it to be that everything possible that could be done without the chip is done before the chip lands... We shipped racks to customer data centers without our chips in them with all the networking, all the CPUs, all the storage, all set up so we could bring all that data center software up before the chips came back." — Gavin Uberti 00:39:11
The principle generalizes: map every downstream dependency today and start all of them in parallel immediately, rather than sequentially.
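The effect of removing sequential dependencies can be sketched as a small critical-path calculation. The Python example below is a hedged illustration: the workstream names and every duration are hypothetical and do not come from the episode. It compares a plan where each workstream waits for the previous one with a plan where only final integration waits for the finished chip.

```python
from graphlib import TopologicalSorter

# Hypothetical durations in days; none of these figures come from the episode.
DURATIONS = {
    "chip_fab": 120,
    "rack_bringup": 60,
    "cold_plates": 45,
    "software_stack": 90,
    "final_integration": 30,
}

# Sequential plan: each workstream waits for the previous one to finish.
SEQUENTIAL = {
    "chip_fab": [],
    "rack_bringup": ["chip_fab"],
    "cold_plates": ["rack_bringup"],
    "software_stack": ["cold_plates"],
    "final_integration": ["software_stack"],
}

# Prefetched plan: only final integration truly needs the finished chip.
PREFETCHED = {
    "chip_fab": [],
    "rack_bringup": [],
    "cold_plates": [],
    "software_stack": [],
    "final_integration": ["chip_fab", "rack_bringup", "cold_plates", "software_stack"],
}


def finish_day(durations, deps):
    """Earliest completion day when each task starts as soon as its prerequisites finish."""
    finish = {}
    for task in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps[task]), default=0)
        finish[task] = start + durations[task]
    return max(finish.values())


if __name__ == "__main__":
    print("sequential:", finish_day(DURATIONS, SEQUENTIAL), "days")
    print("prefetched:", finish_day(DURATIONS, PREFETCHED), "days")
```

With these placeholder durations the sequential plan finishes on day 345 and the prefetched plan on day 150. The total work is identical; only the dependency structure changes, and the cost is paying for every workstream up front.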
Project-Based Recruiting: Map the Hardest Problems, Find the Zero-to-One Person
Rather than recruiting by role or résumé, Etched maps the actual hardest unsolved technical problems, then traces back who specifically — not what team or company — did the zero-to-one work. They then pursue that specific person persistently across many conversations.
"We map out all of the hardest technical problems across all industries that anyone has ever had to solve. We look at temporality, so who are the people who did the zero to one?... The amount of people who say yes after the first conversation is pretty low, but the amount of people who say yes after the 20th conversation is surprisingly high. You really got to keep at them." — Gavin Uberti 00:29:44
Speed of Decision Beats Correctness of Decision
In high-velocity hardware development, the cost of waiting for the perfect answer consistently exceeds the cost of occasionally being wrong. Etched delegates significant decision authority to on-the-ground team members with instructions to decide immediately.
"One of the worst items is when there's a factory or there's a vendor who is waiting for you to go ahead and make some call and has been just stalled... Send folks, give a big amount of responsibility to them and say, make a reasonable call. I would much, much rather be right most of the time and give an answer immediately than wait every time for the perfect response. Speed wins." — Rob Locken 00:37:36
When a Problem Seems Impossible, First Assume It Is Possible
Etched has institutionalized a specific mental move when facing seemingly unsolvable technical problems: explicitly assume the problem is solvable, then ask what the solution would require. This reframe unlocked solutions to problems that had stalled other teams entirely, including aligning two clock signals to within 50 picoseconds.
"Step one is, okay, let's assume the problem is solvable. How would it be solved?... People were, I think, blown away that this worked." — Rob Locken 00:58:51
"A lot of our story is like, assume it is possible. Assume it is possible to have a chip with way more flops on it... A lot of the time when we do experiments, we will do dozens of experiments and all of them will fail. But we only need one to work." — Gavin Uberti 01:00:03
Use Milestones Sequentially to Convert Skeptics Into Champions
The most valuable early advocates — like Mark Ross and the TSMC VP — were converted not through initial persuasion but through a repeated cycle of bold claims followed by delivered proof. Each milestone creates the permission to come back and ask for more.
"When you hear no from somebody who really is the best in the world, then that really means, hey, you should go ahead and come back when you have a few more milestones proven out. I think it's one of the most convincing things to see is, hey, we make bold claims. And when you go ahead and hit those again and again and again, that is really belief-inspiring." — Rob Locken 00:30:19
6. Overlooked Insights
High-Frequency Trading Firms Are a Hidden Talent Pipeline for Elite AI Hardware Engineers
This was mentioned briefly and no one dwelled on it, but it reveals something structurally significant: the culture of kernels-first, no-compiler, own-the-hardware thinking that Etched adopted is the same philosophy that elite HFT firms have practiced for decades. This creates an unexpected but deep talent overlap — dozens of HFT engineers have joined Etched specifically because of philosophical alignment.
"The only people that took us seriously were in high-frequency trading. They all hate compilers too. They all write their own kernels. And we've had dozens of people from high-frequency trading join the team because they saw this philosophy too." — Gavin Uberti 00:52:53
The implication for investors is significant: the best-performing AI hardware teams may not be found by looking at semiconductor veterans or ML researchers — they may be hiding in quantitative trading firms. And Etched has found this overlap at scale before competitors realized it existed.
The Future Mega-Cluster Is a $40–100 Billion Monolithic Token Factory — and Nobody Is Building for It Yet
Rob Locken made a very brief but stunning analogy that no one in the conversation unpacked: the end state of inference infrastructure is not a collection of racks but a single-purpose monolithic structure analogous to a semiconductor fab — a $40–100 billion building optimized entirely around producing tokens for one or a handful of models at planetary scale.
"You look at like a fab, for example. You have a $40 billion single monolithic building with only a handful of lines running through it. You could have the same kind of thing for some futuristic mega cluster. $40 billion, $100 billion as a giant mega-token factory serving one or a handful of models for a massive number of users to get that same economies of scale thing." — Rob Locken 00:49:20
This reframes the long-run inference infrastructure opportunity entirely: not distributed cloud compute sold by the hour, but purpose-built single-model factories with the economics and capital structure of semiconductor fabrication. The company or companies that design the hardware anchoring such structures, and the clusters that power them, will define the infrastructure of the next era. Etched is explicitly positioning for this end state with its production-at-gigawatt-scale ambitions.