133. A 7-Hour Marathon Interview with Xie Saining: World Models, Escaping Silicon Valley, AMI Labs, Turning Down Ilya Twice, Yann LeCun, Fei-Fei Li, and 42
- 01 Vision as the Foundation of Intelligence, Not a Subfield
- 02 Representation Learning as the Only Permanent Research Thread
- 03 The Non-Linearity of Research
- 04 The Scaffolding Principle: Infrastructure Quality Sets the Research Ceiling
- 05 World Models as the Endgame
Episode: 133. Marathon Interview with Xie Saining: World Models, Leaving Silicon Valley, AMI Labs, Rejecting Ilya Twice, LeCun, Li Fei-Fei, and 42
Podcast: 张小珺Jùn | Business Interview Series
Guest: Xie Saining (谢赛宁), AI researcher and co-founder of NeoLab AMI with Yann LeCun
1. Key Themes
Vision as the Foundation of Intelligence, Not a Subfield
Xie Saining argues that vision is not merely a computer science application but the root of all intelligence — biological and artificial. He traces this from the Cambrian explosion, when the emergence of eyes triggered an evolutionary arms race that produced the diversity of life on Earth. This conviction has guided his entire 12+ year research arc.
"There's a theory that this explosion came from an arms race at the visual level. The eye is the only part of the brain exposed to the real world — other parts of the brain are hidden behind our skulls. So solving vision is not solving vision itself — it's solving intelligence itself." 00:27:55
Representation Learning as the Only Permanent Research Thread
Saining identifies representation learning (how neural networks map raw data to structured latent spaces) as the one fundamental problem that remains unsolved and never goes out of date. He contrasts this with "trend-chasing" research such as Neural Architecture Search, which he says cost the community roughly two years.
"Neural Architecture Search is widely considered to have wasted the entire field roughly two years. Thousands of papers were published and there was no gain. This is why I'm willing to tell everyone I work on representation learning — it's a fundamental problem, an eternal problem, and one that has not yet been solved." 01:14:05
The Non-Linearity of Research — and Why It Matters for Investors
Saining describes a highly non-linear relationship between research quality and career/field impact, referencing MIT professor Bill Freeman's curve. Bad papers and mediocre papers have roughly equivalent impact (near zero), while great papers produce exponential returns. This is the reason great researchers optimize for their maximum, not their average.
"A researcher doesn't need to succeed a hundred times. You only need to succeed once in your life — maybe twice if you're lucky. The game of research is not like chess where your worst move defeats you. It's like being an inventor." 02:16:50
The Scaffolding Principle: Infrastructure Quality Sets the Research Ceiling
One of the most repeated operating insights in the episode is He Kaiming's principle: your research ceiling is determined by the quality of your baseline/infrastructure. Saining says this principle drove major architectural decisions at FAIR, including building a full TPU infrastructure from scratch.
"Kaiming told me: your research ceiling is determined by how good your baseline is. If your baseline is poor, you'll easily fool yourself — you can't find breakthrough results. He thought: how do we push the baseline as high as it can go? Then any improvement we make on that foundation is truly groundbreaking." 02:34:12
World Models as the Endgame — LLMs as a Transitional Tool
Saining is explicit that Large Language Models are not the foundation of general intelligence. He sees them as useful tools but argues they lack the physical and perceptual grounding needed for a true world model. His current startup, AMI Labs, co-founded with Yann LeCun, is building toward this vision.
"LLMs will not die — but they will fade away. 'Old soldiers never die, they just fade away.' LLMs are a good tool, I use them every day, but they are not the foundation for building a universal intelligent system. They are not the bedrock of a world model." 02:24:29
2. Contrarian Perspectives
The Purpose of Research Is Understanding, Not Impact
Saining explicitly rejects "creating impact" as the goal of research, calling the word "too aggressive and too masculine." He cites Hannah Arendt's framing that the goal should be understanding itself, and that sharing understanding creates a kind of kinship. This runs directly counter to how most researchers and institutions talk about their mission.
"Arendt said she doesn't care about impact. The purpose is understanding. If you can write down what you understand and share it, you allow more people to reach the same understanding. She would find in this a sense of finding family: people who understand what you understand, understand you." 01:32:13
The Best Research Papers Are the Ones That Fail Their Original Idea
Saining argues that if a paper's final result matches the original idea exactly — with no pivots, no surprises, no obstacles — it is by definition a boring paper. The best work emerges from non-linear exploration where the real idea is only discovered in the process.
"The worst research is where you define an idea at the start, then publish a paper where the idea is exactly that same idea — you didn't encounter any obstacles. This means your idea was a boring idea, and you published a boring paper." 02:10:13
Choosing Who You Work With Over Where You Work
Saining made this bet repeatedly — following his advisor from UCLA to UCSD (a lower-ranked school at the time), choosing FAIR over OpenAI in 2018, and going to NYU partly because LeCun was there. He consistently ignored institutional prestige signals in favor of people signals — and his read on upside was right every time.
"The only thing I wanted to focus on was: am I doing what I most want to do, and am I doing it with the people I most want to work with? I stripped away all the noise. That was the only thing I cared about." 00:39:48
Passing on Perplexity at Founding — and Framing It as the Right Decision
Saining recalls being among the very first people to see Perplexity founder Aravind Srinivas's demo, at a Blue Bottle coffee shop in Palo Alto. He passed because he thought "this is just GPT with a shell." He doesn't regret it, and frames all such "misses" as consistent with a philosophy of following internal conviction rather than external opportunity signals.
"He pulled out a laptop and showed me a browser and said 'we're going to kill Google.' I thought: wow this is impressive — but internally I thought: isn't this just GPT wrapped in a shell? Why are you doing this? So he asked me to join and I said I still enjoy being at NYU." 03:12:00
LLMs Are Language-Brained — and Language Is an 8-Second Phenomenon in Evolutionary Time
Saining makes a provocative evolutionary argument: if you compress 538 million years of animal history into 24 hours, behavioral modernity and language only emerged in the last 8-10 seconds. The implication is that treating language as the primary medium of intelligence may be a profound anthropocentric bias.
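The compression in this argument is simple arithmetic and can be checked directly. The sketch below takes only the 538-million-year figure from the summary; the function names are illustrative, not anything from the episode.

```python
# Compress the time since the Cambrian explosion into one 24-hour day.
CAMBRIAN_YEARS = 538_000_000     # figure quoted in the summary
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds


def years_per_second(total_years: float = CAMBRIAN_YEARS) -> float:
    """Real years represented by one second of the compressed day."""
    return total_years / SECONDS_PER_DAY


def seconds_on_clock(years_ago: float, total_years: float = CAMBRIAN_YEARS) -> float:
    """How many seconds before midnight an event `years_ago` falls."""
    return years_ago / years_per_second(total_years)


if __name__ == "__main__":
    print(f"{years_per_second():,.0f} years per compressed second")
    for secs in (8, 10):
        print(f"last {secs} s covers about {secs * years_per_second():,.0f} years")
```

Each compressed second stands for roughly 6,200 years, so the final 8 to 10 seconds cover about 50,000 to 62,000 years.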
"If we compress the Cambrian explosion to today into 24 hours — language, behavioral modernity, abstract thinking — all of that represents only about 8 to 10 seconds at the end of the day. The time we've had language is extremely brief." 03:17:26
3. Companies Identified
FAIR (Meta AI Research): Meta's fundamental AI research lab. Described as a peak-era research institution that operated like a university, with PI-led groups. Saining spent four years there and credits it as the crucible of his most important work (MoCo, MAE, DiT, ConvNeXt).
"FAIR was the cathedral at that time. It was absolutely where I wanted to be. There was no deliberation." 01:25:08
DeepMind: Described as uniquely structured among AI labs: a hybrid of bottom-up exploration and top-down management, with PMs coordinating between research teams. Saining credits them with proving that ambitious multi-year research programs (AlphaFold) could be institutionally engineered.
"Demis told us: DeepMind will become a company that wins multiple Nobel Prizes — multiple. We all thought this was wildly ambitious. But now we've seen at least one step achieved. I find this truly admirable." 01:08:13
OpenAI / Sora: Saining's intern Bill Peebles left to join OpenAI and built Sora on the DiT architecture (which Saining co-created). He gives OpenAI credit for having the organizational structure to let a small team do something "nobody at FAIR would dare imagine."
"OpenAI is impressive — they could recognize Bill's talent and give their team enough freedom and resources to do something previously unthinkable — that was Sora." 03:10:13
Perplexity: Saining was among the first people to see Aravind Srinivas's demo at a Blue Bottle café in Palo Alto and declined to join. Mentioned as one of the "missed" opportunities he's aware of.
"He showed me a browser and said 'we're going to kill Google.' I thought it was GPT with a shell... He asked me to join, I said I'd rather stay at NYU." 03:12:00
NeoLab AMI: Saining's current startup, co-founded with Yann LeCun. 25-person team. Just closed a large funding round. Mission is building world models that ground intelligence in perception, not language.
"They just completed their first major funding round. The team is currently 25 people." 00:00:42
4. People Identified
He Kaiming (何恺明 / Kaiming He): Research scientist, currently at MIT. Former FAIR researcher. Creator of ResNet, ResNeXt, MoCo, MAE. Described as the most focused researcher Saining has ever met: working in deep flow states, building full infrastructure from scratch, finishing papers a month before deadlines, reading philosophy, and giving out copies of the Diamond Sutra.
"Kaiming was the first to tell me we need to make models bigger and bigger — that was around 2018-2019. He had this vision very, very early." 01:51:19
"He has a kind of reality distortion field — things that seemed completely impossible somehow started to become possible when you were around him." 00:58:43
Yann LeCun (杨立昆): Chief AI Scientist at Meta, professor at NYU, Turing Award winner, co-founder of AMI Labs with Saining. Described as a visionary who saw the importance of self-supervised learning and interdisciplinary AI infrastructure a decade before others.
"Yann was visionary. He established this interdisciplinary data science center at NYU over ten years ago, independent of the CS and math departments. This foresight is remarkable." 00:40:48
Fei-Fei Li (李飞飞): Stanford professor, co-founder of World Labs. Described as the person who defined the problem of image classification rather than just building ImageNet, a contribution Saining considers far more important than the dataset itself.
"Fei-Fei's greatest strength is that she is someone who can define problems. Before 2012, image classification was not a clearly defined problem. Defining the problem clearly was far more important than building the dataset — it gave deep learning a playground to stand on." 01:43:41
Zhuowen Tu (屠卓文, referred to in the episode as 屠老师, "Teacher Tu"): Saining's PhD advisor, who moved from UCLA to UCSD, bringing Saining along. Described as a researcher who wrote foundational computer vision systems in 50,000 lines of C++ without GPUs or open-source libraries, and who made a prescient early pivot to deep learning.
"He wrote 50,000 lines of C++ for a single image segmentation task, including distributed training — all from scratch. Without researchers like him, we wouldn't have today. They blazed the trail." 00:42:15
Bill Peebles: Saining's former intern at FAIR, now at OpenAI (key person behind Sora). Co-created the DiT architecture with Saining. Described as a "near-perfect PhD student" and a "hexagonal warrior," a Chinese idiom for an all-rounder with exceptional capability across every dimension.
"Bill is someone I consider a near-perfect PhD student — exceptional in every dimension. In my view, he's a very sharp person." 03:02:40
Hou Xiaodi (侯晓迪): Co-founder and former CEO of TuSimple, now working on new ventures. Saining's senior in SJTU's ACM Class. Wrote the famous SJTU Student Survival Handbook and published a solo CVPR paper as an undergrad, both considered legendary achievements at the time.
"He wrote the SJTU Student Survival Handbook — it talked about why people learn, what's wrong with Chinese education, and why research should be about genuine exploration of the unknown, not about publishing papers to fill quotas." 00:18:18
5. Operating Insights
Predict Before You Run Every Experiment
Kaiming He instilled a discipline at FAIR: before running any experiment, write down your predicted outcome. If you're right, your reasoning chain can be extended. If you're wrong, that is a signal, possibly a more valuable one than a correct prediction. Either way the experiment yields what the episode calls "gradient": information about which direction to move next.
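One way to make this habit concrete is a small prediction log. This is a minimal sketch, not a tool described in the episode: the `Experiment` class, the `review` helper, the tolerance, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experiment:
    name: str
    predicted: float                  # written down BEFORE the run
    rationale: str                    # the reasoning behind the prediction
    observed: Optional[float] = None  # filled in after the run

    def record(self, observed: float) -> None:
        self.observed = observed

    def surprise(self) -> float:
        """Signed gap between what happened and what was predicted."""
        if self.observed is None:
            raise ValueError("experiment has not been run yet")
        return self.observed - self.predicted


def review(exp: Experiment, tolerance: float) -> str:
    """Turn the gap into the follow-up question to ask next."""
    gap = exp.surprise()
    if abs(gap) <= tolerance:
        return f"{exp.name}: prediction held (gap {gap:+.2f}); extend the reasoning"
    return f"{exp.name}: prediction missed (gap {gap:+.2f}); ask where the reasoning failed"


if __name__ == "__main__":
    # Hypothetical experiment and placeholder metric values.
    exp = Experiment("longer-schedule", predicted=76.0,
                     rationale="more epochs should help slightly")
    exp.record(78.0)
    print(review(exp, tolerance=0.5))
```

Because `surprise` raises if no result has been recorded, a result can only be reviewed against a prediction that was committed when the experiment was created.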
"Kaiming would tell us: for every experiment you run, you must predict the result. If you're right, it means your chain of reasoning can be extended further. If you're wrong, that's also a signal — you ask yourself: why was I wrong? Where did my thinking fail?" 02:40:19
Use a Structured Spreadsheet as Your Research Operating System
FAIR used Excel spreadsheets as the core tool for tracking experiments — not as bureaucracy but as a disciplined way to force decisions about what metrics matter, what comparisons are meaningful, and what signal each experiment provides. The discipline of designing the spreadsheet is itself the research methodology.
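The same discipline can be sketched in code for readers who track runs outside Excel. This is an illustrative assumption about what such a table might contain, not FAIR's actual template: the column names, run IDs, and accuracy values are invented placeholders. The point it shows is that every row names the row it is compared against, so each entry carries a delta.

```python
import csv
import io

FIELDS = ["run_id", "compared_to", "change", "top1_acc"]

# Placeholder rows: every run after the baseline names its reference run.
ROWS = [
    {"run_id": "r0", "compared_to": "", "change": "baseline", "top1_acc": 76.1},
    {"run_id": "r1", "compared_to": "r0", "change": "+ longer schedule", "top1_acc": 77.0},
    {"run_id": "r2", "compared_to": "r1", "change": "+ stronger augmentation", "top1_acc": 76.8},
]


def with_deltas(rows):
    """Attach each run's metric change relative to the run it is compared to."""
    by_id = {r["run_id"]: r for r in rows}
    out = []
    for r in rows:
        parent = by_id.get(r["compared_to"])
        delta = None if parent is None else round(r["top1_acc"] - parent["top1_acc"], 2)
        out.append({**r, "delta": delta})
    return out


def to_csv(rows):
    """Serialize the table so it can be opened in any spreadsheet tool."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS + ["delta"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(with_deltas(ROWS)))
```

A run that cannot name a meaningful `compared_to` row is a hint that it may not belong in the table at all.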
"The first lesson for interns at FAIR was: learn to use Excel. We would carefully build tracking templates. The key decisions were: which metrics do I track? Which experiments go in the table and which don't? Each row needs to have a relationship with other rows — that's what gives you gradient." 02:37:17
Know What the Largest Labs Are Doing — So You Know What Not to Do
Saining's stated reason for his Google part-time role while at NYU was intelligence gathering: understanding exactly what Google was working on so his lab could pursue orthogonal directions. This is a resource-constrained academic's asymmetric competitive strategy.
"I went to Google part-time because I wanted to see what they were doing — so I'd know what NOT to do at my academic lab. If you're already doing something, why would I compete with you? You have far more resources." 02:19:21
Fight for the Right Infrastructure Before Doing the Right Research
When Saining's students gave up on TPUs after one week, he pushed them to keep trying for three to four weeks under a clear decision framework: if genuine technical barriers remain after serious effort, give up; if it is only friction and discomfort, push through. This discipline unlocked significant compute for his lab and made the Cambrian series possible.
"I told them: if after three to four weeks you find genuine technical barriers we can't overcome because we're not at Google — then we can give up. But if it's only temporary resistance after one week, we must try to step outside our comfort zone and solve the infrastructure problem." 03:22:10
6. Overlooked Insights
The DiT Architecture Was Dismissed Internally at FAIR and Rejected by CVPR Before Becoming the Backbone of Sora
This is a textbook case of institutional myopia. FAIR leadership did not prioritize DiT and would not let the paper carry the FAIR affiliation when Saining left, and CVPR rejected it for "lack of novelty." The paper was resubmitted to another venue without any changes and accepted. It then became the architectural foundation for Sora.
"FAIR wouldn't let us use their name on the paper — they thought: it's okay, it's just a paper. And you're already leaving anyway. So it was submitted under NYU and Berkeley... Then Bill went to OpenAI and said: fine, nobody wants to use this — I'll build it myself." 03:08:00
This has implications for how investors should think about research commercialization: the most important architectural innovations often get dismissed at the institutional level first, creating a window for nimble players (like OpenAI) to capture the value. The signal to watch is not "who published first" but "who builds the infrastructure first."
Saining Was One of the First People to See the Perplexity Demo — and Passed
This is buried in a single passing comment, but it's significant. Saining was at the Blue Bottle café in Palo Alto when Aravind Srinivas showed him an early Perplexity prototype and asked him to join. He declined because he thought it was "just GPT with a shell." This illustrates how even extremely well-connected, technically sharp researchers can miss the product insight even when they see the underlying technology clearly.
"I may have been one of the first or second people to see his demo. He showed me a browser and said 'we're going to kill Google.' I thought: this is impressive — but isn't this just GPT with a shell? Why are you doing this?" 03:12:00
The non-obvious insight: the "it's just X with a shell" dismissal pattern recurs constantly in AI — and it is systematically wrong when the "shell" is actually a distribution and search advantage. Investors should flag this pattern as a cognitive trap in technical evaluators.