Manchi | 晚点聊 LateTalk Summary

Episode: 158, "DeepSeek Before V4: Talent Competition, Organizational Characteristics, and a Unique AGI Goal"
Podcast: 晚点聊 LateTalk
Participant: Manchi (Reporter, LatePost)

1. Key Themes

The Talent Drain Is Real — But Overstated

DeepSeek saw several key researchers depart around Chinese New Year 2026. Wang Bingxuan (core author of DeepSeek LLM) was recruited by Tencent's Yao Shunyu. Wei Haoran (core author of DeepSeek OCR) likely joined a major tech company. Guo Daya (core author of DeepSeek R1) also likely joined a major tech company. Ruan Chong (core contributor to Janus Pro) departed earlier and announced in January 2026 that he had joined autonomous driving startup Yuanrong Qixing.

"In a research team of over 100 people, having three people leave is really not a lot. But the reason it has attracted so much external attention is that, by comparison, DeepSeek had virtually no full-time employees leave before this." [00:03:00]

The real signal is that, for the first time, researchers are leaving for direct AI competitors. Earlier leavers went elsewhere (one returned to academia, another took a long break).

The Equity Valuation Problem Is DeepSeek's Achilles Heel in Talent Competition

DeepSeek has no clear company valuation, making its equity compensation effectively opaque to employees. This is being compounded by the IPO wealth effect from peers.

"DeepSeek's employees, although they have signed option agreements, are quite confused about how much those options are actually worth. Starting from late last year, an external change made DeepSeek members feel even more urgency or confusion about this options issue — MiniMax and Zhipu, two of the first-generation large model companies, went public. After listing, their stock prices rose extremely well, already up five or six times. Both companies' market caps reached around 250 to 300 billion RMB." [00:06:59]

Manchi notes that Liang Wenfeng is now actively working to establish a clearer company valuation to give team members more certainty.

DeepSeek's True Mission Is Orthogonal to "Winning the Benchmark War"

Liang Wenfeng's vision for AGI is not simply about building the highest-performing model. He has two additional priorities: (1) building on domestic Chinese chip/software ecosystems, and (2) pursuing original, exploratory research that large well-resourced labs would never prioritize.

"Liang Wenfeng feels that outside the main thread of improving model efficiency and performance, it's necessary to do some work where the current returns are unclear. Because overseas companies with more compute, like Google and OpenAI, are internally exploring all kinds of directions. And if you just race on performance, other Chinese companies like Seed, MiniMax, Kimi, and Zhipu can also do that very well." [00:13:04]

This creates a genuine internal tension: young researchers want to work on the world's strongest models with the most GPU resources, and DeepSeek does not fully satisfy either condition anymore.

2. Contrarian Perspectives

No-Overtime Culture Is a Competitive Advantage, Not a Handicap

Against the grain of both Chinese tech culture and global AI lab culture (xAI researchers reportedly work 80-hour weeks), DeepSeek enforces a ~6-7pm end to the workday.

"Liang Wenfeng believes that a person can only produce high-quality output for six to eight hours a day. Especially in AI, if you're chronically fatigued and making poor, muddled judgments, you're actually wasting precious compute resources — which is counterproductive." [00:20:13]

Most would assume the opposite: that the most intense work culture produces the best AI research. DeepSeek's results (world-class models with ~1/10th the headcount of ByteDance's Seed) suggest this belief deserves serious challenge.

Benchmark Scores Are Now Misleading Signals for Model Quality

As AI moves into agentic use cases, benchmark rankings are increasingly detached from real-world utility.

"If we just look at benchmarks after a model is released, we can no longer really tell how strong the model is. Especially after entering the competition for agentic models, many people find that subjective feel and actual usage experience matter more. Two models might have similar benchmark scores but feel completely different to use." [00:14:30]

This has concrete implications for how investors and developers should evaluate model companies: benchmark scores alone are no longer a sufficient signal.

Liang Wenfeng Deliberately Avoids VC Relationships — Even After Going Viral

Most CEOs maintain ongoing VC relationships even when not fundraising. Liang Wenfeng actively rejected them, including after DeepSeek became globally famous.

"After January 2025 when DeepSeek went viral, Liang Wenfeng stopped meeting investors altogether — not even establishing new connections or getting to know new institutions. I know of some partners who tried through various means to get in touch with him, and he turned down most of those requests." [00:21:37]

This capital strategy is fundamentally different from that of virtually any comparable company, and it reflects his single-minded prioritization of research over financing optionality.

Small Team + Reduced Hours = World-Class Output: The Math Doesn't Add Up (Until It Does)

"Before V3 and January 21, DeepSeek, with roughly one-tenth the headcount of ByteDance's Seed team and approximately one-third to one-half the per-capita working hours, reached the first tier of global large language models." [00:26:29]

This is a direct repudiation of the dominant belief that AI leadership requires massive teams and extreme work intensity.
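Taken at face value, the two ratios in the quote multiply into a rough total person-hours figure. The sketch below is only back-of-the-envelope arithmetic on the quoted numbers; the names are illustrative, and it ignores compute budgets, seniority, and everything else that affects research output.

```python
from fractions import Fraction

# Ratios quoted in the episode (DeepSeek relative to ByteDance's Seed team).
HEADCOUNT_RATIO = Fraction(1, 10)   # "roughly one-tenth the headcount"
HOURS_RATIO_LOW = Fraction(1, 3)    # lower bound on per-capita working hours
HOURS_RATIO_HIGH = Fraction(1, 2)   # upper bound on per-capita working hours


def person_hours_ratio(headcount_ratio: Fraction, hours_ratio: Fraction) -> Fraction:
    """Total person-hours relative to the comparison team (hypothetical helper)."""
    return headcount_ratio * hours_ratio


low = person_hours_ratio(HEADCOUNT_RATIO, HOURS_RATIO_LOW)
high = person_hours_ratio(HEADCOUNT_RATIO, HOURS_RATIO_HIGH)
print(f"Implied total person-hours: {low} to {high} of Seed's")
# prints: Implied total person-hours: 1/30 to 1/20 of Seed's
```

On these figures, DeepSeek would have reached the first tier with somewhere between one-thirtieth and one-twentieth of the comparison team's total working time.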

3. Companies Identified

DeepSeek: AI research lab and model developer, subsidiary of quantitative hedge fund High-Flyer (幻方). Known for the DeepSeek V3, R1, Janus, and Prover series. Why mentioned: Central subject of the episode. Exceptional for achieving frontier model results with ~200 people, a no-overtime culture, and a research philosophy focused on original exploration over benchmark competition.

"DeepSeek is genuinely the best place in China — or even globally — for someone who truly wants to do research." [00:25:18]

MiniMax: Chinese AI startup, one of the first-generation large model companies. Why mentioned: Highlighted for its aggressive model iteration cadence (three updates since January, per the episode: M2.1, M2.5, and M2.7) and a successful IPO with its stock price up 5-6x, creating wealth-effect pressure on DeepSeek's talent retention.

"MiniMax, from January of this year to now, has already updated three times: MiniMax M2.1, 2.5, and the latest 2.7." [00:15:56]

Yuanrong Qixing (元戎启行): Autonomous driving startup with offices in the same building as DeepSeek's Beijing office (Rongke). Why mentioned: Recruited DeepSeek veteran Ruan Chong, who had been a core contributor since the High-Flyer era.

"Ruan Chong officially announced in January 2026 that he joined an autonomous driving startup, Yuanrong Qixing. By the way, Yuanrong and DeepSeek's Beijing office are in the same building." [00:02:18]

Zhipu AI (智谱): Chinese AI company, one of the first-generation large model startups. Why mentioned: Cited for its successful IPO (market cap 250-300B RMB, up 5-6x), aggressive model iteration (5 updates in 2025), and the release of GLM5 Turbo, a model specifically optimized for OpenClaw and agentic applications.

"Zhipu directly launched GLM5 Turbo, a model specifically optimized for OpenClaw." [00:16:23]

Zhijian Dynamics (智检动力): Embodied intelligence/robotics company. Why mentioned: Their humanoid robot model "Lustling" uses DeepSeek's open-source Janus Pro as its base model, a concrete example of DeepSeek's exploratory research having downstream real-world impact.

"Another company we recently interviewed, Zhijian Dynamics, an embodied intelligence company — the base model underlying their humanoid robot model Lustling is DeepSeek's open-source Janus Pro." [00:12:08]

Kimi (Moonshot AI / 月之暗面): Chinese AI startup known for long-context models. Why mentioned: Announced IPO plans after Chinese New Year, began issuing options to interns to lock in talent (a direct competitive move against DeepSeek), and has updated its models 3 times since early 2025.

"Kimi has started issuing options to interns to lock in talent early." [00:06:02]

4. People Identified

Liang Wenfeng (梁文锋): Founder of DeepSeek and High-Flyer Capital (幻方量化). Born 1985, Zhejiang University BS/MS. Why mentioned: Extraordinary profile. He began quantitative trading as an undergraduate, founded High-Flyer at 30, began accumulating GPUs around 2019 to 2020 (reportedly spurred by reading the GPT-3 paper, published in 2020), and has built a globally competitive AI lab with unconventional management principles. He refuses VC meetings and holds a singular, non-commercially-driven AGI vision.

"He is someone who is particularly resistant to noise... After DeepSeek went viral in early 2025, Liang Wenfeng has already demonstrated his indifference to flattery. Now he faces a different situation: how to distinguish noise from signal as external competition intensifies, and hold firm to what should be held, while changing what needs to change." [00:28:54]

Yao Shunyu (姚顺雨): AI research leader recruited by Tencent in the second half of 2025 to lead AI R&D. Why mentioned: His arrival at Tencent triggered aggressive talent recruitment, including the poaching of DeepSeek's Wang Bingxuan. His team is identified as one of the two most attractive destinations for AI researchers (alongside ByteDance Seed).

"Tencent recruited Yao Shunyu last year to lead AI R&D. A new leader arriving means building their own team. So Tencent has been aggressively and broadly reaching out to recruit excellent talent — and people joining Tencent at this stage are more likely to get core and important positions." [00:05:34]

Wang Bingxuan (王炳轩): Core author of DeepSeek LLM (the first-generation large language model) who participated in training all subsequent model generations. Why mentioned: Recruited by Tencent's Yao Shunyu before the end of 2025, the highest-profile departure from DeepSeek to a direct competitor.

"Wang Bingxuan is the core author of DeepSeek LLM... He was recruited away by Tencent's Yao Shunyu before the end of 2025." [00:02:00]

Guo Daya (郭达雅): Core author of DeepSeek R1. Why mentioned: Recently officially resigned and likely joined a major tech company. Because R1 is DeepSeek's most globally celebrated reasoning model, this departure is significant.

"The third person is Guo Daya, who recently officially resigned. He is the core author of DeepSeek R1. He may also be joining a major tech company." [00:02:18]

Yang Zhi (杨智): Professor at Peking University who leads the team behind TileLang. Why mentioned: In the September 2025 update of DeepSeek V3, DeepSeek moved its underlying operator library from CUDA and Triton to TileLang, a Chinese open-source project from his team. This is a concrete step in DeepSeek's strategy to build on domestic chip ecosystems.

"TileLang is a domestic open-source project initiated by Professor Yang Zhi's team at Peking University." [00:11:40]

5. Operating Insights

Protect Elite Researcher Output Through Time Constraints, Not Time Maximization

Liang Wenfeng's operating thesis: cognitive quality degrades past 6-8 hours, and in AI research, bad decisions waste compute — the most expensive resource. Therefore, limiting hours is a resource efficiency decision, not a cultural nicety.

"Especially in AI, if you're chronically fatigued and making poor, muddled judgments, you're actually wasting precious compute resources — which is counterproductive." [00:20:13]

Operators building research-intensive teams should consider whether they are optimizing for hours-on-site or quality decisions per dollar of compute.

Open Weekly Meetings Across Teams to Enable Organic Cross-Pollination

DeepSeek's most innovative projects emerge not from top-down roadmaps but from 3-5 people across different teams who independently converge on an idea.

"Most of DeepSeek's teams hold weekly meetings that are open to people from other groups — anyone can attend across teams to discuss. So to this day, DeepSeek can still achieve natural division of labor. Sometimes a new direction starts simply because three to five people all think an idea is worth pursuing — and those people may not even be from the same group." [00:24:31]

This is a low-cost, high-signal organizational practice that larger companies actively eliminate in favor of process and headcount specialization.

Founders Should Eliminate the "Internal Contractor" Relationship Between Infra and Research

At many large companies, infrastructure teams function as internal vendors to model teams, creating communication barriers. DeepSeek keeps infra, data, and base model architecture teams deeply interlocked.

"In some companies, infra functions like an internal contractor — the model team submits requirements. And in a large company, the infra team might be several hundred people, and as a member of that team, it becomes very hard to have close communication with the algorithm and model side. But at DeepSeek, many people have cross-learning opportunities." [00:24:02]

6. Overlooked Insights

DeepSeek Is Quietly Building for Post-CUDA Chinese Hardware — Years Before It Matters

Buried in the technical discussion is a pattern that most listeners would skip past: DeepSeek systematically moved core components of its GPU software stack off the industry standard (CUDA and Triton → TileLang) and adopted an FP8 data format that, by DeepSeek's own account, is designed for next-generation Chinese chips. This is not just a performance optimization; it is a long-range bet on domestic semiconductor ecosystems.

"After updating V3.1 in August last year, they adopted the UE8M0 FP8 data compression format. At the time, DeepSeek itself replied in the comments that this data format is designed for the next generation of domestic chips. In the September 2025 update of DeepSeek V3, we can also see from the technical report that DeepSeek even replaced the underlying operator library from mainstream CUDA and Triton to TileLang, a domestic open-source project initiated by Professor Yang Zhi's team at Peking University." [00:11:40]

This is strategic infrastructure moat-building that few, if any, other frontier labs appear to be doing. If Chinese domestic AI chips (e.g., from Cambricon, Biren, or Huawei Ascend) ever achieve competitive performance, DeepSeek would be among the few frontier labs already optimized for them, a potentially decisive advantage that is invisible in any benchmark comparison today.
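For readers unfamiliar with the FP8 format named in the quote, the sketch below assumes it uses an E8M0-style scale encoding as defined in the OCP Microscaling (MX) formats specification: an unsigned byte with 8 exponent bits, no mantissa bits, and a bias of 127, so every representable value is a power of two. That reading is an assumption, since the episode does not describe the encoding. The helper names are hypothetical, and rounding the scale up to the next power of two is an illustrative choice, not DeepSeek's documented method.

```python
import math

E8M0_BIAS = 127   # exponent bias in the OCP MX E8M0 scale format
E8M0_NAN = 0xFF   # the all-ones byte is reserved for NaN


def decode_e8m0(byte: int) -> float:
    """Decode an E8M0 byte into the power-of-two scale it represents."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("E8M0 values are single bytes")
    if byte == E8M0_NAN:
        return math.nan
    return math.ldexp(1.0, byte - E8M0_BIAS)  # 2 ** (byte - 127)


def encode_e8m0(scale: float) -> int:
    """Encode a positive scale, rounding UP to the next power of two.

    Rounding up is an assumption made for this sketch: it keeps values
    divided by the scale from overflowing the narrow FP8 range.
    """
    if not (math.isfinite(scale) and scale > 0):
        raise ValueError("scale must be a positive finite number")
    mantissa, exponent = math.frexp(scale)  # scale == mantissa * 2**exponent
    # frexp returns mantissa in [0.5, 1); exactly 0.5 means an exact power of two.
    power = exponent - 1 if mantissa == 0.5 else exponent
    return min(max(power + E8M0_BIAS, 0), E8M0_NAN - 1)


print(decode_e8m0(127))               # 1.0
print(encode_e8m0(3.0))               # 129
print(decode_e8m0(encode_e8m0(3.0)))  # 4.0
```

Because such a scale is a pure power of two, applying it only shifts a value's exponent, which is one plausible reason a hardware vendor might prefer it.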

The Real Competition Is Already at the Intern Level — and Compensation Has Broken All Records

In passing, Manchi mentions intern compensation figures that signal how extreme the AI talent war has become at the earliest pipeline stage. Tencent is reportedly offering 5,500 RMB/day to top interns (more than 100,000 RMB/month), surpassing even ByteDance Seed's previous record of 4,000 RMB/day.

"If 5,500 RMB per day is true, and this intern goes to work every day for a month, their monthly salary would exceed 100,000 RMB. Such high intern wages also indirectly show how fierce the competition for AI talent has become." [00:06:30]

The non-obvious implication: the real bottleneck in Chinese AI is not capital or compute but a tiny pipeline of elite researchers, most under 30, from a handful of universities. Any investor or operator who secures preferential access to this pipeline (through internship programs, university partnerships, or talent-first recruiting firms such as Dink AI, which is mentioned in the episode) is acquiring a structural advantage that cannot be replicated through money alone.
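The intern pay figures quoted in this section can be checked with simple arithmetic. The sketch below uses the two reported daily rates; the 22-day working month is an assumption added for comparison (the quote itself imagines working every day of the month), and the function names are illustrative.

```python
TENCENT_DAILY_RMB = 5_500   # reported Tencent offer to top interns
SEED_DAILY_RMB = 4_000      # reported previous record at ByteDance Seed
THRESHOLD_RMB = 100_000     # monthly figure cited in the quote


def monthly_pay(daily_rate: int, days_worked: int) -> int:
    """Monthly pay for a given daily rate and number of days worked."""
    return daily_rate * days_worked


def days_to_exceed(daily_rate: int, threshold: int) -> int:
    """Smallest whole number of days whose pay strictly exceeds the threshold."""
    return threshold // daily_rate + 1


# 22 working days is an assumed typical month; 30 matches "every day for a month".
print(monthly_pay(TENCENT_DAILY_RMB, 22))                # 121000
print(monthly_pay(TENCENT_DAILY_RMB, 30))                # 165000
print(days_to_exceed(TENCENT_DAILY_RMB, THRESHOLD_RMB))  # 19
print(days_to_exceed(SEED_DAILY_RMB, THRESHOLD_RMB))     # 26
```

At the reported Tencent rate, 19 days of work already clears 100,000 RMB, whereas at the earlier 4,000 RMB/day record it would take 26 days.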