Interview with Li Xiang Part 2: CEO Large Model, MoE, Liang Wenfeng, VLA, Energy, Memory, Confronting Human Nature, Intimate Relationships, Human Wisdom
- 01 VLA (Vision-Language-Action) as the Path to L4 Autonomous Driving
- 02 DeepSeek's Impact: Accelerating Development While Validating Chinese AI Capabilities
- 03 The Shift from Smart Terminals to AI-Native Terminals
1. Key Themes
VLA (Vision-Language-Action) as the Path to L4 Autonomous Driving
Li Xiang articulates a clear three-stage evolution in autonomous driving, positioning VLA as the culminating architecture that enables human-like driving capabilities:
"I think there are three stages. The first stage, starting in 2021, we used deep learning perception combined with rule-based algorithms... like insect-level intelligence. The second stage is end-to-end, which started research in 2023 and launched in 2024... like mammalian intelligence. But VLA is completely human-like operation - it can understand the physical world like humans, has its own reasoning capability, and can execute actions like human drivers." 00:39:33
The technical approach involves a four-step training pipeline: "First, we train a 32B VL foundation model in the cloud... Then we distill it to a 3.2B edge model with 8 experts in an MoE architecture... Then we do post-training to transform it into VLA by adding Action... Finally, we do reinforcement learning in two parts: RLHF (with human feedback) for safety alignment, and pure RL using world model-generated data to drive better than humans." 00:47:06
Investment Insight: The VLA approach at Ideal (Li Auto) represents a differentiated technical path that could accelerate L3/L4 capabilities. The company expects initial L3 capabilities by Q3-Q4 2025, though regulatory approval may lag. 01:23:51
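For readers unfamiliar with the "8 experts in an MoE architecture" mentioned in the pipeline above, the following is a minimal sketch of how a mixture-of-experts layer routes an input to a few experts. It is illustrative only: the interview gives the expert count (8) but says nothing about the routing scheme, so top-2 routing, the gate scores, and the toy experts below are all assumptions, not a description of the actual model.

```python
import math

NUM_EXPERTS = 8  # from the interview; everything else below is assumed


def softmax(scores):
    """Turn raw gate scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k=2):
    """Pick the k highest-probability experts and renormalize their weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, gate_scores, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))


# Hypothetical toy experts: expert i simply multiplies its input by (i + 1).
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

# Hypothetical gate scores; a real model would compute these from the input.
gate_scores = [0.1, 2.0, 0.3, 0.0, 1.5, -1.0, 0.2, 0.4]

routing = route_top_k(gate_scores, k=2)
output = moe_forward(1.0, experts, gate_scores, k=2)
print(routing, output)
```

The point of the design is that only the selected experts run for a given input, so a model can hold more parameters than it spends compute on per step, which is one reason the architecture is attractive for an in-vehicle model.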
DeepSeek's Impact: Accelerating Development While Validating Chinese AI Capabilities
Li Xiang credits DeepSeek with fundamentally accelerating Ideal's AI roadmap and validating China's AI competitiveness:
"DeepSeek being open-sourced accelerated us by 9 months. We originally planned to have a suitable language model by the end of this year to train VLA, but DeepSeek allowed us to advance this significantly... This saved us several hundred million RMB in costs." 00:48:02
He sees DeepSeek as embodying human best practices: "DeepSeek demonstrated extremely well - research equals capability. They did extensive research work before development, which is why their training and inference efficiency is so high... Many companies want to skip straight to the 'tenth dumpling' without eating the first nine." 00:16:04
The strategic response was immediate: "During Spring Festival, we made the decision. I first discussed with Xie Yan (CTO), asking whether the model we'd build by September could be stronger than DeepSeek V3 + R1... We should stand on the shoulders of giants." 00:48:17
Investment Insight: DeepSeek's open-source approach is creating competitive advantages for companies with strong engineering capabilities and domain-specific data, particularly in physical AI applications where Chinese companies have caught up to or surpassed U.S. counterparts.
The Shift from Smart Terminals to AI-Native Terminals
Li Xiang redefines Ideal's identity from automotive company to "leading global AI terminal company," articulating a clear vision of what differentiates AI-era terminals:
"An AI-era terminal has four characteristics: First, 360-degree perception of the physical world. Second, cognitive decision-making capability. Third, Action capability - whether controlling terminal software or robots. Fourth, reflection and feedback capability." 01:33:59
He draws parallels to historical computing transitions: "In 1975, Apple emerged doing terminals with hardware-software integration. Microsoft did OS and software ecosystem. In 2007, iPhone emerged - Apple added services to hardware and software. Google did Android as the OS player. Today in the AI era, we see the same pattern - some will do terminals (us), others will do models and ecosystems (like OpenAI)." 01:34:28
The implications extend beyond automotive: "When we reach certain scale, like over 500 billion RMB revenue, we must consider - in users' work and life scenarios, can we launch the most competitive AI terminal products beyond cars?" 01:30:21
Investment Insight: Ideal is positioning for expansion beyond automotive into multiple AI terminal categories, following the Apple playbook of vertical integration. The company's automotive success provides the scale and capabilities to pursue adjacent AI hardware opportunities.
2. Contrarian Perspectives
AI as Production Tool vs. Information Tool - Most Current AI is Useless
Li Xiang makes a provocative argument that current AI delivers minimal productivity gains despite massive hype:
"A very important criterion for judging AGI is whether it's truly a production tool, whether it can actually replace and liberate humans from real work - the high-frequency 8 hours of daily work. If it's just information swirling in your brain, that's fundamentally different... Everyone says AI is great, but everyone's working hours are getting longer, and work outcomes haven't substantially improved." 00:28:54
He categorizes AI tools into three types: "Information tools, assistive tools, and production tools. Information tools - you generally won't pay for them. Assistive tools - you think they should come with the product. Production tools - the key characteristic is you're willing to pay for them... Among our colleagues, only two qualify as decent production tools: Cursor (for engineers) and DeepSeek (for business strategy teams). They pay for these themselves, not with company money." 00:10:33
Why This Matters: This perspective suggests the current AI wave is creating busy-work rather than productivity gains. Companies that solve the Action gap (making AI actually do work vs. just suggest work) will capture disproportionate value.
Smartphones and Screens Have Made Humans Dumber, Not Smarter
Challenging the universal celebration of mobile technology, Li Xiang argues that screen addiction has decreased rather than increased human wisdom:
"Since smartphones emerged, how much of our time goes to doomscrolling versus truly learning? We've created information consumption patterns that decrease wisdom rather than increase it... What is wisdom? Wisdom is our relationship with all things. If you've never lived in a forest for days, you might think wood is just for chopsticks, paper, and tables - not understanding it as a different form of life." 02:26:44
He distinguishes intelligence from wisdom: "We're solving intelligence problems today, we haven't solved wisdom problems. Intelligence is computational capability; wisdom is relationship with all things... Many smart people have zero wisdom. Why can many smart people destroy companies? Because they have ability but no wisdom about relationships." 02:32:08
Why This Matters: This suggests the next wave of valuable AI applications won't be about information delivery but about freeing humans for higher-order thinking and relationships. Companies focused on replacing repetitive cognitive tasks (like Ideal's plan to replace appointment calling) rather than adding more information consumption will win.
Platform Companies Will Lose to Integrated Terminal Companies in Physical AI
Contradicting conventional wisdom that platforms always win, Li Xiang predicts vertical integration will dominate in physical AI:
"In the PC era, Apple lost to Microsoft. In the mobile internet era, Apple and Google tied. But in the AI era, especially in physical-world applications, I believe integrated terminal companies will defeat platform companies. Why? Because it involves life safety and property safety. This consistency - should it be solved by one entity or multiple entities? The results are completely different." 02:08:56
He explains the logic: "A platform company can't write traditional IT software to manage these AGI agents and robots. You can't manage vehicles running on roads without drivers using traditional software. The world model itself will become the true operating system for full autonomous driving." 01:11:42
Why This Matters: This suggests companies pursuing the "OpenAI path" of AI platforms may struggle in robotics and physical AI, while vertically integrated companies like Tesla and Ideal have structural advantages. The safety and liability requirements of physical AI favor end-to-end control.
Most Companies Doing VLA Don't Understand the Fundamentals
Li Xiang suggests that many companies jumping into VLA lack the foundational capabilities required, having skipped essential building blocks:
"Many people think if VLA is the tenth dumpling that fills you up, they can skip straight to it. But you absolutely cannot skip the first nine dumplings. If you couldn't do rule-based algorithms well, you don't know how to do end-to-end. If you haven't done end-to-end to an excellent level, you don't even know how to train VLA." 00:59:19
He emphasizes the prerequisite capabilities: "To do VLA well, first you need an excellent language model - DeepSeek helped us here. Second, you need complete pre-training, post-training, and reinforcement learning infrastructure. Third, you need world model and simulation systems. We've reduced the cost of 10,000 km of validation from 180,000 RMB to 4,000 RMB through pure compute." 01:05:27
Why This Matters: This suggests a wide capability gap between leaders and followers in autonomous driving that will widen, not narrow, with AI. Companies without strong research teams, training infrastructure, and years of data collection cannot simply "catch up" by using foundation models.
Young People Should Do AI Research, Not Industry Veterans
Contrary to the industry practice of hiring "big names," Li Xiang advocates for young researchers based on advice from DeepSeek's founder:
"When I met with Liang Wenfeng in September, one thing impressed me deeply - he believes young people should do research, because extensive experience is actually an obstacle to research. So we boldly use new graduates. We rarely hire industry 'big shots'... If you look at our autonomous driving team, maybe 60-70% are new graduates." 02:00:57
The underlying logic: "For doing research, young people without frameworks and preconceptions can more easily break through. Veterans have their own frameworks that become constraints. DeepSeek [i.e., its founder, Liang Wenfeng] himself is a best practice - he's someone who succeeds by doing research first." 02:04:27
Why This Matters: This challenges the conventional wisdom that AI requires expensive veteran talent. Companies building strong research cultures with young talent may outperform those paying premium salaries for "big names" with outdated mental models.
3. Companies Identified
DeepSeek (深度求索)
Description: Chinese AI foundation model company that open-sourced V3 (MoE architecture) and R1 (reasoning model)
Key Quotes:
- "DeepSeek's open-source helped us save 9 months of development time and several hundred million RMB in costs. We originally planned to have a suitable language model by end of 2025, but DeepSeek accelerated this to Q1 2025." 00:48:02
- "What impressed me most about DeepSeek [its founder]: First, he's an extremely disciplined person. Second, he's someone who researches and learns best practices globally. Their research work is very deep, which is why their training and inference efficiency are so high." 00:22:55
- "From September when I met them to January when they released R1, they closed what I thought was a 1-year gap with OpenAI in just one quarter. This is extremely impressive." 02:02:40
OpenAI
Description: Leading U.S. AI company, creator of GPT series and now Ship-of-Theseus viral product
Key Quotes:
- "OpenAI is a company with very strong comprehensive capabilities - strong research, strong R&D, strong products, and strong communication. Look at how they launched Ship-of-Theseus - using an emotionally resonant, donghua-style approach, they achieved another viral moment. One in ten of my WeChat contacts changed their profile pictures to that style." 02:03:17
- "They had over 400 million weekly active users [for Ship-of-Theseus]. This is extremely remarkable." 02:03:48
Tesla (特斯拉)
Description: Electric vehicle and AI company pursuing full self-driving
Key Quotes:
- "Our team's core members probably each received 20+ headhunting calls during 2024 and early this year. Because people doing AI know that long-term high-quality data, continuous funding, and whether the company truly believes (not just talks) - these determine everything." 01:51:51
- "If you look at our autonomous driving team size, we only have about 200 people doing end-to-end, similar to Tesla's scale. But our competitors doing rule-based algorithms have 2,000-5,000 people. Yet from product experience, our 200-person end-to-end team delivers better results." 02:00:35
Apple (苹果)
Description: Referenced as organizational and strategic model for terminal companies
Key Quotes:
- "Apple launched iPod in 2001. In 2000, they already had Mac computers, an OS, and software ecosystem. But Apple's market cap was only a few billion dollars. For that era, Apple's scale was already appropriate for doing these things. For us today at this scale, doing these things is reasonable." 01:48:05
- "If Apple only made Macs and didn't do iPod or iPhone, it might have become just another company that passed through history. If Microsoft didn't do Office or cloud services, it wouldn't be today's Microsoft." 01:45:37
Huawei (华为)
Description: Chinese telecom and technology company, referenced as organizational model
Key Quotes:
- "In our second phase, when we wanted to go from 10 billion to 100 billion revenue, we studied who to learn from. We felt our talent density wasn't sufficient, and we couldn't fully understand Apple. Huawei generously shared many of their capabilities in books - including IPD (Integrated Product Development), finance processes, and HR three-pillar structure. This helped us tremendously." 01:32:39
Waymo/Google (谷歌)
Description: Referenced regarding autonomous driving approaches and platform vs. terminal strategies
Key Quotes:
- "Google didn't just do the original operating system; they did Android as the mobile OS, built the entire service ecosystem. The OS was open-source but services belonged to them - Google Maps, Gmail, Google Play Store modeling App Store. In the entire mobile internet era, Apple and Google were basically evenly matched." 01:34:44
Cursor
Description: AI-powered coding tool that qualified as a "production tool"
Key Quotes:
- "From colleagues around me, only two truly qualified as decent production tools: Cursor and DeepSeek. Cursor is used by our engineering teams, DeepSeek by our business and strategy teams. They pay for these themselves, not with company money." 00:10:33
Xiaomi (小米)
Description: Referenced for expansion strategy into adjacent categories
Key Quotes:
- "Xiaomi also did IoT and automotive. When you reach that scale - like over 500 billion RMB - you must consider these things. Can we solve the most important scenarios in users' work and life, launching the most competitive AI terminal products?" 01:30:08
4. Operating Insights
The 4-Step Human Best Practice for Building Capabilities
Li Xiang learned from DeepSeek a universal framework that maps to R&D organizations:
"DeepSeek demonstrated human best practice extremely well: Step 1 is always research first. This is critical - whenever we want to change or improve capabilities, first step is research. Step 2 is R&D. Step 3 is expressing the capability (like showing how end-to-end works). Step 4 is turning capability into business value through actual deployment." 00:15:35
He applies this rigorously: "We often forget this best practice. When we see a problem, we directly do R&D without research, or we do R&D but don't properly express capabilities, or we don't take it into real business deployment. DeepSeek showed that if you do research well, R&D becomes extremely efficient." 00:15:59
Tactical Application: Before any major initiative, mandate a formal research phase with published findings, only then proceed to R&D. This prevents the common pattern of "rushing to build" without understanding the problem space.
The 3-7 Person Team as the Optimal Energy Unit
Li Xiang has discovered a specific organizational pattern for maintaining energy and avoiding bureaucracy:
"I believe the optimal support structure is 3-7 people. Fewer than 3 is too few - two people don't work well. More than 7 is too many. We intentionally design structures with 3-7 person combinations that support both intellectual firepower and emotional energy... Three people can form a more powerful brain through debate while forming a stronger heart to support each other." 01:54:48
He describes the mechanics: "In my family, past support between me and my wife was limited. But from last Spring Festival, our eldest daughter at 14 formed the third pillar of support. She can now genuinely communicate with us about her life plans, preferences, understanding of people and things. This three-person support dramatically increased family energy." 02:19:19
Tactical Application: When forming leadership teams, aim for 3-7 people who will debate decisions but support execution. Avoid 2-person partnerships (unstable) and 8+ committees (bureaucratic).
Forcing AI Teams to Serve Internal Business as Customer Validation
Rather than building AI capabilities in isolation, Li Xiang mandates internal business deployment as the success metric:
"I told our AI business and customer service leaders: If by end of this year, we still cannot hand appointment calling over to AI agents, your work is failing and AI is meaningless. All talk is empty. If we complete this, saving 20% of their time and massive energy drain, they can do more valuable work." 02:28:37
He extends this logic: "We shouldn't have a team that does AI for customers, AI for sales, AI for coding. Each professional team - customer service team, sales team, engineering team - should build their own professional AI using the AI OS platform. AI business team's job is making a good internal AI OS for everyone to build upon." 00:56:38
Tactical Application: Don't let AI teams build in ivory towers. Every AI initiative must have an internal business unit as first customer, with clear productivity metrics. If your own teams won't pay for it (figuratively), customers won't either.
Using "I Need You More Than You Need Me" to Build Intimate Relationships
Li Xiang shares a counterintuitive framework for building energy in relationships:
"A critical realization: In intimate relationships, it's that I need them first, then they need me. I need my children - they make me better. I need my wife - she makes me better. I need my leadership team - they make me better. Their importance to me actually exceeds my importance to them. When you express this need, it creates powerful energy." 02:21:53
This applies to work relationships: "When you pay attention to people's needs and strengths rather than waiting for problems, you're proactive. Initiative stays in your hands. You can actively build family organization and company organization. Without this, relationships either become ignoring each other, trying to control each other, or internal competition." 02:23:04
Tactical Application: In retention conversations or team building, explicitly articulate why you need each person and how they make you/the organization better. This creates reciprocal energy more powerful than compensation or titles alone.
5. Overlooked Insights
World Models Will Become the Operating System for L4 Autonomous Driving
Buried in the technical discussion, Li Xiang reveals a profound architectural insight that most investors miss:
"World models have three stages: Stage 1 is testing - running VLA in simulated traffic. Stage 2 is generating training data. Stage 3 - and this is critical - the world model will become the actual operating system for L4 robotaxis. You cannot write traditional IT software to manage driverless vehicles on roads. The world model IS the operating system." 01:11:37
He elaborates: "Like Waymo's system or what we're building - this isn't just simulation, it's the actual runtime environment for managing fleets. Traditional software can't handle the uncertainty and real-time decision-making required for vehicles without human drivers." 01:11:42
Why This Is Massive: This suggests world model capabilities are not just R&D tools but the actual infrastructure layer for autonomous operations. Companies building superior world models (like Ideal reducing validation costs 45x from 180k to 4k RMB per 10k km) have a compounding advantage that creates winner-take-most dynamics. This is infrastructure, not incremental improvement.
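The 45x figure can be checked directly from the two numbers Li Xiang quotes (180,000 RMB and 4,000 RMB per 10,000 km of validation). The short calculation below is only a worked version of that arithmetic; the variable names are illustrative, and no cost data beyond the two quoted figures is assumed.

```python
# Figures quoted in the interview (RMB per 10,000 km of validation).
COST_BEFORE_RMB = 180_000  # road-based validation
COST_AFTER_RMB = 4_000     # validation "through pure compute"
DISTANCE_KM = 10_000

# How many times cheaper the new approach is.
reduction_factor = COST_BEFORE_RMB / COST_AFTER_RMB

# The same costs expressed per kilometre.
per_km_before = COST_BEFORE_RMB / DISTANCE_KM
per_km_after = COST_AFTER_RMB / DISTANCE_KM

# Share of the original cost that is eliminated.
savings_pct = (1 - COST_AFTER_RMB / COST_BEFORE_RMB) * 100

print(f"Reduction factor: {reduction_factor:.0f}x")
print(f"Cost per km: {per_km_before:.2f} RMB -> {per_km_after:.2f} RMB")
print(f"Savings: {savings_pct:.1f}%")
```

Put per kilometre, the quoted figures correspond to a drop from 18 RMB to 0.40 RMB, or roughly a 97.8% reduction.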
The Unspoken Talent War: 60-70% Fresh Graduates vs. Industry Veterans
Li Xiang casually mentions a radical talent strategy that contradicts conventional wisdom:
"If you look at our autonomous driving and model teams, about 60-70% are fresh graduates. We rarely hire industry 'big shots.' This was inspired by Liang Wenfeng's view that research should be done by young people because experience is often an obstacle." 02:00:57
The implications are profound: "When our autonomous driving core members each received 20+ headhunting calls in 2024, competitors couldn't understand how we achieved results with 200 people that they couldn't achieve with 2,000-5,000 people. It's because we built research culture with young people unencumbered by legacy frameworks." 01:51:51
Why This Is Massive: This talent arbitrage - hiring exceptional fresh graduates instead of expensive industry veterans - creates a 10-30x cost advantage while potentially generating better outcomes in breakthrough AI research. The insight suggests AI talent wars are focusing on the wrong profile. Companies that can build strong research cultures with young talent (like DeepSeek, Ideal) may sustainably outperform those paying premium salaries for "big name" hires carrying outdated mental models. This is especially true in China, whose education system produces high-quality STEM graduates at scale.
The talent strategy also reveals why Ideal believes it can expand beyond automotive into other AI terminals - the research-oriented culture with young talent is transferable across domains, whereas automotive engineering expertise is not.
Note: This interview was conducted in April 2025 but released later. Li Xiang mentioned that during Chinese New Year (January 2025), Ideal made the strategic decision to adopt DeepSeek's open-source models after evaluating that their internal language model development, originally planned for September 2025, would likely not surpass DeepSeek V3 + R1. This decision saved 9 months and "several hundred million RMB" while accelerating their VLA timeline. 00:48:02