138: From Using Your Phone to a Phone That Understands You Better: OPPO's Mobile AI Practice | A Conversation with Wan Yulong, Head of Xiaobu
1. Key Themes
Theme 1: Mobile AI Product Philosophy - Experience First, Not Technology First
OPPO's approach to mobile AI centers on solving real user problems through three core scenarios rather than chasing technology trends. Wan Yulong explains: "We identified three major scenarios where users can use AI to improve their experience on phones: productivity (learning and work), lifestyle (information and service acquisition), and imaging (photography and editing)" [00:32:25]. This led to positioning their AI as "productivity assistant, life butler, and imaging master" rather than building AI for AI's sake.
Supporting evidence: The company conducts "origin point journeys" where they sit with actual users and sales staff to understand pain points. Wan notes: "When we did user research, many young users, especially university students and new workplace entrants, told us they had bookkeeping habits but no good tools to implement them" [00:39:42]. This direct user feedback shaped features like AI-powered expense tracking, even though it wasn't initially on the team's radar.
Theme 2: The "Memory" Revolution - Context is the New Competitive Advantage
Memory and context understanding represent a fundamental shift in mobile AI capabilities. Wan Yulong emphasizes: "We believe memory includes chat history, content users actively tell AI to remember, and files/photos already stored on the device. These three together form the user's memory" [00:15:40]. This comprehensive approach to memory goes beyond what web-based AI offers.
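As a rough illustration of the three memory sources Wan lists, the sketch below models them as one searchable collection. The names `Source`, `MemoryItem`, and `recall`, as well as the sample entries, are hypothetical and not taken from OPPO's implementation; the word-overlap matching is only a stand-in for whatever retrieval a real system would use.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    CHAT_HISTORY = "chat_history"  # past conversations with the assistant
    USER_SAVED = "user_saved"      # content the user explicitly asked the AI to remember
    DEVICE_FILE = "device_file"    # files and photos already stored on the device


@dataclass(frozen=True)
class MemoryItem:
    source: Source
    text: str


def recall(items: list[MemoryItem], query: str) -> list[MemoryItem]:
    """Return items from any source whose text shares a word with the query."""
    words = {w.lower() for w in query.split()}
    return [m for m in items if words & {w.lower() for w in m.text.split()}]


# Hypothetical sample data, one entry per source.
memory = [
    MemoryItem(Source.CHAT_HISTORY, "asked about flights to Chengdu"),
    MemoryItem(Source.USER_SAVED, "remember my parking spot B2-117"),
    MemoryItem(Source.DEVICE_FILE, "photo of parking receipt"),
]

hits = recall(memory, "where is my parking spot")
for item in hits:
    print(item.source.value, "->", item.text)
```

The point of the sketch is that a single query can draw on all three sources at once, which is the combination the quote describes as "the user's memory."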
Why it matters: Wan cites Sam Altman's reflection: "He said one thing that surprised him pleasantly last year was discovering that memory was much more important for ChatGPT than he imagined, because many users increasingly love using ChatGPT because of memory" [00:14:26]. OPPO's advantage is twofold: a physical button for "one-tap capture" that lowers the barrier to creating memories, and device-level data that third-party apps cannot reach.
Theme 3: The Agent Ecosystem Paradox - Standards Lag Behind Innovation
The mobile agent ecosystem remains fragmented with no clear standards, unlike the web where MCP is emerging as dominant. Wan explains: "Currently, each phone manufacturer is basically doing their own thing. Everyone still hopes to follow a standard, so when we interface with partners, they also write protocols according to methods like APIs" [01:11:45].
The challenge: Unlike the previous smartphone era where iOS and Android created clear platforms, the AI era hasn't yet produced a unified framework. Wan notes: "Google's Android intent application framework is published on developer pages hoping developers will adopt it, but not all application developers follow this protocol. So there isn't a relatively mainstream interaction protocol yet" [01:12:56]. This creates friction for service providers who must integrate with multiple platforms differently.
2. Contrarian Perspectives
1. Pre-training Models is NOT a Strategic Priority for Phone Makers
Counter to the 2023 "hundred-model war" mentality, OPPO deliberately chose NOT to pre-train their own large language model. Wan Yulong reveals: "We considered it... but by the first half of last year, we believed pre-training wasn't a key control point for manufacturers. We think the key control point is how to use models well" [00:51:13].
The reasoning: "You can see that now model players are converging - domestically maybe less than ten companies are still doing it. Every company's positioning determines what they should do... We don't need to do everything ourselves, like in the previous smartphone era we were already system integrators" [00:50:48]. Instead, they focus on fine-tuning open-source models, context engineering, and training small perception models where they have data advantages.
2. AI Penetration Rates are LOWER Than Expected - A Counterintuitive Reality
Despite massive AI hype, actual daily usage remains disappointingly low. Wan candidly admits: "Objectively speaking, this number is below my expectations... The monthly active to daily active ratio is not as good as standalone AI apps. Many users who bought phones forget their phone came with a smart assistant" [00:21:34].
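The ratio Wan refers to is commonly expressed as "stickiness": average daily active users divided by active users over the whole period. The sketch below computes it from a made-up activity log; the user IDs, dates, and the `stickiness` helper are illustrative assumptions, not OPPO data or tooling.

```python
from datetime import date

# Hypothetical activity log: (user_id, day the assistant was used).
events = [
    ("u1", date(2025, 1, 1)), ("u1", date(2025, 1, 2)), ("u1", date(2025, 1, 3)),
    ("u2", date(2025, 1, 2)),
    ("u3", date(2025, 1, 3)),
    ("u4", date(2025, 1, 1)),
]


def stickiness(events: list[tuple[str, date]]) -> float:
    """Average daily active users divided by unique users over the period.

    Only days that appear in the log are counted; a production metric
    would normally average over every calendar day in the window.
    """
    period_users = {user for user, _ in events}
    users_by_day: dict[date, set[str]] = {}
    for user, day in events:
        users_by_day.setdefault(day, set()).add(user)
    avg_dau = sum(len(users) for users in users_by_day.values()) / len(users_by_day)
    return avg_dau / len(period_users)


ratio = stickiness(events)
print(f"stickiness = {ratio:.2f}")
```

A low value means many users show up in the period total but few return on any given day, which matches the pattern Wan describes of users forgetting the assistant exists.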
Why this matters: This challenges the narrative that AI is already transforming mobile experiences. Wan attributes this to "the past ten years of experience with traditional voice assistants influencing users, making them habitually think voice assistants aren't that smart, so they habitually don't use them even though experience has dramatically improved with large models" [00:22:39]. The problem isn't technology - it's overcoming a decade of low expectations set by earlier voice assistants.
3. Voice Will Eventually Dominate Mobile Input - But NOT for Our Generation
While adults resist voice interaction in public, Wan argues this is generational, not fundamental. He observes: "You watch young children - they use voice for everything when watching TV, for searches. They are AI-native" [00:58:54].
The provocative parallel: "I sometimes compare two scenarios: when you're on the subway or street making a phone call, you don't feel awkward talking to a person. But if you're talking to an AI or giving it commands, you feel embarrassed. But actually, people around you can't tell the difference" [01:00:09]. The discomfort is psychological and cultural, not logical - and will fade as AI-native generations mature.
4. GUI-Based Mobile Agents are a Transitional Phase, Not the End State
Many companies are investing in GUI-understanding agents that "see" and "click" screens. Wan regards this as an unintuitive, transitional approach. He argues: "I think it's not quite intuitive. What's intuitive is that when you control something, as long as it can directly call an interface or command, it can be implemented. There's no need to understand many elements on that screen" [00:27:47].
His vision: "AI agents should atomize capabilities, then provide them to models through some method so models can understand and invoke them. Models naturally have function calling or tool use capabilities - those individual capabilities should be provided as functions or tools for models to invoke, not having models reverse-understand interfaces then click on them" [00:28:33]. This requires a new OS architecture, not just AI layered on existing systems.
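A minimal sketch of what "providing capabilities as functions or tools" could look like, assuming a simple in-process registry. Everything here is hypothetical: the `tool` decorator, the `add_expense` capability, and the `{"name": ..., "arguments": {...}}` call shape only mirror the general form of function calling and do not represent OPPO's or any vendor's actual API.

```python
from typing import Callable

# Registry of atomic capabilities exposed to the model (hypothetical design).
TOOLS: dict[str, dict] = {}


def tool(name: str, description: str, parameters: dict) -> Callable:
    """Register a function as an atomic capability the model may invoke."""
    def register(fn: Callable) -> Callable:
        TOOLS[name] = {"fn": fn, "description": description, "parameters": parameters}
        return fn
    return register


@tool(
    name="add_expense",
    description="Record an expense in the bookkeeping app.",
    parameters={"amount": "number", "category": "string"},
)
def add_expense(amount: float, category: str) -> str:
    return f"Recorded {amount:.2f} under {category}"


def tool_specs() -> list[dict]:
    """What the model is shown: names, descriptions, and parameters, not screens."""
    return [
        {"name": n, "description": t["description"], "parameters": t["parameters"]}
        for n, t in TOOLS.items()
    ]


def dispatch(call: dict) -> str:
    """Execute a model-emitted call of the form {"name": ..., "arguments": {...}}."""
    entry = TOOLS.get(call["name"])
    if entry is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return entry["fn"](**call["arguments"])


# A call the model might emit after the user says "I spent 32.5 on lunch".
result = dispatch({"name": "add_expense", "arguments": {"amount": 32.5, "category": "lunch"}})
print(result)
```

The contrast with a GUI agent is that nothing here parses a screen: the model selects a named capability and supplies arguments, and the system executes it directly.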
5. The Next Mobile OS Will Come from OUTSIDE Current Players
When asked about the future of mobile operating systems, Wan expresses hope for disruption: "I hope there will be something new... I think it needs some new thinking to do this. Theoretically it should be something that grows alongside large model development" [01:28:36].
Why this matters: Despite being from a major phone manufacturer, he acknowledges current approaches are insufficient: "Almost all AI from every company is still built on traditional OS foundations... I think AI-era OS should be quite different, and I understand it should be different from the bottom up to grow a more different product ecosystem" [01:27:11]. This suggests even insiders see the current Android/iOS duopoly as inadequate for the AI era.
3. Companies Identified
OPPO - Mobile AI Innovation Leader
Description: Major Android phone manufacturer with 170 million monthly active users of their "Xiaobu" AI assistant, focusing on on-device AI capabilities and cross-device ecosystems.
Excellence indicators:
- "We were among the first batch to adopt DeepSeek R1, and quickly rebuilt our dialogue experience and multimodal Q&A experience through R1" [00:47:01]
- Wan Yulong on their differentiation: "We should be the first to combine hardware by adding a dedicated hardware button for memory - a physical button to trigger this memory action, helping users more conveniently use the memory function" [00:13:07]
- Their "one-tap capture" feature processes multiple modalities: "In this version it will bring more modalities for capture content, like video-level capture where users watching a long video can capture the entire video's information through one tap" [00:07:25]
DeepSeek - Breakthrough AI Model Provider
Description: Chinese AI company whose R1 model has become a preferred choice for mobile implementations due to its efficiency and performance balance.
Excellence indicators: Wan notes their rapid adoption: "DeepSeek appeared, and we were among the earliest batch to use DeepSeek R1" [00:47:01]. The speed of integration (within weeks) suggests DeepSeek's models are particularly well-suited for mobile deployment scenarios where efficiency matters alongside capability.
OpenAI - Setting New Product Paradigms
Description: AI research company whose product evolution from ChatGPT to "super agents" is influencing industry thinking about AI systems architecture.
Excellence indicators: Wan cites Sam Altman's strategic insight: "In a podcast I saw recently, Sam Altman said he thinks one thing he did last year that pleasantly surprised him was discovering that memory was much more important for ChatGPT than he expected" [00:14:21]. Additionally, their shift from single models to systems: "OpenAI hopes to no longer launch individual models but launch a system, and he even thinks this super system should be composed of many models working together" [01:29:27]
Alipay (支付宝) - Agent Ecosystem Pioneer
Description: Chinese super-app for payments and lifestyle services, pioneering agent-based service integration with phone manufacturers.
Excellence indicators: OPPO has partnered with Alipay for agent-based service delivery where users can access Alipay's services through Xiaobu without opening the app: "We recently cooperated with Alipay on lifestyle-related agents. They package services within their ecosystem through agent methods, then we understand user intent and invoke corresponding services" [01:07:41]. This represents the early formation of the new service delivery paradigm.
4. Operating Insights
Insight 1: Rapid Iteration Cycles Require Organizational Restructuring
AI products demand fundamentally different development rhythms than traditional software. Wan explains: "In early 2021, we basically released a version every month or so. By early this year, at our fastest we had to release a version every week because things change so rapidly" [00:47:01].
The adaptation: Create "strategy product managers" distinct from traditional user product managers: "Strategy product managers continuously look at or identify how users actually use it, by analyzing online logs, looking at user feedback... looking at which strategies can make users more willing to forward content, which strategies make users more willing to have long-term multi-turn conversations with you" [00:45:45]. This role is data-driven and observational rather than design-driven.
Insight 2: User Research Must Go Direct, Not Through Proxies
OPPO institutionalized direct user contact through "origin point journeys." Wan describes: "Since early last year, our company organized activities called 'going to the front lines, deep into the front lines' - our origin point journeys. We gather many original users and our sales staff to sit together and discuss how users actually use phones, what pain points they encounter, what problems they face using AI" [00:39:04].
The revelation: "When I first heard the bookkeeping requirement, since I'm not a bookkeeping user, I didn't quite understand it. But this is what origin point journeys brought me - a kind of impact or changed perspective. When you go to physically contact some real users, especially after visiting retail stores, you discover some common requirements that you as a user may not need, but they really need" [00:40:18].
Insight 3: Hardware Integration as Competitive Moat
OPPO's decision to add a physical button specifically for AI memory capture represents a key product insight. Wan explains: "This is something we think is different in tactical strategy choices... We believe this action itself can lower the user threshold for using memory and form a pattern of anytime, anywhere (随时随地) memory and recall" [00:13:21].
Why it works: Physical affordances beat software UI for habit formation. Users don't need to remember menu locations or gestures - the button creates a direct, unambiguous trigger for an action that should become habitual.
Insight 4: Accept That AI Products Start as "Black Boxes" Then Clarify Through Usage
Traditional product development flows from requirements to design to implementation. AI products invert this. Wan notes: "AI products are different from software products. Software products are white-box - designed to look a certain way, users know what they can do... But AI products are hard to be white-box initially. Many times you need to observe how users use them" [00:26:45].
The implication: "When large models first appeared, there was a classic case where users would ask 'what kind of story is it when a chicken wears red pants and crosses the Yangtze River?' The large model would think 'since you gave me this proposition, I can answer you.' But behind it, users might want you to create a novel, might want you to answer a fact, or might just be joking with you. You can only discover users' real expected demands through continuous dialogue and interaction" [00:56:57].
Insight 5: Talent Strategy - Hire for Mission Alignment, Not Just Skills
When recruiting AI talent from big tech companies, OPPO focuses on philosophical fit over pure technical capability. Wan explains his approach: "First, it must be mission alignment (志同道合). AI is a big field... so when we communicate with potential colleagues, we generally ask several questions. First question: what are you truly passionate about? Are you more passionate about pure technical model training, or do you want to make a product that influences hundreds of millions of users upon release?" [00:58:07].
The cultural element: "OPPO is a company that emphasizes '本分' (integrity/doing right) in organizational culture... returning to essence to think about why we're doing something, what exactly we're doing, and how to satisfy users" [00:58:22]. This values-based filtering ensures new hires will thrive in an environment that prioritizes user outcomes over technical novelty.
5. Overlooked Insights
Insight 1: The "Embarrassment Threshold" is a Massive Adoption Barrier That Will Disappear
While discussing voice interaction challenges, Wan made a profound observation that most missed: "I sometimes compare two scenarios: First scenario is you're on the subway or street making a phone call to someone - you don't feel awkward, you naturally express yourself because you feel you're talking to a person. But if at this time you're chatting with an AI or giving it commands, you feel embarrassed. But actually people around you can't hear the difference" [01:00:09].
Why this is huge: The entire voice AI industry is wrestling with adoption problems, attributing them to accuracy, latency, or capability issues. Wan identifies that the real barrier is purely psychological - a learned social discomfort that has no rational basis. As he notes: "I think this is a usage habit issue, not having formed that mindset that I can actually casually talk to an AI in public places" [01:02:14].
The generational shift: Children are already past this - "You watch young children, they use voice for TV, for searches, for everything. They are AI-native" [00:58:54]. This suggests the "embarrassment threshold" may fade as this generation grows up, perhaps within 10-15 years, expanding voice AI adoption without requiring any technological breakthrough. Companies that prepare for this shift now (through voice-first design) would be better positioned than those waiting for it to happen.
Insight 2: The Real Insight - AI Will Recreate the "Pre-App Store" Moment
Buried in the discussion about future OS development was a profound parallel. Wan noted: "When the first iPhone came out, it had two key points: first was interaction method - revolutionary human-computer interaction through multitouch... second was the application ecosystem - many services users couldn't complete on phones before could now be completed through applications" [01:21:46].
Then he revealed: "In this AI wave, both changes are happening. First is interaction method... second is the application ecosystem is also gradually changing" [01:22:13]. He immediately added a caveat: "It's just that while these three points are changing, user usage habits, lower usage barriers, and better service quality - this change may still need a certain period. And the application ecosystem change hasn't reached that explosive point like when the App Store appeared" [01:23:27].
Why this is the real story: The smartphone industry is actually in a "2006 moment" - the year BEFORE the iPhone launched. The technology pieces are assembling (models getting better, on-device processing improving, context windows expanding), but the explosive "app moment" hasn't arrived yet. Wan explicitly says: "I think it definitely requires some explosive products to appear, which can trigger more developers' willingness to invest" [01:24:07].
The investment implication: Just as the most valuable companies in mobile weren't the handset makers but the app-layer companies (Facebook, Uber, Instagram, etc.), the most valuable AI companies may not be the model makers or even the device makers, but whoever figures out the "AI-native app" format that triggers the ecosystem explosion. And based on Wan's timeline expectations ("next year I hope to see..."), this explosive moment could be 12-24 months away, not 5-10 years.
Key Takeaway: The mobile AI revolution is further along technically than adoption metrics suggest, with the primary barriers being psychological (embarrassment about voice) and ecosystem-related (lack of killer agent apps) rather than technological. The company or platform that solves the "agent ecosystem cold-start problem" will capture enormous value, and phone manufacturers like OPPO have structural advantages in context/memory that pure AI companies cannot easily replicate.