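139. [A Survey of Agents] A conversation with Su Yu on the history of agent technology, the OpenClaw Moment, the dissolution of boundaries, and the ripple effects on society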
- 01 The 70-Year Arc of Agent Technology Finally Converging
- 02 Language as the Universal Scaffold
- 03 The OpenClaw Moment Mirrors the ChatGPT Moment
Participants: 张小珺 (Zhang Xiaojun, host) and 苏煜 (Su Yu, Ohio State University CS Professor & Neocognition founder)
1. Key Themes
The 70-Year Arc of Agent Technology Finally Converging
Agent research is not new. It has been the central question of AI since the 1940s-1960s, but it fragmented into subfields. Su Yu argues we are now witnessing a "convergence" back to the original unified goal.
"I think agents are definitely not a new topic. The idea runs through the entirety of AI. From the very start of AI, people were discussing the problem of agents... Take the book 'Artificial Intelligence: A Modern Approach' by Stuart Russell and Peter Norvig. Stuart actually told me that although everyone thinks of it as an AI book, it is essentially a book about agents. He strongly emphasizes that the agent is not a new concept." 00:04:11 and 00:12:08
Language as the Universal Scaffold — Why This Generation of Agents is Fundamentally Different
The defining feature of current-generation agents is using language (broadly defined, including programming languages) as a scaffold for all capabilities: perception, reasoning, action, and memory compression.
"The biggest difference with this generation of LLM-based agents is that they can use language as a scaffold for everything they do, including perception... including using language for reasoning, the so-called Chain of Thought. Different tasks don't need the same amount of compute. If the task is complex, I can generate more tokens. Each generated token is one forward pass, a certain amount of compute. This actually achieves adaptive computing, adaptive reasoning." 00:22:19
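The adaptive-compute point in the quote can be made concrete with a little arithmetic. The sketch below is only a rough illustration: it assumes the common rule of thumb of about 2 FLOPs per model parameter per generated token (attention cost is ignored), and the 7B parameter count and the token counts are hypothetical numbers chosen for the example, not figures from the conversation.

```python
def forward_pass_flops(n_params: int) -> int:
    """Approximate compute for one generated token.

    Assumption: roughly 2 FLOPs per parameter per token, a common
    back-of-the-envelope estimate that ignores attention cost.
    """
    return 2 * n_params


def generation_flops(n_params: int, n_tokens: int) -> int:
    """Total compute for a response: one forward pass per generated token."""
    return forward_pass_flops(n_params) * n_tokens


N_PARAMS = 7_000_000_000  # hypothetical 7B-parameter model

# A short direct answer versus a long chain-of-thought trace (hypothetical lengths).
easy_task = generation_flops(N_PARAMS, n_tokens=20)
hard_task = generation_flops(N_PARAMS, n_tokens=2000)

# Same model, same weights: the harder task simply spends more forward passes.
compute_ratio = hard_task // easy_task
print(f"easy: {easy_task:.2e} FLOPs, hard: {hard_task:.2e} FLOPs, ratio: {compute_ratio}x")
```

Under these assumptions the model itself never changes; the amount of compute scales with how many tokens it chooses to generate, which is the sense in which Chain of Thought gives adaptive reasoning.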
The OpenClaw Moment Mirrors the ChatGPT Moment
OpenClaw represents for agents what ChatGPT represented for LLMs: not a technical breakthrough, but a delivery/interaction paradigm shift that revealed latent capability to the public.
"The OpenClaw moment... and the ChatGPT moment are very similar. ChatGPT's underlying technology had already been developing for several years before it came out. What OpenAI did with ChatGPT was fine-tune the model to be more like a chatbot and release it directly to the general public... OpenClaw is similar. Agents had already developed substantially on the technical side before OpenClaw. Most people who work on agents and look at OpenClaw's codebase would have a feeling of 'nothing is new here.' But it is a profound change in the form of interaction." 00:49:32 and 00:51:28
2. Contrarian Perspectives
GUI Will NOT Be Replaced by CLI/API for Agents — And the Semantic Web Cautionary Tale
Against the popular narrative that agents will make graphical interfaces obsolete in favor of CLI/API, Su Yu argues GUIs carry accumulated knowledge, business logic, and constraints that agents should piggyback on — not recreate.
"If you can use GUI well, you can immediately reach all corners of human society, especially in these long-tail scenarios... Tim Berners-Lee after creating the internet quickly launched the semantic web experiment, wanting to give the whole internet explicit semantics... pushed for 20+ years, and adoption was still very low. Because this is related to human nature and how society operates. Society doesn't work that way — you can't just come out with a new standard and expect the whole world to rewrite everything." 00:40:10 and 00:41:38
The Real Existential Risk from AI is Not Sci-Fi Singularity — It's Uncontrolled Job Displacement
Su Yu dismisses the "AI destroys humanity" scenario but is genuinely alarmed by economic displacement outpacing new job creation.
"The so-called existential risks — AI hitting singularity, rapidly self-iterating, far surpassing human intelligence, then eliminating or replacing humans — I think in the foreseeable future I cannot see this possibility... But AI will indeed bring very large practical impacts to society. The biggest real concern is job displacement — if AI agents can replace knowledge workers at scale, and you can't produce enough new positions to absorb displaced workers, and there's no good redistribution mechanism, while most benefits accrue to a few top companies or capital holders — that will have extreme impact on society." 00:07:01 and 02:07:19
Specialized Intelligence > General Intelligence as the Real Commercial Opportunity
Everyone is chasing general AI, but Su Yu argues the world is composed of "millions of micro-worlds," each requiring specialization that large model companies are structurally incapable of delivering.
"The world is actually very complex. The world is not one world; it is composed of possibly millions of small worlds. Each small world, to truly generate value, requires specialization, requires agents that become expert-level. This is something large model companies find very hard to do, because they naturally want to make platform things, unified things, rather than things that need specialization. This conflicts with their organizational structure and business model. It's not a matter of choice. Even if they chose to do it, they might not do it well." 01:00:55
China's Application-Layer Speed is a Genuine Structural Advantage in the AI Era
Against the narrative that China lacks AI fundamentals, Su Yu argues China's speed at application adoption (often dismissed as "copycat") is actually a massive advantage right now, because foundation model intelligence has crossed a "good enough" threshold.
"Eric Schmidt, the former Google CEO, has specifically talked about this point: America is generally much slower at the application layer. I think in the AI era this is a very large advantage, because the situation we now face is that foundation model intelligence has already passed a critical point. For many useful things, it's good enough. Many things went undone before because the friction was too high. But AI capability has now made it possible to greatly reduce this friction. So many things have crossed from 'not worth doing' to 'worth doing.'" 00:57:24
3. Companies Identified
Neocognition
- Agent research lab founded by Su Yu in Silicon Valley (2024)
- Why mentioned: Raised $40M seed round in ~6 months; focused on "specialized intelligence" and learning world models for expert agents; named after the neocortex
- "Our positioning is an Agent Research Lab. Any problem related to intelligent agents (if we think it's interesting or related to ultimately solving the agent problem) is something we'll be interested in doing. In the short to medium term, we're focused on the keyword 'Specialization' or 'Specialized Intelligence,' not general intelligence." 01:02:22
Lovable
- One of the representative vibe coding companies
- Why mentioned: Originated from the "GPT Engineer" open-source project (one of the early AutoGPT-era viral agent projects), which later became a company
- "Another representative one was called GPT Engineer, claimed to be the first fully automated AI engineer. Its interesting point is that it eventually developed into a company called Lovable, which is now one of the representative companies in vibe coding." 00:34:29
Zhipu AI (智谱)
- Chinese AI company with GLM model series
- Why mentioned: One of the earliest Chinese companies to pursue computer use agents, with the AutoGLM series; had research collaboration with Su Yu's group on AgentBench
- "Zhipu actually started on agents, especially this kind of computer use agent, quite early, with the AutoGLM series. We have some connection because I've known Tang Jie (their lead researcher) for many years... we did a work together called AgentBench, which is one of the earliest agent benchmarks." 01:51:20
Project Prometheus (Jeff Bezos)
- New stealth company co-led by Jeff Bezos (returned as co-CEO)
- Why mentioned: Raised $6-7 billion, building computer use agent with focus on manufacturing, logistics, and physical infrastructure — a differentiated bet from software-focused agents
- "Jeff Bezos recently started a new company where he returned to an operating role as co-CEO, called Project Prometheus. They're relatively low-profile but have probably raised 6-7 billion dollars. They have a large computer use agent component, but what they ultimately want to build is more focused on manufacturing, logistics, infrastructure, factories." 01:49:51
4. People Identified
Percy Liang
- Stanford professor
- Why mentioned: Semantic parsing pioneer who became influential in LLM/agent research; represents the lineage from semantic parsing to modern agents
- "Many people who later made comparatively large contributions in LLM and agent research actually came from semantic parsing backgrounds, like Percy Liang at Stanford, Luke Zettlemoyer at University of Washington..." 00:20:25
Luke Zettlemoyer
- University of Washington professor; Meta AI researcher
- Why mentioned: Semantic parsing pioneer; led RoBERTa at Meta; one of the main leads on the Toolformer paper (Feb 2023), which Satya Nadella circulated company-wide at Microsoft
- "Toolformer was the first work using an LM to do tool use. That came from Meta. Luke Zettlemoyer... was one of the main leads. Although agents hadn't yet become a hot concept at the time, this work had already generated very large impact. When I was still part-time at Microsoft, this paper was circulated company-wide by Microsoft CEO Satya Nadella." 00:33:01
Yao Shunyu (姚顺雨)
- AI researcher (Princeton/industry)
- Why mentioned: Created the ReAct paper (Oct 2022) — a foundational work extending Chain of Thought to agent settings with external environments; co-authored the Language Agent tutorial
- "I think Shunyu made ReAct, which came out around October 2022. That actually extended CoT to a setting with an external environment, more like an agent setting." 00:30:06
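To illustrate what "extending CoT to a setting with an external environment" means, here is a minimal sketch of a thought, action, observation loop in the spirit of ReAct. It is not the paper's implementation: `scripted_policy` is a hypothetical stand-in for the LLM, and `calculator` is a toy tool invented for this example.

```python
import operator
from typing import Callable, Optional

# One step of the trace: (thought, action, action_input, observation).
Step = tuple[str, str, str, str]


def calculator(expression: str) -> str:
    """Toy external tool (hypothetical): evaluates 'a op b' with spaces between parts."""
    left, op, right = expression.split()
    ops = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
    return str(ops[op](float(left), float(right)))


def scripted_policy(question: str, history: list[Step]) -> tuple[str, str, str]:
    """Stand-in for an LLM: returns (thought, action, action_input).

    A real agent would prompt a model with the question and the trace so far.
    """
    if not history:
        return ("I should compute the product with the tool.", "calculate", "12 * 7")
    last_observation = history[-1][3]
    return (f"The tool returned {last_observation}, so I can answer.", "finish", last_observation)


def react_loop(
    question: str,
    policy: Callable[[str, list[Step]], tuple[str, str, str]],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> tuple[Optional[str], list[Step]]:
    """Alternate reasoning (thought) with acting on an environment (tool call)."""
    history: list[Step] = []
    for _ in range(max_steps):
        thought, action, action_input = policy(question, history)
        if action == "finish":
            return action_input, history
        # The observation comes from outside the model and feeds the next thought.
        observation = tools[action](action_input)
        history.append((thought, action, action_input, observation))
    return None, history  # step budget exhausted without an answer


answer, trace = react_loop("What is 12 * 7?", scripted_policy, {"calculate": calculator})
print(answer)
for step in trace:
    print(step)
```

The difference from plain Chain of Thought is the `observation` line: the reasoning trace is interleaved with feedback from something external to the model, which is what makes it an agent setting.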
Yu Tao (余涛)
- University of Hong Kong researcher
- Why mentioned: Created the OSWorld benchmark (early 2024), one of the most representative desktop agent benchmarks; co-authored the Language Agent tutorial
- "Yu Tao's group made OSWorld, which is very representative among these. That was probably March/April 2024, primarily desktop." 00:37:25
Dario Amodei
- Anthropic CEO
- Why mentioned: Su Yu credits Dario with correctly identifying coding as "the most fundamental fabric" of the digital world — a strategic insight that Anthropic has executed on brilliantly
- "One has to admire Dario, Anthropic's CEO. He grasped this point very accurately. Coding is very fundamental. At least for the digital world, and I think not limited to the digital world, it is the most fundamental fabric, the most fundamental building layer. Everything can ultimately be expressed in code." 00:42:17
Jeff Hawkins
- Neuroscientist, author
- Why mentioned: His book "A Thousand Brains" provides the most compelling theory of how the neocortex learns world models — directly inspiring Neocognition's research direction
- "There's a relatively new book by Jeff Hawkins called 'A Thousand Brains.' It's still a fairly new theory but I think it's one of the furthest-reaching in this area. He says each cortical column is learning a world model... this world model is not limited to the physical world. It includes all language, mathematical systems, and the various abstract concepts humans have created." 01:26:04
Chris Manning
- Stanford NLP professor
- Why mentioned: His recent podcast articulated the view that language (not vision) is what separates human civilization from other species — aligning with Su Yu's language agent thesis
- "Chris Manning recently did a podcast discussing this problem. My views are very close to his. He has a saying: humans and chimpanzees have such different intelligence and civilization, but not because we have sharper visual perception than chimpanzees. Our vision is probably worse than many animals. But our language is unique. And this is the fundamental reason our civilization and intelligence are so different." 01:28:40
5. Operating Insights
The "Construct Validity" Framework Applies to Both Research and Company Building
Su Yu uses the same framework he applies to benchmarks when evaluating startup opportunities: the thing being built must be highly correlated with real-world value generation.
"When I do benchmarks I like to emphasize a point called Construct Validity or Ecological Validity: what your benchmark evaluates should be highly positively correlated with what you ultimately want the AI system to achieve, with what generates actual value... I think building a company is the same. You need to choose a track where, if it's solved (whether by you or by others), it will bring a fundamental change to all of human society, with very high upside." 01:11:08
The "10 Weird Ideas" Test for Choosing Institutional Setting
Su Yu's framework for choosing between industry (Microsoft), academia, and startups was purely about idea throughput and exploration freedom — not compensation. This is a replicable decision framework.
"I'm a person with many interests — I might have ten things I want to do simultaneously. At Microsoft or other places I could maybe do one or two things. But I wanted to do ten things simultaneously. The school was the best place for weird ideas. Compared to money and income, I think that was much more important to me." 01:53:27
The "Forward-Deployed Engineer" Model Reveals Agents' Current Fundamental Weakness
The fact that OpenAI, Anthropic, and others are deploying armies of forward-deployed engineers to live at customer sites is a diagnostic signal. It is less a business model than evidence that agents are still not reliable or accessible enough to self-deploy.
"Why are companies including OpenAI and Anthropic all adopting the so-called 'Palantir' model, recruiting so many forward-deployed engineers to be stationed at customer sites to help them build agents? This is actually a result of the problems I mentioned earlier." 01:46:34
6. Overlooked Insights
The AutoGPT-to-OpenClaw Parallel Suggests Current Excitement Will Also Fade, But Leave Infrastructure Behind
Su Yu briefly noted that AutoGPT in March 2023 had essentially the same GitHub star velocity and public excitement as OpenClaw today, yet "what it could actually accomplish was very, very little." This is a critically underexplored point: the infrastructure and companies born from each "hype wave" (AutoGPT → Lovable; OpenClaw → ?) may matter more than the viral projects themselves.
"AutoGPT: I don't know how many people still remember how viral it was at the time. It very quickly surged to 100,000, now possibly 180,000 GitHub stars, not far from OpenClaw's current GitHub stars. But at that time, this was unheard of. It was the fastest-growing repo in GitHub history by far. But what it could actually accomplish was very, very little." 00:34:00
The non-obvious implication: the current wave of OpenClaw excitement will likely produce a class of "Lovable-equivalent" companies that should be closely watched now while still small.
Elon Musk's "Macrohard" Internal Org is His Biggest Strategic Bet — and Most Analysts Are Missing It
Su Yu briefly mentioned that Musk specifically created an internal organization called "Macrohard" (a deliberate inversion of "Microsoft") to build computer use agents that replace all software and all knowledge work, and that Musk was pursuing a Tesla FSD-style approach (a small end-to-end model driven by vision) as a differentiated technical bet. This received almost no attention in the conversation but represents a multi-billion dollar directional bet.
"Musk was actually very passionate about the computer use agent thing. It's one of his biggest bets. He specifically formed an org called Macrohard, a play on the name Microsoft, specifically to do computer use agents and replace all software, do all knowledge work... As for his technical route, I think he leans toward something similar to Tesla's route, because Tesla FSD has a proven path: a relatively smaller model, vision/video-based, doing direct end-to-end modeling." 01:48:53