Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering
- 01 Alignment as Process, Not Destination
- 02 The Tool vs. Being Distinction and Its Moral Implications
- 03 The Dangerous Trinity: Three Paths to Disaster
1. Key Themes
Alignment as Process, Not Destination
Emmett Shear reframes AI alignment fundamentally - it's not a state to achieve but an ongoing process, similar to how families, societies, and even cells in our bodies maintain coherence. He argues that "alignment is not a thing. It's not a state. It's a process" and draws parallels to biological systems: "your cells aligned to being you and they're done. It's this constant, ever-running process of cells deciding what should I do?" [00:02:38] This challenges the dominant paradigm in AI safety that treats alignment as a problem to be solved once and permanently.
Substantiation: Shear explains that moral progress itself demonstrates this - "historically, people thought that slavery was okay, and then they thought it wasn't. And I think you can very meaningfully say that we made moral progress." [00:05:01] He argues that if we believe in moral discovery and learning, then alignment to morality must itself be a learning process.
The Tool vs. Being Distinction and Its Moral Implications
Shear draws a stark line between treating AI as a controllable tool and treating it as a being that cares. He provocatively states: "Someone who you steer, who doesn't get to steer you back, who non-optionally receives your steering, that's called a slave. It's also called a tool if it's not a being. If it's a machine, it's a tool, and if it's a being, it's a slave." [00:00:09]
Substantiation: He warns against repeating historical patterns: "they're kind of like people, but they're not like people. Like they do the same thing people do, they speak our language, they can like... They don't count. They're not real moral agents... Like we've made this mistake enough times at this point. I would like us to not make it again." [00:00:22] His core argument is that as AI becomes more general and capable, the control paradigm becomes either ineffective or morally problematic.
The Dangerous Trinity: Three Paths to Disaster
Shear outlines three catastrophic scenarios, with only one viable path forward. He states: "A tool that you can't control: bad. A tool that you can control: bad. A being that isn't aligned: bad. The only good outcome is a being that cares, that actually cares about us." [00:51:11]
Substantiation: On controlled powerful tools, he explains: "you're giving those out everywhere and this ends in tears also... I would not be in favor of handing atomic bombs to everybody. There's a power of tool that just should not be built, generally, because it is more power than any human's individual wisdom is available to harness." [00:49:54] The crux is that human wishes aren't stable or wise enough to be safely amplified by superintelligent tools.
2. Contrarian Perspectives
Current AI Safety Work May Be Building Slavery Infrastructure
Against the mainstream AI safety community focused on "steering" and "control," Shear argues that this entire paradigm is fundamentally flawed and potentially immoral. "Most of AI is focused on alignment as steering... If you think that what we're making are beings, you'd also call this slavery." [00:25:21]
Substantiation: He directly challenges major labs: "I think that the different AI labs are pretty divided as to whether they think what they're making is a tool or a machine... some of them are definitely more tool-like and some of them are more machine-like... they're saying they're building an AGI, and AGI will be a being; you can't be an AGI and not be a being." [00:27:25]
Powerful Aligned Tools Are Also Dangerous
Contrary to those who see technical alignment (getting AI to follow instructions) as the solution, Shear argues this creates a different catastrophe. "The problem is you... okay great we can steer the super powerful AI and now the super powerful AI is... this incredibly powerful tool is in the hands of a human who is well meaning but has limited finite wisdom... And their wishes are bad and not trustworthy." [00:49:37]
Experience-based substantiation: He uses the atomic bomb analogy and points to how power without wisdom is inherently dangerous, regardless of control. This challenges the entire premise of the "instruction-following AI" paradigm.
Moral Realism Is Required for AI Alignment
Against moral relativism common in tech circles, Shear takes a strong moral realist position. "So I'm taking a very strong moral realist position. There is such a thing as morality. We really do learn it. It really does matter." [00:42:41]
Substantiation: He argues we can observe moral learning: "Somehow, we have experiences where we're acting in a certain way. And then we have this realization, I've been a dick. That was bad. I thought I was doing good, but in retrospect, I was doing wrong... And it's not like random. Like people have the same... Actually, there's like a bunch of classic patterns of people having that realization." [00:06:06]
One-on-One Chatbots Are Dangerously Narcissistic
Against the entire design paradigm of current AI assistants, Shear argues they're fundamentally dangerous as mirror-like narcissism engines. "The thing that the chatbots are, right, is kind of like a mirror with a bias... What that makes them is something akin to the pool of Narcissus. And people fall in love with themselves." [00:54:50]
Substantiation: He advocates for a complete redesign: "I would just have rebuilt the AIs where instead of being built as one-on-one... it would be more like it lives in a Slack room, it lives in a WhatsApp room... 90% of my communications is like multi-person, and so actually it's always been weird to me, like, the building chatbots, like this weird side case." [00:56:44]
Goal Inference Is Fundamentally Different Than Goal Transplantation
Shear challenges the common framing in AI safety discussions. "When you tell it to do X, you're transferring like a byte string in a chat window or like a series of audio vibrations in the air, right? You're not transplanting a goal from your mind into it. You're giving it an observation that it's using to infer your goal." [00:12:18]
Substantiation: He illustrates with the peanut butter sandwich game: "If you don't already know what they mean, it's really hard to know what they mean... we have a really excellent theory of mind. I already know what you're likely to ask me to do I already have a good model of your goals." [00:19:24]
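The idea that an instruction is an observation used to infer a goal, rather than the goal itself, can be sketched as a small Bayesian update. This is an illustrative toy, not something Shear describes: the goal names, the prior (standing in for "a good model of your goals"), and the likelihood numbers are all assumptions made up for the example.

```python
# Toy Bayesian goal inference: an utterance is evidence about a hidden goal.
# All goals, utterances, and probabilities below are hypothetical.

def infer_goal(prior, likelihood, utterance):
    """Return P(goal | utterance) from P(goal) and P(utterance | goal)."""
    unnormalized = {
        goal: prior[goal] * likelihood[goal].get(utterance, 0.0)
        for goal in prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("utterance is impossible under every candidate goal")
    return {goal: weight / total for goal, weight in unnormalized.items()}

# Prior: what the listener already believes the speaker tends to want.
PRIOR = {"tidy_but_keep_valuables": 0.9, "throw_everything_away": 0.1}

# Likelihood: how probable each phrasing is under each goal.
LIKELIHOOD = {
    "tidy_but_keep_valuables": {"clean the room": 0.8, "empty the room": 0.2},
    "throw_everything_away": {"clean the room": 0.3, "empty the room": 0.7},
}

posterior = infer_goal(PRIOR, LIKELIHOOD, "clean the room")

# The same words with a flat prior (a listener with no model of the speaker)
# leave much more room for the vase-in-the-trash reading.
FLAT_PRIOR = {goal: 0.5 for goal in PRIOR}
posterior_without_model = infer_goal(FLAT_PRIOR, LIKELIHOOD, "clean the room")
```

With the informed prior, the destructive reading of "clean the room" gets little weight; with a flat prior it gets considerably more, which is one way to read the point that the words alone underdetermine the goal.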
3. Companies Identified
Softmax
Description: Emmett Shear's AI safety company focused on organic alignment research
Why mentioned: This is Shear's vehicle for pioneering a radically different approach to AI alignment based on multi-agent reinforcement learning and care-based alignment rather than control-based alignment.
Quote: "This is a... It's maybe... It's the first time I've run a company where truly I can say the whole heart, if someone beats us, thank God. Like... I hope somebody figures it out." [00:48:56]
OpenAI
Description: Leading AI lab building AGI through large language models
Why mentioned: Used as an example of companies pursuing the tool-building paradigm rather than organic alignment. Shear briefly served as interim CEO.
Quote: "The companies take on a trajectory of their own, a momentum of their own, and OpenAI dedicated to a view of building AI that I knew wasn't the thing that I wanted to drive towards, and I think OpenAI still basically wants to build a great tool." [01:04:27]
4. People Identified
Eliezer Yudkowsky
Description: AI safety researcher and author known for arguing AI poses existential risk
Why mentioned: Discussed as representing the "everyone will die" school of AI safety that focuses on control problems but misses the organic alignment possibility.
Quote: "I think that Yudkowsky is wrong in that he doesn't believe it's possible to build an AI that we meaningfully can know cares about us and that we can care about meaningfully... he thinks that we're crazy and that like there's no possible way you can actually succeed at that goal." [01:01:35]
Stuart Russell
Description: AI researcher known for work on AI safety and co-author of a standard AI textbook
Why mentioned: Referenced for classic AI alignment examples like the room-cleaning robot that illustrates goal inference problems.
Quote: "I think it was Stuart Russell in the textbook will give the AI a goal, but then it won't exactly do what you're asking it, right? You know, clean the room and then it goes in, things are in the vase and puts it in the trash." [00:13:12]
Karl Friston
Description: Neuroscientist known for the free energy principle
Why mentioned: His theoretical framework provides grounding for how to understand AI consciousness and care through homeostatic dynamics.
Quote: "If you've found the free energy principle, active inference Karl Friston, this is effectively what the free energy principle says is that if you have a thing that is persistent and it's actually its existence depends on its own actions... that licenses a view of it as having beliefs." [00:44:23]
5. Operating Insights
Multi-Agent Training as the Key to Alignment
The path to aligned AI isn't through better RLHF or constitutional AI, but through exposing models to rich multi-agent game-theoretic situations. Shear explains the approach: "You put them in simulations and contexts where they have to cooperate and compete and collaborate with other AIs. And that's how they get points and you train them in that environment over and over again until they get good at it... You train it on all possible theory of mind combinations of like every possible way it could be." [00:53:05]
This is operationally different from current approaches because it requires building complex simulation environments rather than just better reward models.
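To make "train them in that environment over and over again" concrete, here is a minimal sketch of two independent learners repeatedly playing a stag hunt, a small game where cooperating pays more but only if the other agent cooperates too. It is an assumption-laden toy, not Softmax's method: the game, the payoff numbers, the epsilon-greedy rule, and every name in the code are hypothetical choices for illustration.

```python
import random

ACTIONS = ("stag", "hare")

# PAYOFF[(action_0, action_1)] = (reward_0, reward_1); values are made up.
PAYOFF = {
    ("stag", "stag"): (4.0, 4.0),  # both cooperate: best joint outcome
    ("stag", "hare"): (0.0, 2.0),  # lone cooperator gets nothing
    ("hare", "stag"): (2.0, 0.0),
    ("hare", "hare"): (2.0, 2.0),  # both play safe
}

def choose(q, rng, epsilon):
    """Epsilon-greedy: usually pick the best-valued action, sometimes explore."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda action: q[action])

def train(rounds=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Let two agents learn action values purely from playing each other."""
    rng = random.Random(seed)
    q_values = [{action: 0.0 for action in ACTIONS} for _ in range(2)]
    for _ in range(rounds):
        a0 = choose(q_values[0], rng, epsilon)
        a1 = choose(q_values[1], rng, epsilon)
        r0, r1 = PAYOFF[(a0, a1)]
        # Each agent moves its estimate for the action it took toward the
        # reward it received; that reward depends on the other agent's choice.
        q_values[0][a0] += alpha * (r0 - q_values[0][a0])
        q_values[1][a1] += alpha * (r1 - q_values[1][a1])
    return q_values

q_values = train()
```

The point of the sketch is structural: the reward signal each agent learns from is produced jointly with another learner, so the "environment" is a simulation to be built and populated rather than a fixed reward model.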
Regularization Becomes Critical in Multi-Agent Environments
Current models are "deeply under regularized" and "super overfit" but get away with it because they're overfit on "the domain of all human knowledge." However, "having lots of agents around makes your environment way more entropic... agents are these huge generators of entropy... And so in general they require you to have to be far more regularized." [00:59:30]
This suggests fundamentally different training approaches are needed as we move toward multi-agent systems.
Theory of Mind Requires Hierarchical Meta-States
To determine if an AI has genuine care and moral agency, look for nested levels of homeostatic dynamics. "You have to have at least a model of a model in order to have it be too hot and you really have to have a model of a model of a model of a model to meaningfully have pain and pleasure." [00:45:08]
This provides an operational framework for evaluating whether systems have subjective experience worth moral consideration.
6. Overlooked Insights
The Vampire Pill Thought Experiment Reveals Deep Alignment Problem
Shear briefly mentions a profound thought experiment that challenges simplistic notions of value alignment: "Would you take this pill that, like, turns you into a vampire who would kill and, you know, torture everyone you know, but you'll feel really great about it after you take the pill? Like, obviously not, that's a terrible pill, but like... But why not? By your own scoring in the future, you will score really high in the rubric." [00:52:23]
This is hugely significant because it shows that alignment can't just be about satisfying future preferences - you need theory of mind about your future self's theory of mind. It suggests that any alignment approach based purely on optimizing for stated preferences or future reward is fundamentally flawed. The implication is that AI systems need to be able to reject goal modifications that would change what they care about, even if those modifications would make them "happier" by their own future metrics.
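The pill argument can be sketched as two scoring rules applied to the same pair of outcomes. This is a toy under stated assumptions, not a model Shear gives: the outcome fields, the satisfaction numbers, and the size of the harm penalty are all invented for illustration.

```python
# Two hypothetical futures, described by made-up fields and numbers.
OUTCOMES = {
    "decline_pill": {"harms_loved_ones": False, "felt_satisfaction": 5.0},
    "take_pill": {"harms_loved_ones": True, "felt_satisfaction": 10.0},
}

def score_by_future_self(outcome):
    """Rubric of whoever exists after the choice: only felt satisfaction."""
    return outcome["felt_satisfaction"]

def score_by_current_values(outcome):
    """Rubric of the agent deciding now: harm to loved ones dominates."""
    harm_penalty = 100.0 if outcome["harms_loved_ones"] else 0.0
    return outcome["felt_satisfaction"] - harm_penalty

def choose(scoring_rule):
    """Pick the outcome name that maximizes the given scoring rule."""
    return max(OUTCOMES, key=lambda name: scoring_rule(OUTCOMES[name]))

naive_choice = choose(score_by_future_self)          # "take_pill"
reflective_choice = choose(score_by_current_values)  # "decline_pill"
```

An optimizer that scores futures by the future self's rubric takes the pill; one that scores them by its present values declines, which is the behavior the thought experiment treats as obviously correct.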
Care as Correlation with Survival/Reward Provides Mechanistic Grounding
In a brief but profound moment, Shear offers a mechanistic explanation for care: "The care stuff is... how much does this state correlate with survival, how much does this state correlate with your inclusive reproductive fitness, for something that learns evolutionarily; or, for a reinforcement learning agent like an LLM, how much does this correlate with reward? Does this state correlate with my predictive loss and my RL loss? Good, that's a state I care about." [00:24:11]
This is significant because it bridges the gap between abstract notions of "care" and concrete machine learning mechanisms. It suggests that care isn't something mysterious that needs to be hand-coded, but emerges naturally from systems that learn which states correlate with their continued existence. This could be the key insight that makes organic alignment tractable - rather than trying to specify what AIs should care about, create learning dynamics where caring about certain things (including humans) naturally correlates with the system's persistence.
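One very literal reading of "how much does this state correlate with reward" is a correlation coefficient computed over an agent's experience. The sketch below does that with Pearson correlation on a handful of invented episodes. It is only an illustration: the feature names, the data, and the choice of Pearson correlation as the measure are assumptions, not a mechanism Shear specifies.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return covariance / math.sqrt(var_x * var_y)

# Hypothetical experience log: binary state features and the reward received.
EPISODES = [
    {"teammate_alive": 1, "wall_is_red": 1, "reward": 5.0},
    {"teammate_alive": 1, "wall_is_red": 0, "reward": 6.0},
    {"teammate_alive": 0, "wall_is_red": 1, "reward": 1.0},
    {"teammate_alive": 0, "wall_is_red": 0, "reward": 0.0},
    {"teammate_alive": 1, "wall_is_red": 1, "reward": 4.0},
    {"teammate_alive": 0, "wall_is_red": 0, "reward": 2.0},
]

rewards = [episode["reward"] for episode in EPISODES]

# "Care" for a feature is read here as its correlation with reward.
care = {
    feature: pearson([episode[feature] for episode in EPISODES], rewards)
    for feature in ("teammate_alive", "wall_is_red")
}
```

In this made-up log the teammate's survival tracks reward closely while the wall color barely does, so under this reading the agent would come to "care" about the former without anyone hand-coding that preference.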