OpenAI Sora 2 Team: How Generative Video Will Unlock Creativity and World Models
- 01 Iterative Deployment and Society Co-Evolution
- 02 Optimizing for Creation Over Consumption
- 03 Video Models as World Simulators
1. Key Themes
Iterative Deployment and Society Co-Evolution
The Sora team emphasizes releasing powerful AI technology incrementally rather than dropping breakthrough capabilities all at once. Thomas Dimson explains: "For OpenAI across the board, it's really important that we kind of like iteratively deploy technology in a way where we're not just like dropping bombshells on the world when there's like some big research breakthrough. We want to co-evolve society with the technology." This philosophy drove their decision to release Sora 2 at the "GPT-3.5 moment for video" - capable enough to be transformative but early enough for society to adapt. The long-term vision includes "copies of yourself running around in Sora and the ether, like doing tasks and like reporting back in the physical world because that's where we're headed long term."
Optimizing for Creation Over Consumption
The product strategy deliberately prioritizes creative activity over passive scrolling. Rohan Sahai, drawing from Instagram experience, notes: "The magic of this technology is that everybody is a creator. And so we want this feed to be optimized for you to create, to inspire you to create." The metrics validate this approach: "almost 100% of people who get past the invite code, all that, on the app end up creating on day one. When they come back, it's like 70% of the time they come back, they're creating." The team has implemented features to break consumption patterns, including "in-feed units that's like, okay, you just kind of viewed a couple of videos in this domain, why don't you try creating something."
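The in-feed unit described above can be sketched in a few lines. This is only an illustration, not the team's implementation: the function name `insert_create_prompts`, the `domain` field, and the streak threshold of two videos are all assumptions chosen to mirror the "couple of videos in this domain" wording.

```python
def insert_create_prompts(feed, streak=2):
    """Return a copy of the feed with a creation prompt inserted after
    `streak` consecutive videos from the same domain (hypothetical rule)."""
    out = []
    run_domain, run_len = None, 0
    for video in feed:
        out.append(video)
        # Track how many videos in a row share a domain.
        if video["domain"] == run_domain:
            run_len += 1
        else:
            run_domain, run_len = video["domain"], 1
        if run_len == streak:
            # Nudge the viewer toward creating, then start counting again.
            out.append({"type": "create_prompt", "domain": run_domain})
            run_domain, run_len = None, 0
    return out


feed = [
    {"type": "video", "domain": "pets"},
    {"type": "video", "domain": "pets"},
    {"type": "video", "domain": "sports"},
]
result = insert_create_prompts(feed)
```

Here the prompt lands right after the second "pets" video, before the feed moves on to "sports".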
Video Models as World Simulators
The team describes Sora as building internal world models through scale, similar to how language models developed understanding. Bill Peebles explains: "What's really remarkable, right, is because these space time patches are just this like very simple and like highly reusable representation...You're just able to build like one neural network that can operate on this vast extremely diverse set of data and really build these like incredibly powerful representations that model like very generalizable properties of the world." When Sora 2 fails, it fails differently - respecting physics over user intent: "if let's say like the text input to Sora is a basketball star wants to like shoot a hoop...If he misses in the model, Sora will not just like magically guide the basketball to go into the hoop...it will actually defer to the laws of physics most of the time."
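To make the idea of a space-time patch concrete, here is a toy sketch that cuts a tiny video into blocks spanning a few frames and a few pixels, then flattens each block into one token-like list. It is not Sora's code: the function name, the nested-list video layout, and the 2x2x2 patch size are assumptions for illustration, and the source does not say what data the real model patches or at what size.

```python
def spacetime_patches(video, pt, ph, pw):
    """Split a video stored as nested lists indexed [t][y][x] into
    flattened patches covering pt frames, ph rows and pw columns each."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, T, pt):
        for y0 in range(0, H, ph):
            for x0 in range(0, W, pw):
                # One patch = a small block extending across time and space.
                patch = [
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ]
                patches.append(patch)
    return patches


# Toy video: 4 frames of 4x4 pixels; each value encodes its own (t, y, x).
video = [[[t * 100 + y * 10 + x for x in range(4)] for y in range(4)] for t in range(4)]
tokens = spacetime_patches(video, pt=2, ph=2, pw=2)
```

The same function works for any clip whose length and resolution divide evenly by the patch size, which hints at why the representation is described as simple and reusable across very diverse data.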
2. Contrarian Perspectives
Social AI is More Human Than Traditional Social Media
Rohan describes the unexpected finding: "It's actually strangely more social than a lot of social networks, even though it's all AI generated content." The cameo feature wasn't obvious initially - "I didn't think it would work at all" - but created intense engagement: "once we had that feature product market fit on team all everything we were generating was all of each other." This contradicts assumptions that AI-generated content would be dehumanizing; instead, it enabled unprecedented human connection through accessibility of creation.
The Instagram Algorithm Defense
Rohan provides a nuanced defense of algorithmic feeds that most people dismiss: "We did it for a reason...what was happening on Instagram over time was because it was chronologically ordered...if they're posting 20 times a day, your friend's not...you'd have 20 [National Geographic] posts, and then one picture that you actually [...] cared about, that you never really scrolled to." He acknowledges the tradeoff: "When we started to open up to more unconnected content and ad pressure was very strong, there's also a natural company incentive to optimize for just blind consumption," but argues the original motivation was sound, supported by "pretty unambiguous" early tests.
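The burying effect in that quote is easy to reproduce with a toy feed. The sketch below is an assumption-laden illustration, not Instagram's ranking system: the `affinity` score, the account names, and the sort keys are all hypothetical stand-ins for whatever signals a real ranker uses.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    minutes_ago: int
    affinity: float  # hypothetical "how much the viewer cares" score


def chronological(posts):
    # Newest first, regardless of who posted.
    return sorted(posts, key=lambda p: p.minutes_ago)


def ranked(posts):
    # Highest affinity first; recency only breaks ties.
    return sorted(posts, key=lambda p: (-p.affinity, p.minutes_ago))


# One account posts 20 times; a close friend posted once, slightly earlier.
posts = [Post("prolific_account", m, 0.1) for m in range(1, 21)]
posts.append(Post("close_friend", 30, 0.9))

chrono_pos = [p.author for p in chronological(posts)].index("close_friend")
ranked_pos = [p.author for p in ranked(posts)].index("close_friend")
```

In the chronological feed the friend's post sits behind all 20 posts from the prolific account, while the ranked feed puts it first. The same lever, pointed at a different objective, is what produces the "blind consumption" incentive acknowledged in the tradeoff.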
Simulation May Already Be Good Enough for Some Scientific Discovery
Rather than waiting for perfect physics simulation, Thomas suggests that understanding emerges from scale: "When you put enough compute and data into these systems, like in order to actually solve this task of predicting the next token, you need to develop an internal representation of how the world functions, right? You need to like simulate things." He speculates the first discovery will be "something related to classical physics. A better theory of turbulence or something" - not quantum mechanics but phenomena where video provides rich observational data.
3. Companies Identified
OpenAI / Sora
Advanced AI video generation company building "world simulators." The team describes reaching 7 million generations per day, with 30% of users posting to the feed. The underlying architecture is the diffusion transformer (DiT); in the words of Bill Peebles, who invented the approach, "most of them are based on DiTs, diffusion transformers." On engagement, the team reports that "almost 100% of people who get past the invite code...end up creating on day one."
Instagram (referenced)
Thomas Dimson worked there for seven years on early machine learning and recommender systems when "it was about 40 people." Rohan shares insights about the controversial move to algorithmic feeds, explaining it solved the problem where power users posting frequently buried content from casual creators. The learning: "oftentimes...the tools getting more people more creative is gonna be a huge unlock...but we do consistently see that things content is like also a social phenomenon."
Infinite Craft (game)
Rohan highlights this as exemplifying AI-enabled gaming: "there's a game called Infinite Craft, which is the world's simplest game...you just take elements, it's like fire or water, earth, you have like four elements to start and you just drag them...combines into something new. And the thing it combines with is like a, it's LLM based." He sees potential for "untapped stuff in that space where...I like the idea of a process to discovery...These are all in the weights. You're just unlocking it with like a secret code, which is your prompt."
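The combine-and-discover loop can be sketched as follows. This is a guess at the general shape, not the game's actual implementation, which the source does not describe: `stub_model` stands in for a real LLM call, the recipe cache is an assumption, and the starting element list is illustrative.

```python
def stub_model(a, b):
    """Stand-in for an LLM call; a real game would prompt a model here."""
    known = {("fire", "water"): "steam", ("earth", "water"): "mud"}
    return known.get((a, b), f"{a}-{b}")


class CraftingTable:
    def __init__(self, model, starting_elements):
        self.model = model
        self.discovered = set(starting_elements)
        self.recipes = {}  # assumed cache so a pair always gives the same result

    def combine(self, a, b):
        if a not in self.discovered or b not in self.discovered:
            raise KeyError("both elements must already be discovered")
        key = tuple(sorted((a, b)))  # order of dragging should not matter
        if key not in self.recipes:
            self.recipes[key] = self.model(*key)
        result = self.recipes[key]
        self.discovered.add(result)  # new elements become usable ingredients
        return result


table = CraftingTable(stub_model, ["fire", "water", "earth", "wind"])
steam = table.combine("water", "fire")
```

Each result feeds back into the pool of ingredients, so the space of reachable elements grows with play; with a real model behind `combine`, the player is in effect exploring what is "in the weights."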
4. Operating Insights
Non-Obvious Product Decisions Create Defensibility
Rohan notes: "I was never actually that afraid of competitive pressure in that crazy product phase because I was like, we sort of had all these non-trivial decisions that are obvious in retrospect, but were not obvious at the time, that were sort of building on top of each other." The cameo feature wasn't an obvious win initially, and choosing to make it social rather than single-player wasn't predetermined. This suggests building defensibility through accumulated non-obvious product choices rather than through technology moats alone.
Use Internal Adoption as Product Signal
The team knew they had something when internal behavior shifted dramatically: "feed is entirely cameo yes entirely just went from you know we didn't have that feature once we had that feature product market fit on team." When "a week later we were like this is still all we do," they recognized genuine product-market fit rather than novelty. This internal obsession preceded external validation.
Latest Feed as Product Feedback Mechanism
Thomas shares: "One of my favorite ways to get product feedback, it is so diverse the type of stuff people are doing...the latest feed which is just like the firehose of everything." Rather than relying solely on metrics or surveys, the team watches the unfiltered stream of user creations for real-time insight into emerging use cases and user needs, which enables rapid iteration.