Matt Tancik

1. Key Themes

Creativity Is Direction, Not Execution

The core thesis of the episode is that creativity has never resided in tool mastery — it lives in the act of directing. As AI tools commoditize execution, the human role shifts entirely to vision and direction.

"It's not about mastering those tools. It's about directing an agent who can use those tools to achieve your creativity." — Zach Xia [00:00:00]

"The creativity is building a story. The tools alone aren't a story. Someone has to direct them." — Matt Tancik [00:03:53]

AI Is Expanding the Creative Surface Area, Not Replacing It

Both guests argue AI doesn't kill existing creative practice — it adds entirely new mediums and unlocks latent demand from people who previously lacked access to professional tools or skills.

"When you start doing things that are purely in the AI world, you can start thinking about what would an image look like maybe halfway across the world or in specific studio lighting situations that you would never actually be able to do." — Matt Tancik [00:39:17]

"A lot of people, they want a good recording of their moments, their lives. They may not necessarily have the skills to take their photo... Those are the latent demand that we try to address." — Zach Xia [00:37:41]

Personalization Is the Central Unsolved Problem

Both speakers repeatedly return to personalization as the hardest and most important frontier — not just stylistic preference but dynamic, implicit, evolving identity that cannot be verbalized by users.

"It's almost like there's no good way to define it other than you can learn that from data... Even if you ask the user, what's their own style? I mean, they can put some keywords there, but it's not complete." — Zach Xia [00:45:36]

"The agent needs to have really good memory. So if I tell you once, shouldn't need to tell you again in the future." — Zach Xia [00:15:27]

The Research-to-Product Gap Is a Strategic Moat

Both guests moved from deep research (NeRF for Matt Tancik, Lightroom-related work at Adobe for Zach Xia) into product building, and both identify a real gap: researchers optimize for metrics and push technology forward, while users just want their specific problem solved today.

"As a researcher, your goal is always to push the technology... That doesn't necessarily align with what, say, a customer or end user would actually be interested in." — Matt Tancik [00:06:45]

"Technology needs to be ahead of users a little bit. So that maybe you can reimagine their workflow. But at the same time, you want to make it balanced." — Zach Xia [00:07:55]

Model + App Co-Design Creates Compounding Advantages

Building the model and the application together — rather than building apps on top of third-party models — creates a feedback loop that standalone app builders or pure model companies cannot replicate.

"Being able to have the users interact in your app and work through their workflow in your app, you learn which things a user actually cares about. You learn which steps do they have to redo multiple times." — Matt Tancik [00:28:31]

"Your product can also help educate the user what's the best way to sort of use their model... App and model co-design has that advantage too." — Zach Xia [00:28:51]

Identity Preservation Is a Distinct, Unsolved Technical Category

Foundation models that appear capable at image generation fundamentally break down when tasked with precise identity consistency — whether for human faces or branded products. This is a non-obvious failure mode that only surfaces at the individual level.

"Everybody say, oh, the general foundation model is already very good at identity until you try to use it to generate image of yourself. Same thing with product photography. Oh, it's pretty good at generating this specific product until you are the product owner... It just stops working." — Zach Xia [00:23:41]

Controllability Is the #1 Professional Complaint

The single biggest barrier to professional adoption of AI creative tools is the inability to control outputs precisely. Text prompts alone are insufficient, and the industry has not yet converged on the right interaction paradigm.

"You don't want to operate with text. There has to be something more than just text... We've been doing a lot of work in the video-to-video space because we think that's a very good way to start incorporating pretty precise controls, especially in the time dimension." — Matt Tancik [00:26:14]

The AI Quality Gap Will Widen Between Great and Average Artists

AI raises the floor for everyone, but it raises the ceiling even more for skilled artists, so the gap between top and average creators grows rather than shrinks.

"The gap between the very best artists and average artists is going to be even bigger... There's so much easier for people to express their creativity when they have a very good understanding of what they want to get, how they want to get there." — Zach Xia [00:44:15]

"I can guarantee the same tools that made those bad generations — they've also made generations that you would think are amazing." — Matt Tancik [00:44:37]

2. Contrarian Perspectives

AI Output Defaults to Slop — Humans Must Fight for Quality

Most people assume better AI models automatically produce better outputs. The contrarian view here is that default outputs are mediocre, and that quality depends largely on the effort a human puts into direction.

"They assume AI to be slop. Like anything AI models have would be slop. It's up to the humans to create something out of it. If you just say, I want a picture of a cup of coffee, the first generation is just mediocre. You can't have it in any stock image website. It's nothing unique or personal." — Yoko Li (quoting a third party) [00:44:50]

Don't Build What Users Ask For — Especially Power Users

The conventional product wisdom is to listen to your users. The speakers argue that implementing what users request can be a trap — especially when those users are anchored to old paradigms.

"A seasoned Photoshop user may ask you to reinvent a lasso tool. The question is, do you want to do that? Is that the mental model you want your model to have? Maybe the next abstraction is there's no lasso tool." — Yoko Li [00:08:59]

Photography Is Not Dying — Process Value Is Underrated

The conventional narrative is that AI will make traditional photography obsolete. The counter-argument is that many photographers value the process itself, not just the output, and that won't change.

"Even digital cameras have become so good nowadays, a lot of people are still taking films. Film is such an interesting process... You have to capture that moment, otherwise it's gone. And you don't see the result right away." — Zach Xia [00:35:37]

Models Should Ask Users Questions, Not Just Accept Prompts

The dominant UX paradigm is user-to-model one-way prompting. The contrarian view is that models should proactively ask clarifying questions before acting — more like a professional creative collaborator.

"If the model can tell you what it needs to get your inputs, to actually asking for your inputs before it does something — that's underexploited... If you go to a studio and you say, make me a 10-second video about a dog jumping in the grass, they're never going to take that deal. They're going to want more specific." — Zach Xia [00:27:41]

User Style Cannot Be Explicitly Defined — It Must Be Inferred From Data

Most personalization approaches ask users to define their preferences. The contrarian research view is that explicit self-description is inherently incomplete and the only reliable method is implicit inference from behavioral data.

"If the way to solve the problem is you really need to define their style and then model their style, that's probably not the approach I would take. You look at the data, you find the distribution that the data shows you... Even at that time, it's still implicit because it's part of the model." — Zach Xia [00:45:36]

3. Companies Identified

Luma

AI video and image generation company. Matt Tancik leads applied research there, working on agentic systems, fundamental research, and video-to-video controls.

"Most recently, we've been doing a lot of work in the video-to-video space because we think that's a very good way to start incorporating pretty precise controls, especially in the time dimension." — Matt Tancik [00:26:14]

Phota Labs

Personalized AI photography company, co-founded by Zach Xia. Specializes in identity preservation for humans and pets, with emerging work on product photography personalization.

"We do a lot of identity preservation, identity consistency... People are actually very willing to generate a new image, put them in different environments, generating AI headshot, generating AI video of themselves using Phota's technology." — Zach Xia [00:10:03]

Adobe / Lightroom

Zach Xia did research that fed into Lightroom. Mentioned as a foundational creative tool and example of how research transitions into creator workflows.

"I know you work at Adobe and then you did a lot of research that later went into Lightroom, which is also a creator tool that creators have been using." — Yoko Li [00:03:01]

Photoshop / Adobe (as legacy tool infrastructure)

Cited repeatedly as the archetype of a complex, deep creative tool — and as both a benchmark and a cautionary tale for the new generation of AI tools.

"You look at your Photoshop, your illustrators, your blenders — if you hop into those programs, they're extremely complex. You can literally have a whole career learning the ins and outs." — Matt Tancik [00:12:02]

Blender

Mentioned alongside Photoshop and Illustrator as a complex legacy creative tool that agents may eventually interact with on behalf of users.

"You look at your Photoshop, your illustrators, you look at your blenders, whatever modality you're interested in." — Matt Tancik [00:12:02]

ChatGPT / OpenAI

Referenced as an example of the trend toward thinking modes in language models, used as an analogy for where creative AI tools are heading.

"We've seen a trend in language models where they've transitioned to thinking modes. If you use ChatGPT or one of these models nowadays without thinking, it just doesn't feel smart." — Matt Tancik [00:24:52]

Claude Code

Referenced (briefly garbled in transcript as "call code") as an example of model-app co-design where the product educates users on best interaction patterns.

"People are used to do everything like VS Code or whatever. And now Claude Code just in a terminal — you can do that code. And it's the best way to sort of use that model because you cut through a lot of other things." — Zach Xia [00:28:51]

4. People Identified

Matt Tancik

Head of Applied Research at Luma. Co-creator of NeRF (Neural Radiance Fields). Also a practicing photographer and painter. Deep expertise spanning 3D representation, agentic systems, and creative tool design.

"Matt Tancik, who is the head of applied research at Luma. He works on agentic systems, fundamental research, and he was also the co-creator of NeRF." — Yoko Li [00:01:33]

Zach Xia

Co-founder and CTO of Phota Labs. Previously at Adobe, where his research fed into Lightroom. Deep expertise in personalized AI, identity preservation, and AI photography workflows.

"Zach Xia, who is the co-founder and CTO of Phota Labs. He works on personalized AI and AI photography." — Yoko Li [00:01:33]

Yoko Li

Partner at a16z. Host of this episode. Frames and steers the conversation toward product, research transitions, and the future of creative tooling.

"In this episode, Yoko Li speaks with Matt Tancik of Luma and Zach Xia of Phota Labs about AI-generated imagery, creative workflows, personalization." — Yoko Li [00:01:16]

5. Operating Insights

Use Unintended Product Usage as a Roadmap Signal

Both guests independently described moments where users hacked or repurposed the product in ways that revealed roadmap priorities. This is a signal worth building a systematic process around, not just a happy accident.

"There's some users who are trying to hack our system so they can use it for personalization for product photography. They're like, we want this. And so now that's on our roadmap." — Zach Xia [00:22:56]

"People find very interesting ways to use tools. Not just our tools, but our tools in addition to other tools that you never expect it to work... People find crazy ways to get these things to work." — Matt Tancik [00:11:07]

Show Users an Edit Before Asking What They Want

Users cannot answer open-ended "how should we improve this?" questions — but they can react to a concrete example. The operational implication: replace discovery surveys with reactive feedback on a proposed change.

"When you ask the user, how do you want to improve this photo? They don't know. When you make some edits to that photo, now they have opinions on whether they like your edits or not. They know what they don't like." — Zach Xia [00:33:59]

Brand Guidelines as a Stealth Personalization Input

Brands that bring their brand guidelines into a creative tool become dramatically better-served users. This is an underexploited enterprise wedge: structured brand context is a richer personalization signal than any user prompt.

"Seeing how brands would bring in brand guidelines — it's not something we really considered very much. But then once they brought them in, you realize this was an extremely useful source of information for really getting through what a user is interested in." — Matt Tancik [00:22:21]

Collect Satisfaction Feedback Ambiently and Non-Intrusively

For subjective outputs like identity and taste, post-ship evaluation matters more than pre-ship benchmarking. The operational challenge is capturing that signal without friction.

"Can you collect that kind of feedback in an ambient way that's just like non-intrusive way is just so important?" — Zach Xia [00:19:38]

Iteration Speed Matters More Than First-Prompt Quality

Building for professional creative adoption requires designing for iteration cycles, not single-shot generation quality. Tools that assume a final answer from one prompt will always underserve working creatives.

"Part of art is that iteration. Making sure the tools are able to handle that level of iteration is something extremely important. This also makes it very hard to benchmark these things because their output isn't that ground truth in their head." — Matt Tancik [00:09:38]

6. Overlooked Insights

User-Owned Personalization Models Are a New Asset Class

Zach briefly describes an architectural philosophy that is easy to miss but potentially enormous: the personalization layer should be owned by the user, not the platform. This is a fundamentally different model from how Adobe, Google Photos, or any current platform operates — and it implies a future where individuals carry portable creative identity models across foundation model providers.

"Personalization is something the user should own. Like that's the user's model. It'd be the user's model for them. But at the end of the day, they're using that model. They have full control of that... We want them to be able to combine that with any foundation model they like to use. To really disentangle personalization versus foundation is sort of the goal that we have." — Zach Xia [00:42:13]

The investment implication: whoever builds the infrastructure layer for portable, user-owned personalization models — interoperable across Luma, Stable Diffusion, and future foundation models — could become the identity layer of the entire AI creative stack.
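One way to picture the disentangling Zach describes is as two separate objects: a personalization profile that the user holds, and a foundation model that can be swapped out. The Python sketch below is only an illustration under stated assumptions, not Phota's design. `UserProfile`, `StubModel`, `personalized_generate`, and the `style_weights` field are all hypothetical names, and the "model" is a stub that returns a string instead of an image.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class FoundationModel(Protocol):
    """Hypothetical interface: any provider that accepts a prompt plus conditioning."""

    name: str

    def generate(self, prompt: str, conditioning: Dict[str, float]) -> str: ...


@dataclass
class UserProfile:
    """Hypothetical user-owned personalization artifact.

    The weights stand in for something learned implicitly from the user's
    data; the profile knows nothing about any specific provider.
    """

    owner: str
    style_weights: Dict[str, float]

    def conditioning(self) -> Dict[str, float]:
        # Hand out a copy so a provider cannot mutate the user's profile.
        return dict(self.style_weights)


@dataclass
class StubModel:
    """Stand-in for a real foundation model; it only formats a string."""

    name: str

    def generate(self, prompt: str, conditioning: Dict[str, float]) -> str:
        traits = ", ".join(sorted(conditioning))
        return f"[{self.name}] {prompt} | personalized with: {traits}"


def personalized_generate(profile: UserProfile, model: FoundationModel, prompt: str) -> str:
    """Combine one user-owned profile with whichever foundation model is passed in."""
    return model.generate(prompt, profile.conditioning())


# The same profile travels unchanged across two different (stub) providers.
profile = UserProfile(owner="alice", style_weights={"warm_tones": 0.8, "film_grain": 0.3})
for provider in (StubModel("provider_a"), StubModel("provider_b")):
    print(personalized_generate(profile, provider, "portrait at dusk"))
```

The point of the sketch is the dependency direction: the profile never imports or references a provider, so switching foundation models does not require rebuilding the user's personalization.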

Video Identity Preservation Is Being Solved Indirectly via Frame Injection

Zach mentions in passing that users have already found a workaround for video models' well-known weakness at identity preservation: generate identity-consistent frames with Phota's image technology, then feed those frames into a video model. Neither Phota nor the video model providers designed this pipeline; users discovered it. It suggests that, for these creators, the bottleneck in AI video is identity rather than motion quality, and that a company solving identity preservation natively in video would be well positioned in professional video creation.
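The two-stage workaround can be sketched as a small pipeline. This is a toy illustration in Python, assuming stand-in functions rather than any real Phota or video-model API: `Frame`, `generate_identity_frames`, and `frames_to_video` are hypothetical names, and the "video model" simply fills in placeholder frames between keyframes while carrying the identity forward.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    identity_id: str   # which person (or product) the frame depicts
    description: str   # placeholder for the actual pixels


def generate_identity_frames(identity_id: str, shots: List[str]) -> List[Frame]:
    """Stage 1 (stand-in for an identity-preserving image model): one keyframe per shot."""
    return [Frame(identity_id, shot) for shot in shots]


def frames_to_video(keyframes: List[Frame], frames_between: int = 2) -> List[Frame]:
    """Stage 2 (stand-in for a keyframe-conditioned video model).

    In-between frames copy the identity of the preceding keyframe, so the
    video stage never has to invent the identity itself.
    """
    if not keyframes:
        return []
    video: List[Frame] = []
    for start, end in zip(keyframes, keyframes[1:]):
        video.append(start)
        for i in range(1, frames_between + 1):
            label = f"{start.description} -> {end.description} ({i}/{frames_between})"
            video.append(Frame(start.identity_id, label))
    video.append(keyframes[-1])
    return video


keyframes = generate_identity_frames("user_123", ["standing", "waving", "smiling"])
video = frames_to_video(keyframes, frames_between=2)
print(len(video), "frames, identities:", {f.identity_id for f in video})
```

The design choice mirrored here is that identity is fixed in the image stage, where it is reportedly more reliable, and the video stage is only asked to supply motion between frames that already look right.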

"People are like, we want this. The video models are so good, so bad at identity preservation. What I can do is I'm going to use Phota to generate frames of those videos. And then I'm going to use a video model to take those frames into a video so that identity is good. Something that we didn't think of at the beginning, but they figured it out." — Zach Xia [00:10:39]