AI, Design, and the Power of Open Models
- 01 Open-Weight Strategy as a Business Development Tool, Not Just a Technical Decision
- 02 Graphic Design as the Underserved, High-Value Frontier of Image Generation
- 03 JSON as the Intermediate Representation Between Language Models and Image Models
- 04 Taste as a Moat: Deliberately Avoiding Leaderboard Optimization
- 05 Editable, Layered Design
- 06 Small Model Size as a Strategic Choice, Not a Limitation
1. Key Themes
Open-Weight Strategy as a Business Development Tool, Not Just a Technical Decision
Ideogram's decision to release open weights was explicitly framed as a partnership and distribution strategy rather than purely a philosophical commitment to open source. By releasing weights, Ideogram gains access to the inference provider ecosystem, chip makers, and enterprise on-prem deployments that a closed API alone cannot reach.
"By releasing the weights, we're actually extending ourselves and working with inference providers, working more directly with large enterprise. They have every ability to customize the models or host it on-prem or optimize it for device. And we would love to work with the best chip makers to really optimize the model, the best inference providers. So this is basically us saying, hey, we are very serious about building the foundation model." [00:02:33] — Mohammad Norouzi
Graphic Design as the Underserved, High-Value Frontier of Image Generation
Norouzi repeatedly frames graphic design — not photorealism or artistic illustration — as the real commercial opportunity. Text rendering, layout control, typography, and brand adherence are where enterprise dollars are concentrated, yet most frontier labs are not focused there.
"Graphic design is everywhere. If you go in a city, you open your eyes, you see billboards, you see storefronts, they all have text. And actually, it's much more important than, I guess, photography is part of graphic design. But graphic design is actually the frontier for a lot of business use cases for storytelling." [00:15:55] — Mohammad Norouzi
JSON as the Intermediate Representation Between Language Models and Image Models
A core architectural insight: language models think in text; diffusion models generate pixels. JSON (and potentially HTML) serves as the structured bridge between the two, enabling precise, reproducible, and editable outputs. This is not just a prompting trick — it reflects a foundational belief about how multi-model pipelines should be composed.
"It's the intermediate representation that we think language models can describe images in that format. And then image generation can happen... The representation needs to be easy for the language model with the particular design that we have right now, which is a language model does some expansion of the ideas, and then the image model takes those expanded descriptions and turns them into images." [00:11:45] — Mohammad Norouzi
Taste as a Moat: Deliberately Avoiding Leaderboard Optimization
Ideogram explicitly avoids optimizing for public benchmarks and leaderboards, relying instead on internal taste evaluations run by human designers. This is a strategic choice: Norouzi argues that taste runs somewhat against topping a leaderboard, because leaderboards reward conformity to average preferences.
"One element of taste is kind of being, going outside of the norm a little bit and not conforming to the average opinion, which is a little against being on top of the leaderboard... We care about our own internal evaluation. And unfortunately, we see that AI is not very good at doing the actual taste evaluation yet. So we work with designers and we have side-by-side comparisons between different versions of the model as well as other models to really push on the taste." [00:16:42] — Mohammad Norouzi
Editable, Layered Design — Not Flat Images — Is the Next Unlock
The most commercially significant capability Ideogram is working on has not yet been released: editable text and layout control that produces multi-layer, composable design output rather than a single flat image. This is what transforms an image model into a design tool that fits professional workflows.
"What I'm personally most excited about is something we haven't released yet, which is editable text and layout control. I really believe for a lot of design and marketing use cases, we need editable design, not a single flat image." [00:03:34] — Mohammad Norouzi
Small Model Size as a Strategic Choice, Not a Limitation
At 9.3 billion parameters — roughly 9x smaller than state-of-the-art alternatives — Ideogram's open-weight model was intentionally kept small to enable on-device and consumer GPU deployment, with explicit plans to scale 10x–100x once the quality-per-parameter architecture is proven.
"We think now is actually a good time for us to scale. Given the quality of the model at 9.3 billion parameters, you should imagine what if this model is 100x bigger and there are mixture of experts architectures that don't make the model necessarily slower, but they make the model a lot more powerful." [00:19:17] — Mohammad Norouzi
Enterprise Brand DNA Is the Real Customization Opportunity
Enterprises are not just asking for better general-purpose images — they are asking for models that internalize brand guidelines, mascots, style vocabularies, and design rules. The ROI unlocks when the model "understands my brand DNA," which requires annotation-heavy, human-in-the-loop fine-tuning, not just uploading a few images.
"What we've seen over and over again is companies come to us and say, we tried these generic models and they don't meet our design bar. They don't follow our style. They don't follow our brand guideline. And once we train custom models for them, they are like, wow, this understands my brand DNA now. We can use this for design ideation or we can use this for marketing." [00:22:58] — Mohammad Norouzi
Agentic Workflows for Creative Production Are Nascent but Real
Ideogram has already deployed agentic workflows internally, built on the Model Context Protocol (MCP) and its own API, and uses them to generate landing page assets within a couple of hours. The key unsolved problem is automated evaluation within the loop: agents can produce at scale, but humans still need to curate.
"What's really exciting is when you want to release a new feature, you can go into your agent and then ask it to connect to the API and generate a bunch of images. And then you can go and find the best ones. And in a couple hours, you have your landing page up and running... We need evaluation as part of the loop. We don't want to have to look at every image." [00:30:13] — Mohammad Norouzi
Visual Identity Requires Customization Far More Than Written Communication Does
A sharp and underappreciated observation: in language, brand voice is hard to distinguish at a glance. In visual media, brand identity is instantly recognizable. This structural difference means image model customization will be far more universally adopted across enterprises than LLM fine-tuning has been.
"When you look at visual representation of a brand, you immediately recognize the differences between brands. But if you look at the written communication, can you say, oh, is this Andreessen Horowitz or this is Sequoia? Most people will not be able to immediately look at the text and say. So there is a lot more diversity in the visual world. And that's very exciting for customization." [00:28:42] — Mohammad Norouzi
2. Contrarian Perspectives
Reinforcement Learning Hurts Creative Diversity in Image Models
While RLHF is widely celebrated as the alignment technique that makes models better, Norouzi argues it has a hidden cost for image generation: it homogenizes outputs toward average human preferences, eliminating stylistic diversity. The models that score highest on leaderboards are often the least useful for creative professionals precisely because of this.
"Many different styles are embedded into the model. And if you've seen some of the frontier models that score very highly in the leaderboards, they don't have a lot of kind of design variation. They always produce the same exact look. And I believe that's because they did a lot of reinforcement learning training. We actually have done very little reinforcement learning. So this is a very raw model." [00:33:43] — Mohammad Norouzi
Leaderboards Are a Poor — and Even Counterproductive — Signal for Image Quality
The conventional wisdom is to optimize for benchmarks as a proxy for model quality. Norouzi flips this: external evaluators (including AI judges) are systematically bad at measuring what matters — pixel fidelity, photorealism, and taste — and optimizing for them actively degrades the model's distinctiveness.
"We always care so much about quality, photorealism, and again, text accuracy... We see that AI is not very good at doing the actual taste evaluation yet. So we work with designers and we have side-by-side comparisons between different versions of the model as well as other models to really push on the taste." [00:07:38] — Mohammad Norouzi
Enterprise Will Customize Image Models Far More Broadly Than LLMs
The prevailing narrative is that LLM fine-tuning is the high-value enterprise customization layer. Norouzi argues the opposite for the image domain: because visual brand identity is immediately differentiating in a way written voice is not, image model customization has far broader enterprise adoption potential than LLM customization does.
"I think that actually misses the point. When you look at visual representation of a brand, you immediately recognize the differences between brands. But if you look at the written communication... most people will not be able to immediately look at the text and say. So there is a lot more diversity in the visual world." [00:28:42] — Mohammad Norouzi
The Path to Better Image Generation Runs Through Language, Not Pixels
Counterintuitively, the way to improve image quality is not to make the image model smarter in pixel space, but to make the language model's description of the image more precise before any pixel is generated. The ideal end state is a language model that fully specifies every detail, leaving the diffusion model with a nearly trivial task.
"The recipe for building more powerful models, in my opinion, is making the task as straightforward as possible for the diffusion model. That is, specify the exact details of the image. And so now if you kind of make that extreme, then it becomes the pixels themselves. So the diffusion model doesn't have to do anything." [00:36:52] — Mohammad Norouzi
3. Companies Identified
Ideogram
AI image generation company founded by Mohammad Norouzi and based in Toronto. It has released its first open-weight image generation model (9.3 billion parameters) and is known for strong typography, text rendering, graphic design control, and JSON-structured prompting. It offers custom model training for enterprises and artists, with a consumer tier at $60/month.
"We are very serious about building the foundation model. And we would like to work with you, wherever you are, whether you're an app developer or a chip maker or an inference provider." [00:02:52] — Mohammad Norouzi
Hugging Face
AI model hosting and open-source community platform. Mentioned as the primary distribution channel for Ideogram's open-weight model release.
"This is just the first release. You're just testing the waters, figuring out how to work with Hugging Face and the open source community, Comfy UI, etc." [00:03:34] — Mohammad Norouzi
Comfy UI
Open-source node-based UI for image generation workflows. Mentioned as a key community integration target for the open-weight model.
"Figuring out how to work with Hugging Face and the open source community, Comfy UI, etc." [00:03:34] — Mohammad Norouzi
OpenAI
Mentioned as a competitor whose image models (including GPT Image) set the benchmark Ideogram is measuring against, and whose internal prompt expansion practices mirror Ideogram's — but without transparency.
"Everybody else does it too. OpenAI does it. Google does it. But then they don't give you the actual input to the model." [00:12:41] — Mohammad Norouzi
Google
Mentioned alongside OpenAI as a closed-model competitor that uses prompt expansion but does not expose intermediate representations to users.
"OpenAI does it. Google does it. But then they don't give you the actual input to the model." [00:12:41] — Mohammad Norouzi
4. People Identified
Mohammad Norouzi
Founder and CEO of Ideogram. Previously a research scientist, now building one of the most technically sophisticated open-weight image generation companies. Known for deep focus on typography, taste, and graphic design as differentiators. Running a very small team producing outsized technical output.
"We have a very tiny team. You see what we were able to produce. It's such a tiny team. And if you want high agency, you know, if you want your work to matter and you want to be part of the academic and open source ecosystem, then this is the perfect time to join us." [00:39:15] — Mohammad Norouzi
Justine Moore
Partner at a16z, co-host of this episode. Focused on consumer and creative AI. Demonstrated sharp product intuition in identifying that image editing and fine-tuning are complementary rather than competing workflows.
"I actually think they don't necessarily have to be competitive. Like some people use image editing as a way of fine-tuning... Others think it's much more efficient and consistent to just fine-tune a model to generate in that style." [00:25:55] — Justine Moore
Yoko Li
Partner at a16z, co-host. Focused on AI and infrastructure investments. Asked the most technically probing questions of the episode, including on JSON representations, model size tradeoffs, and agentic API composition.
"One thing we were always wondering is that this release open source model is so small. It's 9.3 billion parameters. Like previously, a SOTA is probably like 80 billion parameters. It's like 9x difference. How did you do it?" [00:43] — Yoko Li
5. Operating Insights
Use AI Evaluation for Speed, Human Experts for Taste — and Know Which Is Which
Ideogram's internal workflow separates automated benchmarking (fast, cheap, useful for catching regressions) from human designer evaluation (slow, expensive, essential for taste). Conflating the two is a mistake — AI judges systematically fail at aesthetic quality. Any company building creative AI products should build a parallel human evaluation track, not rely solely on automated metrics.
"We work with designers and we have side-by-side comparisons between different versions of the model as well as other models to really push on the taste. So we really care about taste." [00:17:25] — Mohammad Norouzi
The "Brand DNA" Customer Conversation Is a Sales Unlock
Enterprises that arrive thinking they need a generic image model are converted into high-value customization customers the moment they see a model trained on their own assets. The sales motion should lead with a demonstration of brand-specific fine-tuning, not general capability benchmarks.
"Companies come to us and say, we tried these generic models and they don't meet our design bar... And once we train custom models for them, they are like, wow, this understands my brand DNA now. We can use this for design ideation or we can use this for marketing." [00:22:58] — Mohammad Norouzi
Structure Your Annotation Process Around the Client's Vocabulary, Not the Model's
When onboarding enterprise customers for custom model training, Ideogram's annotation team works with the client's design team to map their existing vocabulary — mascot names, style keywords, brand-specific terminology — into the model's prompt structure. This human-in-the-loop annotation step is what separates superficial fine-tuning from genuine brand internalization.
"Our annotation team gets involved and spends a lot of time curating and cleaning data... Each company may have certain mascots who have certain names. And we work with their design team to understand what words they want to use because each team has different set of keywords." [00:25:15] — Mohammad Norouzi
6. Overlooked Insights
HTML May Become the Universal Prompt Language for Image Generation
This was mentioned briefly and almost in passing, but it is architecturally significant. Norouzi said the intermediate representation for editable design output may move closer to HTML rather than a newly introduced JSON structure. His reasoning: large language models have already been trained on HTML and know its tokens, so they would not have to learn a new format. If HTML becomes the prompt lingua franca for image models, it collapses the barrier between web design tools, LLM code generation, and image generation into a single pipeline. That would have large implications for design tooling startups and anyone building at the intersection of no-code, AI, and creative production.
"It may become more close to HTML, for example. That's okay because, again, large language models are trained with HTML and they know the tokens... It seems like HTML makes more sense just because these large language models have already been trained on HTML as opposed to us introducing a new JSON structure." [00:37:45] — Mohammad Norouzi
Agentic Image Generation Still Lacks the Evaluation Layer — and That Is the Real Product Gap
Norouzi mentioned in one sentence that the agentic loop for image generation is blocked by the absence of automated evaluation — humans still have to look at every image. This is not a minor inconvenience; it is the core bottleneck preventing agentic creative workflows from scaling. The company (or product) that solves automated aesthetic and brand-adherence evaluation for images unlocks the entire agentic creative production market. No one in the conversation paused to identify this as a standalone opportunity, but it is the missing infrastructure layer.
"We need evaluation as part of the loop. We don't want to have to look at every image. And then editing will be part of the agenting interaction too." [00:30:41] — Mohammad Norouzi