The Real Reason AI Costs Keep Rising

1. Key Themes

Theme 1: The Jevons Paradox is Driving AI Spend Through the Roof

Token prices have collapsed roughly 600-fold in under six years, yet total AI spend keeps rising, a textbook replay of the 19th-century Jevons Paradox (cheaper coal led to more coal being consumed). Cheaper inference didn't reduce bills; it made expensive agentic architectures economically rational.
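The arithmetic behind this theme can be sketched in a few lines. The two prices are the article's figures (about sixty dollars per million tokens in 2020, about a dime now); the function names and the 1,000-fold volume growth are illustrative assumptions, not numbers from the article.

```python
def fold_drop(old_price: float, new_price: float) -> float:
    """How many times cheaper the new price is than the old one."""
    return old_price / new_price


def spend_ratio(price_fold_drop: float, volume_growth: float) -> float:
    """New total spend divided by old total spend.

    Spend = price x tokens, so the ratio is volume growth over the price drop.
    """
    return volume_growth / price_fold_drop


# Article's prices, in dollars per million tokens.
drop = fold_drop(60.0, 0.10)

# Hypothetical: total token volume grows 1,000-fold as agents loop and retry.
ratio = spend_ratio(drop, 1000.0)

print(f"price drop: {drop:.0f}x, spend ratio: {ratio:.2f}x")
```

Total spend is price times volume, so bills rise whenever token volume grows by more than the price fell. Under the assumed 1,000-fold growth, spend still climbs despite the 600-fold price drop.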

"A unit of inference that cost around sixty dollars per million tokens when the GPT-3 API launched in 2020 now goes for about a dime on the economy tier. That is a 600-fold drop in under six years, faster than Moore's Law ever moved silicon."

"When inference is costly, you write a prompt and take an answer. When it is cheap, you build an agent that loops... Every one of those steps burns tokens, and the loop can run dozens of times for a single request."

Theme 2: The "Free-Money Era" of AI Is Over — ROI Accountability Has Arrived

After two years of unchecked AI adoption mandates, CFOs and operating chiefs are now pulling back, installing caps, and demanding justifiable returns. The substitution thesis (AI replaces labor at lower cost) is breaking down as model bills in some organizations have surpassed the salaries they were meant to replace.

"The same executives who pushed adoption are now installing tiered access, mandatory efficiency reviews, and hard caps on who is allowed to spend."

"A Nvidia vice president flagged the problem with that math out loud. In some organizations, compute costs have already passed human labor costs."

"A senior technology executive at a major financial firm put the mood in a single line to the Wall Street Journal, speaking anonymously. The free-money period for AI is over."

Theme 3: Token Observability Is a Real but Limited Investment Category

A new FinOps subcategory is rapidly forming around token attribution and cost tracing. Capital is pouring into tools that trace every model call to the user, agent, and workflow that triggered it. But the article argues this is scaffolding, not the endgame.
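A minimal sketch of what such attribution might look like, assuming each model call is logged with the user, agent, and workflow that triggered it. The `ModelCall` record, the field names, the prices, and the cap are all hypothetical and do not describe any vendor's schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCall:
    """One logged model call (hypothetical schema)."""
    user: str
    agent: str
    workflow: str
    tokens: int
    price_per_million: float  # dollars per million tokens

    @property
    def cost(self) -> float:
        return self.tokens / 1_000_000 * self.price_per_million


def cost_by(calls, key):
    """Sum call cost grouped by one attribution field, e.g. 'user'."""
    totals = defaultdict(float)
    for call in calls:
        totals[getattr(call, key)] += call.cost
    return dict(totals)


def over_cap(totals, cap):
    """Return the keys whose accumulated cost exceeds a hard cap."""
    return sorted(k for k, v in totals.items() if v > cap)


# Invented sample data.
calls = [
    ModelCall("alice", "code-review", "pull-requests", 2_000_000, 10.0),
    ModelCall("bob", "triage", "support", 500_000, 10.0),
    ModelCall("alice", "code-review", "pull-requests", 4_000_000, 10.0),
]

per_user = cost_by(calls, "user")
flagged = over_cap(per_user, cap=50.0)
print(per_user, flagged)
```

Grouping by a different field (`agent` or `workflow`) reuses the same function. A cap check like this flags that spend is high; it says nothing about whether the spend was worthwhile.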

"At the FinOps industry's flagship 2026 event, the dominant theme got branded the 'Great Token Panic.'"

"One platform demonstrated catching a single employee's surprise 76,000 dollar token spend within minutes of it happening."

"The measurement layer is scaffolding for whatever replaces the token entirely, not the building."

Theme 4: Outcome-Based Pricing Is the Inevitable Endgame

The article argues the token is a transitional pricing unit, and the real prize goes to whoever abstracts it away entirely — selling a resolved ticket, a reviewed contract, or a processed claim at a fixed price, absorbing token variance internally.
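A rough sketch of what absorbing token variance means for the seller, assuming a fixed price per resolved ticket. The token counts, the per-token price, and the ticket price are invented for illustration.

```python
from statistics import mean


def ticket_cost(tokens: int, price_per_million: float) -> float:
    """Seller's token cost for one resolved ticket, in dollars."""
    return tokens / 1_000_000 * price_per_million


def margin_report(token_counts, price_per_million, fixed_price):
    """Summarize seller economics when every ticket sells at one fixed price."""
    costs = [ticket_cost(t, price_per_million) for t in token_counts]
    return {
        "avg_cost": mean(costs),
        "worst_cost": max(costs),
        "avg_margin": fixed_price - mean(costs),
        "loss_making_tickets": sum(c > fixed_price for c in costs),
    }


# Hypothetical: three ordinary tickets and one where the agent loops heavily.
tokens_per_ticket = [200_000, 300_000, 250_000, 4_000_000]
report = margin_report(tokens_per_ticket, price_per_million=1.0, fixed_price=2.0)
print(report)
```

The buyer pays the same price for every ticket; the seller carries the occasional runaway ticket and has to make it up on the average.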

"The winner will not be whoever measures token-to-outcome best. It will be whoever stops selling tokens and starts selling the outcome itself."

"The token was never going to survive as the thing you buy. It was the meter on the way to a market that sells finished work, and the companies printing the tokens will be the first to stop counting them."

Theme 5: Token Budget Control Is the New Corporate Power Struggle

Control over the token budget has become a proxy for organizational power, replacing headcount as the status marker of senior executives. This reframes what looks like a finance dispute as a structural power realignment inside enterprises.

"Whoever controls the token budget controls the AI equivalent of org scope, and a thirty-year instinct does not surrender quietly."

"The managers who treat the token budget as someone else's spreadsheet will wake up reporting to the ones who learned to allocate it."

2. Contrarian Perspectives

Perspective 1: Cheaper AI Costs More — Not Less

The consensus assumption is that falling token prices reduce AI spend. The article inverts this entirely: price drops stimulate demand for architecturally more expensive systems (agents, loops, retries), causing total bills to rise even as per-unit costs fall.

"Cheaper tokens did not just mean more of the same usage. They made an entirely more expensive way of computing viable... the token multiplier from that complexity swamps the per-unit savings."

"The budgets broke for a subtle reason. Cheapness made expensive behavior rational, and everyone adopted it in the same quarter."

Perspective 2: Token Observability Tools Won't Solve the Core Problem

The market is betting heavily that better instrumentation fixes the AI cost crisis. The article argues this is structurally wrong: token value is inherently context-dependent and cannot be fully resolved by attribution, no matter how granular.

"The value of a token is irreducibly contextual. The same trace is signal in one workflow and waste in another, and no amount of attribution fully resolves that."

"Better instrumentation catches anomalies and trims obvious waste... But anomaly detection is a smoke alarm. It tells you something is burning. It cannot tell you whether the fire was worth lighting."

Perspective 3: OpenAI and Anthropic's Price Cuts Are a Land Grab, Not Customer Generosity

The narrative around forthcoming token price cuts frames them as favorable to buyers. The article reframes them as a deliberate pre-IPO demand stimulus designed to lock in consumption before public listings.

"OpenAI is weighing deep cuts to what it charges per token, expecting Anthropic to follow. Both are filing for the largest tech listings in years."

"Cut the price, grow the volume, and reset every customer's blown budget right before handing the SEC a filing that needs to show durable demand. The favor to the buyer is the land grab for the seller."

Supporting data: usage responds to price, falling roughly 0.29% for every 1% price increase (a price elasticity of about -0.29), so price cuts should pull in additional consumption.

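The quoted sensitivity (roughly a 0.29% usage drop per 1% price increase) can be turned into a rough estimate of how usage responds to a price cut. This sketch assumes a constant-elasticity demand curve with elasticity -0.29, which is a modeling assumption layered on the article's single data point; the function name is illustrative.

```python
def usage_multiplier(price_ratio: float, elasticity: float = -0.29) -> float:
    """Usage after / usage before, assuming usage is proportional to
    price ** elasticity (constant-elasticity demand).

    price_ratio is new price / old price, e.g. 0.5 for a 50% cut.
    """
    return price_ratio ** elasticity


# Sanity check against the quoted figure: a 1% price increase.
one_percent_up = usage_multiplier(1.01)
drop_pct = (1.0 - one_percent_up) * 100

# Hypothetical scenario: the per-token price is halved.
halved = usage_multiplier(0.5)

print(f"1% price rise -> {drop_pct:.2f}% usage drop")
print(f"50% price cut -> usage x{halved:.2f}")
```

This only captures movement along an existing demand curve. It does not model the shift the article describes, where cheaper tokens make new, more token-hungry architectures viable.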
3. Companies Identified

Company	Description	Why Mentioned	Key Quote
Uber	Ride-sharing platform	Blew through its annual AI coding budget in ~4 months; also an example of AI delivering real value (11% of live backend code from agents)	"Uber drained its annual AI coding budget in roughly four months, and its operating chief said the spending is getting hard to justify against the returns he can actually see." / "About 11 percent of the live backend code Uber ships now comes from AI agents handling ride matching, pricing, and bug fixes."
Amazon	Tech/e-commerce giant	Pulled an internal AI usage leaderboard that had gamified token consumption among engineers	"Amazon pulled an internal leaderboard that had turned AI usage into a game among engineers."
Microsoft	Enterprise software	Canceled a swath of internal AI coding subscriptions, signaling ROI skepticism even internally	"Microsoft canceled a swath of internal coding subscriptions."
Blackstone	Private equity/investment firm	Cited as evidence that real AI spend is surging across portfolio companies	"Blackstone said model spending across its portfolio companies rose fifteenfold in a single quarter year over year."
OpenAI	AI lab	Weighing major token price cuts ahead of IPO; positioned as a future outcome-based pricing player	"OpenAI is weighing deep cuts to what it charges per token, expecting Anthropic to follow."
Anthropic	AI lab	Expected to follow OpenAI's price cuts; also filing for a large tech listing	"Both are filing for the largest tech listings in years against revenue run rates that have gone near vertical."
Lovable	AI app-building platform (sponsor)	Presented as a model for outcome/abstraction-based AI — hides token machinery, sells results	"50M projects already run on it, 4 in 5 built by non-technical founders."

4. People Identified

Person	Description	Why Mentioned	Key Quote
Jensen Huang	CEO, Nvidia	Named the Jevons-style dynamic publicly — cheaper tokens drive more spend, not less; classified tokens as COGS alongside energy and payroll	"Cost per token keeps falling while total AI spend keeps climbing, because cheaper inference pulls in demand that did not exist before. He started filing tokens under cost of goods sold, the same budget conversation as energy and payroll."
Ruben Dominguez	Author, The AI Corner	Newsletter author and analyst synthesizing AI cost dynamics	Article author

5. Operating Insights

Insight 1: Define the Business Outcome Before Turning on the Meter

The MIT NANDA study across hundreds of deployments found 95% produced no measurable profit impact. The 5% that worked shared one discipline: they defined outcomes first, targeted back-office operations over demos, and refused to scale anything that hadn't already paid for itself.

"Value showed up only where someone defined the outcome before turning on the meter."

Insight 2: Coding Tool ROI Is Overstated — Recalibrate Your Business Case

The celebrated productivity gains from AI coding tools shrink significantly once full token costs (not just seat licenses) are counted. The real multiplier is closer to 1.6x, not the 10x sold to budget committees.
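The gap between the two multipliers implies how large the token bill must be relative to the seat license. This sketch assumes the tenfold figure was computed against the seat license alone and the 1.6x figure against seat license plus tokens; the 100-dollar seat cost is a placeholder.

```python
def roi(value: float, seat_cost: float, token_cost: float = 0.0) -> float:
    """Return multiple: value delivered divided by total cost."""
    return value / (seat_cost + token_cost)


def implied_token_cost(seat_cost: float, seat_only_roi: float, full_roi: float) -> float:
    """Token cost that reconciles a seat-only multiple with a full-cost multiple."""
    value = seat_only_roi * seat_cost   # value implied by the seat-only story
    total_cost = value / full_roi       # total cost implied by the full-cost multiple
    return total_cost - seat_cost


seat = 100.0  # hypothetical seat license cost, any currency unit
tokens = implied_token_cost(seat, seat_only_roi=10.0, full_roi=1.6)

print(f"implied token cost: {tokens:.0f} ({tokens / seat:.2f}x the seat license)")
print(f"check: full-cost multiple = {roi(10.0 * seat, seat, tokens):.1f}x")
```

Under these assumptions, token spend is a bit more than five times the seat license.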

"On coding tools, the celebrated returns shrink to roughly 1.6 times once you count the full token cost instead of the seat license alone. Still positive. Nowhere near the tenfold story that sold the budget upstairs."

Insight 3: Reframe Your Career Strategy Around Token Allocation, Not Team Size

For operators and managers, the article issues a direct warning: the executives who will accumulate power in the next cycle are those who learn to direct AI resources at hard problems — not those who protect headcount.

"The career hedge stopped being the size of the team you protect. It became the ability to point a swarm of agents at a hard problem and come back with a result the business can actually price."

6. Overlooked Insights

Insight 1: Outsourced Work Is the First Line of Fire — and the Cleanest Benchmark

The article briefly notes that AI spend comparisons land on outsourcing contracts first because they're already priced in finished units (cost per ticket, per claim, per invoice). This makes BPO and outsourced operations the immediate and most legible substitution target — a market shift that has major implications for the BPO industry and enterprise procurement strategies.

"A business process contract is already priced in finished units. Cost per ticket, per claim, per invoice, per reviewed contract. That makes it the easiest place to stage the comparison between a human and an agent."

Insight 2: Only the Labs Can Execute Outcome-Based Pricing at Scale

The article's conclusion that outcome-based pricing is the endgame carries a buried structural implication: only model providers (OpenAI, Anthropic) have sufficient visibility into cost-to-serve to sell outcomes at a fixed price profitably. Application-layer companies lack the margin control to absorb token variance, which creates a potential competitive moat for labs that move in this direction and a structural vulnerability for the middleware layer.

"That move solves the customer's measurement problem by swallowing it whole, and only the labs control cost-to-serve well enough to price it."