Token Budgets for Investment Firms: What's the ROI? And Where's the Limit?
1. Key Themes
Theme 1: AI Spend Has Become a Mandatory Third Budget Line for Knowledge Firms
Token costs are no longer optional overhead — they represent a structurally new category alongside payroll and software, with extreme dispersion across adoption tiers.
"For a decade, the cost of a knowledge worker came in two parts: payroll and software. A third line is now forming on the budget. Tokens."
"Ramp's AI Index…puts the median US firm at $11.38 per employee per month on AI. The top 10% spend about $611. The top 1%, the cohort Ramp calls 'AI-pilled,' spend roughly $7,500 per employee per month. That is a 680x gap between the frontier and the middle."
Theme 2: Falling Token Prices Don't Mean Falling Bills — Consumption Is the Driver
Inference costs have collapsed, yet total AI spend keeps rising because agentic workflows and reasoning models dramatically increase consumption volume.
"The cost to run a fixed level of AI capability has fallen by roughly 280x over two years on Stanford HAI's inference-cost index, yet total bills keep climbing."
"Reasoning models think several times longer per task, agentic tools chain dozens of calls, and Goldman Sachs projects total token usage growing about 24x by 2030. Cheaper per token has not meant cheaper overall."
Theme 3: For Investment Firms, AI Governance Is More Important Than AI Budget Size
The affordability math is easy — the hard problem is controlling which models touch which data with what audit trail, especially given regulatory exposure.
"For almost every firm the conclusion is the same: Tokens are a rounding error against payroll, and they stay that way even at aggressive adoption levels."
"The question that deserves partner-level attention is the governance one: who can use which model, on which data, with what cap and what audit trail."
"Tier your policies by data sensitivity…Run a strict default policy for anything touching deal data, LP information, or material non-public information…FINRA-regulated contexts make this non-negotiable."
Theme 4: Seat vs. Usage-Based Pricing Is the Central Strategic Tradeoff
Two real-world corporate case studies illustrate the full range of the decision: predictability vs. scale-with-value, each with distinct risk profiles.
"Uber rolled Claude Code to roughly 5,000 engineers and exhausted its entire 2026 AI coding-tools budget within four months…The response was a hard cap of $1,500 per employee per month, per tool."
"Microsoft went the other way…canceling Claude Code licenses…redirecting engineers to GitHub Copilot CLI, in a move reporting framed as cost-certainty driven."
"Keep individual assistant use on seat plans…Move agentic workflows and automations to metered API behind a gateway, where the value is highest and the controls need to be tightest."
Theme 5: Productivity ROI Clears the Bar Easily, but Self-Reported Gains Are Inflated
Multiple rigorous studies show real time savings; the risk is overclaiming, not underperforming.
"The St. Louis Fed found generative AI users saved 5.4% of work hours, about 2.2 hours per week, with daily users saving four or more. The LSE Inclusion Initiative and Protiviti put the average at 7.5 hours per week per user, worth roughly £14,000 per employee per year."
"A top-decile budget of $611 per person per month is under 4% of a loaded engineer's cost. To break even, it needs to return roughly 1.5 hours of that person's time per week."
"The risk on this side is over-claiming. Self-reported gains around 40% run far above measured gains around 5%."
2. Contrarian Perspectives
Perspective 1: Setting a per-employee token budget on day one is the wrong first move.
The conventional instinct — cap spend before you understand spend — is backwards. You need observation data first, or your caps will be either throttling or useless.
"The instinct to set a per-employee token budget on day one is the wrong first move. Baseline first, govern the plumbing, then cap."
"Phase one, months 1 to 3. Baseline before you cap. Run a controlled free-for-all, route everything through one gateway…A month of observation beats a guessed budget."
Perspective 2: Today's token economics are partly artificial — workflows priced at current rates may be fragile.
Most firms are building cost models on subsidized pricing from labs competing for market share, creating hidden fragility in their AI-dependent workflows.
"Today's token prices are partly subsidized by frontier labs competing for share. A workflow whose economics only clear at current prices is fragile if prices rise after the model providers go public."
Perspective 3: "Shadow AI" spending means official budgets dramatically undercount actual AI consumption.
Standard per-employee benchmarks are understated because a large share of AI spend flows through personal cards and never enters corporate reporting systems.
"The per-employee figures blend seats, API usage, and bundled SaaS, so they undercount 'shadow AI' bought on personal cards, which Menlo Ventures estimates at close to 40% of application AI spend."
3. Companies Identified
Ramp
Description: Corporate card and spend management platform
Why mentioned: Source of the most cited AI spend benchmarks in the article, tracking real observed corporate spend across cohorts
"Ramp's AI Index, which tracks observed corporate-card and bill-pay spend, puts the median US firm at $11.38 per employee per month on AI."
Uber
Description: Global ride-sharing and technology company
Why mentioned: Cautionary case study on runaway agentic spend; exhausted its entire 2026 AI coding budget in four months with Claude Code
"Uber rolled Claude Code to roughly 5,000 engineers and exhausted its entire 2026 AI coding-tools budget within four months…The response was a hard cap of $1,500 per employee per month, per tool."
Microsoft
Description: Global enterprise software and cloud company
Why mentioned: Contrasting case study; chose flat-rate seat pricing over variable token billing for cost predictability
"Microsoft went the other way…canceling Claude Code licenses in its Experiences and Devices division and redirecting engineers to GitHub Copilot CLI, in a move reporting framed as cost-certainty driven."
Mercor
Description: AI-powered hiring platform
Why mentioned: Extreme case of token-over-headcount spend, cited as evidence of frontier AI consumption patterns
"Mercor's CEO has said the company spends more on tokens for internal agents than on headcount."
Nvidia
Description: Semiconductor and AI compute company
Why mentioned: An executive stated that compute spend now exceeds the salaries of his team, cited as a signal that compute can become the dominant cost line
"An Nvidia executive has said compute now exceeds the salaries of his team."
FJ Labs
Description: Prolific early-stage venture fund
Why mentioned: Highlighted as a case study for agentic fund operations using Vessel to automate LP updates and reporting
"FJ Labs runs their entire LP operations on Vessel: automated updates, on-demand reporting, and agents that handle the work so their partners don't have to."
Vessel
Description: Agentic fund operations platform for VC and PE
Why mentioned: Sponsored the article and cited as infrastructure for automating investment firm operations
"Vessel – Agentic fund operations for VC and PE firms"
Goldman Sachs
Description: Global investment bank
Why mentioned: Source of a major forward projection on AI token consumption growth
"Goldman Sachs projects total token usage growing about 24x by 2030."
GitHub (Copilot)
Description: Microsoft-owned software development platform
Why mentioned: Cited as evidence of measurable productivity gains from AI coding tools; also the platform Microsoft redirected engineers to
"GitHub Copilot lifted weekly pull requests by about 26%"
LiteLLM / Helicone / Portkey
Description: AI gateway and proxy tools
Why mentioned: Recommended as governance infrastructure for investment firms managing multi-model API spend
"A proxy such as LiteLLM, Helicone, or Portkey gives one endpoint, a virtual key per user, per-key budgets, and call-level attribution across providers. This is the highest-leverage piece of plumbing for control."
RouteLLM
Description: Research project / model routing framework
Why mentioned: Cited for its finding that intelligent model routing can cut costs dramatically without significant quality loss
"The RouteLLM research showed an 85% cut while preserving about 95% of flagship quality."
4. People Identified
Andre Retterath
Description: Author of the Data Driven VC newsletter; investor
Why mentioned: Author of the article; runs the DDVC community focused on AI and data-driven investing
"Hi, I'm Andre and welcome to my newsletter Data Driven VC which is all about becoming a better investor with data and AI."
5. Operating Insights
Insight 1: Use a gateway as the single highest-leverage control mechanism.
Routing all AI calls through a proxy (LiteLLM, Helicone, or Portkey) provides per-user attribution, per-key budgets, model allow-lists, and anomaly detection in one layer — especially critical for investment firms managing sensitive deal data.
"A proxy such as LiteLLM, Helicone, or Portkey gives one endpoint, a virtual key per user, per-key budgets, and call-level attribution across providers. This is the highest-leverage piece of plumbing for control."
Insight 2: Anchor token budgets to loaded employee cost, not a flat dollar figure — and treat agents like new hires.
Role-based budgets (e.g., ~10% of loaded monthly cost as a hard cap) are both ROI-positive and appropriately scaled. Agents should be onboarded with scoped permissions, spend limits, and a recommend-then-approve posture.
"A cap under roughly 10% of loaded cost is almost always ROI-positive."
"Treat each agent like a new hire. Onboard an AI agent the way you would onboard a junior analyst: scoped permissions, a playbook, a spend limit, and recommend-then-approve rather than act-autonomously for anything expensive."
Insight 3: Intelligent model routing cuts costs 60–85% with minimal quality loss.
Defaulting to the cheapest model that clears the quality bar — and escalating only on evidence — is the most durable optimization strategy once baseline data is in hand.
"Intelligent routing cuts cost per request by 60% to 80%, and the RouteLLM research showed an 85% cut while preserving about 95% of flagship quality."
6. Overlooked Insights
Insight 1: FinOps practitioners now manage AI spend at nearly all surveyed organizations, but investment firms are behind.
According to the FinOps Foundation, the share of surveyed organizations whose FinOps practitioners manage AI spend rose from 63% to 98% in a single year, and the community has converged on a phased approach to governing it. Investment firms appear to be the laggards, since the article notes that "public benchmarks for AI spend inside investment firms is almost non-existing." The governance playbook exists; it is just not yet being applied in this sector.
"The FinOps Foundation, whose practitioners now manage AI spend at 98% of surveyed organizations (up from 63% a year earlier), has converged on a phased approach."
Insight 2: Microsoft 365 Copilot's email efficiency gains may be the most underrated productivity ROI for investment professionals.
Buried in a list of productivity statistics, the email-time reduction from M365 Copilot is striking for partners and investment professionals who live in their inboxes — and it comes from a flat-seat product, making it easy to deploy and budget.
"Microsoft 365 Copilot users spent 3.6 fewer hours per week on email."