The token bill is due: Inside the industry’s fight to manage the runaway costs of AI

Across the industry, companies are beginning to balk at the cost of artificial intelligence. Uber blew through its entire 2026 AI coding budget by April. Microsoft revoked its developers’ Claude Code licenses months after activating them. A Priceline employee told TechCrunch that a routine Cursor contract renewal came back 4-5 times more expensive.

Although the price per token has fallen, the push for more AI adoption and increasingly autonomous agents have driven token consumption higher and higher. Companies that, in early 2025, gorged themselves on all-you-can-eat subscriptions are now scrambling to understand where their money is going, pull back spending, and figure out if they can salvage some ROI from the wreckage of their budgets.

Meanwhile, a market is forming to meet them there. Startups, established vendors and a new standards body are all racing to give businesses the tools and language to track what they use.

“Six months ago, I would have a conversation with a customer and it would be about, ‘What can it do? Is it good enough?’” Alexander Embiricos, OpenAI’s chief operating officer, told TechCrunch at an event in New York City this week. “Our conversations are never about that now. Now the conversations are about, ‘hey, we’re spending so much. What visibility do you have? What auditability do you have? What token controls do you have? What’s the effectiveness of your models?’”

It’s against this backdrop that the Linux Foundation this week unveiled plans for the Tokenomics Foundation, a new standards body that aims to bring the same cost discipline to AI tokens that FinOps brought to cloud spending.

“In April and May, I started hearing from companies: ‘Oh my god, we’re 3x over our entire token budget for 2026, and it’s only April,’” JR Storment, CEO of the FinOps Foundation, a project of the Linux Foundation, told TechCrunch. “We started hearing existential crises and the whole conversation shifted from tokenmaxxing and ‘go fast’ to ‘we need guardrails, how do we control this?’”

The cries heard around the tech world followed fervent demands from CEOs pushing their teams to use the best models and move fast. New models released in November such as Anthropic’s Claude Opus 4.5, OpenAI’s GPT-5.1 and Google’s Gemini 3 Pro brought significant improvements to agent tools that have multiplied consumption. That’s how one company reportedly found itself with a $500 million Claude bill after forgetting to set spending limits for employees.

“It’s like the crack cocaine epidemic,” said Chris Reed, senior director of IT finance at Priceline, noting that the company had begun to set token limits for certain groups. “They let you try it to get you hooked on it, and now you’re kind of addicted to it.”

Vitaly Gordon, CEO of engineering operations platform Faros AI, said he recently spoke with a CTO who told him, “One of my engineers spent $40,000 on tokens last month, and I really don’t know if I should stop him or if I should go tell everyone else to be like him.”

A survey by Faros in March found that among 20,000 developers, output was increasing, but so were errors and rewrites. Jellyfish, an engineering management platform, similarly found that engineers who used the most tokens were about twice as productive as those who used AI less, but they spent 10 times the number of tokens to get there.

Nicholas Arcolano, head of research at Jellyfish, told TechCrunch via email that spending on AI is exploding in large part because of agentic functions, where spending per developer rose about 18.6x in nine months. Taken together, these statistics make the productivity case look bleaker than the spending suggests.

“Whether extreme spending pays off comes down to the ultimate business value of shipped code (eg, revenue), which most companies still cannot measure,” Arcolano said.

At least part of this measurement problem stems from the sheer scale at which AI is being used today.

“Tracking cloud costs is a data problem of hundreds of millions of rows per month,” Storment said. “Tracking token costs is a trillion-rows-per-month data problem. You can’t just plug it into any spreadsheet or even basic tool. You have to fundamentally rethink your tooling, your specifications, and your accounting systems to do it.”

At Priceline, Reed is already seeing inconsistencies. He noted discrepancies between a vendor’s reported usage and Priceline’s internal data.

“I started my career in telecom expense management and I see all the same parallels, from telecom to cloud to AI,” he said. “Any time you introduce something new, it’s ripe for billing errors and audit and optimization opportunities.”

A market is starting to form around this issue. There are pure-play companies, like Pay-i, that track, measure and optimize the cost and performance of GenAI investments. Paid, meanwhile, allows developers to track costs, measure usage, and bill users based on actual value instead of subscription fees.

Then there are companies like Jellyfish, Waydev, and Faros AI, all of which provide AI agent monitoring to prove the ROI of developer tools. Storment says most of the 180 vendors in the FinOps Foundation lean toward this area.

Companies with existing distribution are also adding features to tap into this new market. Ramp has recently moved into AI expense management; Datadog and New Relic have added services such as cloud cost management, token-level observability, and GPU monitoring. At the FinOps X conference next week, AWS is expected to introduce new financial management features aimed at enterprise AI spending.

Tiffany Luck, a partner at NEA, believes that token efficiency and observability will likely be added in the “harness or app layer.” She pointed to Factory, a startup that makes AI agents for businesses, which this week launched a model router that automatically selects the right model for each task.

Gordon expects frontier labs and other model providers to use OpenRouter-style optimization to route queries to the cheapest models, a trend already showing up in companies’ Claude bills.

“The financial report for how much you spend on Anthropic, even if you call the Opus model, some of the spending will be on Sonnet or Haiku because they’re smart enough to do that,” Gordon said. “I think this is going to become more and more of a thing.”

But all of these tools are being built without a common language or common definitions for how much a token costs, what it produces, and how to compare spending across vendors. This is where the Tokenomics Foundation hopes to be helpful.

The foundation is building a canonical definition and framework for “tokenomics”; open standards, specifications and metrics for AI token usage and billing; and new metrics for AI economics, such as price per intelligence or tokens per watt. It also plans to define metrics across token factory efficiency and spend efficiency. The group plans a formal launch in July and is set to announce more members at the FinOps X conference next week.

“Token economy is fundamentally more abstract and opaque than anything we’ve managed at this scale before,” Nishant Gupta, chief availability officer at Salesforce, said in a statement. “It requires a different operational muscle than the one the industry built for the cloud.”

That said, Goldman Sachs expects global usage of tokens to multiply 24x by 2030. The already over-budget companies need solutions now, and the foundation’s first deliverables are still months away.

“We may have created a steam engine, but we still haven’t figured out the assembly line,” Gordon said.

According to Arcolano, the smart move is broad, moderate adoption.

“The best ROI comes from moving the broad middle from low to moderate use, not pushing heavy users higher,” he said.

Russell Brandom and Tim Fernholz contributed to this reporting.
