
The End of Token Maxing: What Uber's AI Budget Wake-Up Means for Your Business

  • Token maxing (pushing employees to burn as many AI tokens as possible) is being reined in at Uber and questioned at Cisco... because spend is hard to tie to shipped customer value.
  • Devs feel faster. Finance sees a bill. Product still looks the same. That gap is the whole story.
  • The fix isn't "use less AI." It's routing the right task to the right model and measuring cost per outcome, not tokens burned.
  • Small businesses: you don't need a frontier model for every email draft. You need a default, a cheap fallback, and one rule for when to upgrade.
  • We're entering the cloud-cost optimization phase of AI... and a whole category of "lower your token bill" tools is coming.

For about a year, some companies treated AI usage like a leaderboard sport. Use the biggest model. Run agents on everything. Rank who burned the most tokens. That era has a name now: token maxing. And the first big tech company to say out loud that it doesn't pencil out anymore is Uber.


What Uber actually admitted

Uber's COO Andrew Macdonald put it plainly on a recent podcast: it's very hard to draw a line between heavy token usage on coding tools like Claude Code and useful features actually shipping to riders and drivers (The Indian Express).

Devs inside the company felt the tools were helping. Leadership looked at the pipeline and couldn't see the payoff at the other end. Sound familiar?

The numbers made it worse. Uber reportedly blew through its entire annual AI budget in four months after encouraging staff to use AI "as much as possible," complete with internal usage leaderboards (TechCrunch). The response: monthly caps (around $1,500 per employee per agentic coding tool in some reports) and a cultural shift away from competitive token burn.

Cisco president Jeetu Patel echoed the same tension at the Semafor Tech Summit: at scale, token costs still exceed the value they deliver (India Today). Not "AI doesn't work." More like: the meter is running faster than the business case.
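A per-employee, per-tool monthly cap like the one reported for Uber is easy to enforce once you log spend by person, tool, and month. Here's a minimal Python sketch under that assumption. The $1,500 limit comes from the reports above; the `MonthlyCap` class and its methods are hypothetical names, not Uber's actual system.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MonthlyCap:
    """Hypothetical per-employee, per-tool monthly spend cap (illustrative only)."""
    limit_usd: float = 1500.0  # figure taken from the reports cited above
    # Spend keyed by (employee, tool, "YYYY-MM"); unseen keys start at 0.0.
    spent: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, employee: str, tool: str, month: str, cost_usd: float) -> bool:
        """Record a charge if it fits under the cap; return False and skip it otherwise."""
        key = (employee, tool, month)
        if self.spent[key] + cost_usd > self.limit_usd:
            return False
        self.spent[key] += cost_usd
        return True

    def remaining(self, employee: str, tool: str, month: str) -> float:
        """Dollars left under the cap for this employee, tool, and month."""
        return self.limit_usd - self.spent[(employee, tool, month)]


cap = MonthlyCap()
print(cap.record("ana", "coding-agent", "2026-01", 1_200.0))  # True
print(cap.record("ana", "coding-agent", "2026-01", 400.0))    # False: would exceed $1,500
print(cap.remaining("ana", "coding-agent", "2026-01"))        # 300.0
```

Each tool gets its own bucket, so hitting the cap on a coding agent doesn't block someone's chat assistant.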

Why bills spiked even as per-token prices fell

Here's the trap. Token prices have dropped. Usage exploded anyway.

SDxCentral's recap of Cisco's tokenomics framing calls it Jevons' Paradox: when each token gets cheaper, you don't necessarily spend less... you run harder tasks. Chain-of-thought reasoning. Agents that spawn sub-agents. Prompts that rewrite themselves before they answer you.

You used to type a question. Now you describe an outcome and the system architects five hidden prompts behind the scenes. Same button. More juice.

That's why "we lowered the price per token" doesn't automatically mean "your CFO sleeps better." Especially when employees default to the most expensive model because it's not their money.
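Here's the arithmetic behind that trap as a short Python sketch. The prices and token counts are made-up illustrative values, not any provider's real rates: the per-token price drops 75%, but the agentic version of the same task consumes 20x the tokens.

```python
def cost_per_task(price_per_million_usd: float, tokens_per_task: int) -> float:
    """Dollar cost of one task at a given price per million tokens."""
    return price_per_million_usd * tokens_per_task / 1_000_000


# Hypothetical numbers for illustration only.
before = cost_per_task(price_per_million_usd=10.00, tokens_per_task=2_000)   # one question, one answer
after = cost_per_task(price_per_million_usd=2.50, tokens_per_task=40_000)    # reasoning + sub-agents + rewrites

print(f"before: ${before:.2f}/task, after: ${after:.2f}/task, ratio: {after / before:.1f}x")
# before: $0.02/task, after: $0.10/task, ratio: 5.0x
```

Cheaper tokens, five times the bill per task. Multiply by every employee running agents all day and you get the Uber story.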

The provider response: cheaper tokens, cheaper models, seat wars

The market is reacting on three fronts:

1. Price cuts at the API layer. The Wall Street Journal reported that OpenAI is weighing significant cuts to token pricing as enterprise buyers push back. Sam Altman has called cost "a huge issue" for businesses. Translation: labs know that if you burn through budget once and get a bad taste, you might switch vendors for the next five years.

2. Cheaper model tiers on purpose. At Google I/O 2026, Sundar Pichai said companies are blowing through annual token budgets by May and pitched Gemini 3.5 Flash as frontier-ish capability at roughly one-third to one-half the cost of comparable frontier models. The pitch isn't "dumber AI." It's don't use a sledgehammer to hang a picture frame.

3. Seat pricing vs. usage pricing. Anthropic's Claude Code and Claude Desktop carry per-seat licenses on top of API burn... a model Dylan's team has felt more acutely than OpenAI's mix of flat ChatGPT tiers plus pay-as-you-go APIs. When your bill jumps because headcount grew, not because output grew, finance starts asking different questions.
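To see why seat pricing changes the finance conversation, here's a rough Python model of a bill that combines per-seat licenses with metered usage. The $100 seat price, usage figure, and feature count are hypothetical, not Anthropic's or anyone's actual pricing.

```python
def monthly_bill(seats: int, seat_price_usd: float, usage_usd: float) -> float:
    """Total bill = fixed per-seat licenses + metered API usage."""
    return seats * seat_price_usd + usage_usd


# Hypothetical: headcount doubles while usage and shipped output stay flat.
before_hiring = monthly_bill(seats=10, seat_price_usd=100.0, usage_usd=2_000.0)
after_hiring = monthly_bill(seats=20, seat_price_usd=100.0, usage_usd=2_000.0)
features_shipped = 4  # same in both months

print(f"cost per shipped feature: ${before_hiring / features_shipped:,.0f} -> ${after_hiring / features_shipped:,.0f}")
# cost per shipped feature: $750 -> $1,000
```

Same output, a 33% higher cost per feature, purely because the seat count moved.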

If you're still curating which stack fits how you actually work, the AI tools checklist is a sane place to sort signal from subscription clutter.

The counter-narrative: Mark Cuban says worry about disruption, not the meter

Mark Cuban pushed back on the "pinch pennies on tokens" frame. His argument: the bigger risk is AI-native startups displacing incumbents that move too slowly (Benzinga). Every founder who knows AI is hunting ways to rebuild categories from scratch. If you're Uber-scale, maybe you throttle spend today... but if a lean competitor ships faster because they're AI-native from day one, saving tokens won't save the business.

Both things can be true:

  • Incumbents with massive R&D lines need ROI discipline (Uber's moment).
  • Challengers with cash and nothing to protect might still spend aggressively to grab share... then optimize later once they're the incumbent.

Classic disruptor playbook: subsidize, win, raise prices. We've seen it in rideshare. We'll see shades of it in AI.

The cloud bill playbook, but for tokens

When software moved to the cloud, the first wave felt like infinite servers. Then the bill showed up. Now whole companies exist just to lower your cloud spend (and take a cut of savings).

AI is on the same curve. Palantir CEO Alex Karp has talked about enterprises frustrated with labs driving usage for usage's sake... sometimes called tokenmaxxing from the vendor side (TipRanks). His bet: IT will route specific models to specific jobs behind the scenes. Users won't pick GPT-5 vs. Flash vs. Haiku. They'll pick "draft this email," and the router will pick the model.

Deloitte's tokenomics guidance calls LLM routing a core enterprise control: match prompt complexity to the minimum viable model, reserve expensive reasoning for high-value work, and stop burning frontier tokens on chit-chat.

That's not theory for Fortune 500 only. Small businesses can do a lightweight version today:

  • Default cheap for summaries, rewrites, and first drafts.
  • Upgrade on purpose for code, contracts, or anything where a mistake costs real money.
  • One person owns the bill... even if that's you on a $20/month plan plus occasional API calls.
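Those three rules fit in a few lines of routing logic. Here's a minimal Python sketch, assuming you tag each request with a task type and a rough guess at what a mistake would cost. The task categories, the $100 threshold, and the placeholder model names are all hypothetical, not any real router's configuration.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"  # cheap default: summaries, rewrites, first drafts
    LARGE = "large"  # deliberate upgrade: code, contracts, costly mistakes


# Hypothetical placeholders: put the models you actually use here,
# so swapping a model is a one-line config change, not a workflow change.
MODEL_FOR_TIER = {Tier.SMALL: "your-cheap-model", Tier.LARGE: "your-frontier-model"}

# Hypothetical task types that always justify the bigger model.
UPGRADE_TASKS = {"code", "contract"}


def pick_tier(task_type: str, mistake_cost_usd: float = 0.0, threshold_usd: float = 100.0) -> Tier:
    """Default to the small tier; upgrade only for named task types or expensive mistakes."""
    if task_type in UPGRADE_TASKS or mistake_cost_usd >= threshold_usd:
        return Tier.LARGE
    return Tier.SMALL


print(MODEL_FOR_TIER[pick_tier("summary")])                        # your-cheap-model
print(MODEL_FOR_TIER[pick_tier("contract")])                       # your-frontier-model
print(MODEL_FOR_TIER[pick_tier("email", mistake_cost_usd=500.0)])  # your-frontier-model
```

The point of the design: cheap is the default, and every upgrade has a reason you can name.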

Tools like OpenRouter and routers built into dev stacks let you switch models without switching apps. Ask your AI which tasks belong on which tier. Seriously. Meta, but useful.

What "smart spend" looks like on a small team

Token maxing was always silly if the goal was "spend more." The goal was supposed to be learning where the line is. Fair. But most small businesses never needed that experiment. They're in a sweeter spot:

  • A few power users spending real money on the problems that pay.
  • Most of the team in the $50–$100/month band.
  • Several people still at zero because their job isn't text-in, text-out yet.

That's healthy. You don't need autonomous agents on every spreadsheet cell.

Practical rules:

  1. Measure outcomes, not vibes. Did this tool save an hour, prevent an error, or ship something faster? Log it loosely. One line in a notes doc beats a dashboard nobody opens.
  2. Name a fallback before you need it. If Anthropic is down, policy-blocked, or 3x over budget... what's plan B? Same for OpenAI, Google, whatever you standardize on.
  3. Don't build critical workflows on a four-day-old model release. Frontier access can change overnight (export rules, caps, pricing). Abstraction layer. Tested backup. Boring infrastructure.
  4. Teach the team one sentence: "Use the small model unless you know why you need the big one."
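Rules 2 and 3 come down to a thin abstraction layer with a tested backup. Here's a minimal Python sketch, assuming you wrap each provider in your own function that takes a prompt and returns text (the real SDK call would live inside that wrapper). `complete_with_fallback` and the stub providers are hypothetical names, not any vendor's API.

```python
from typing import Callable, Sequence

# A provider is your own wrapper: prompt in, text out, raises on any failure.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain fails."""


def complete_with_fallback(prompt: str, providers: Sequence[tuple[str, Provider]]) -> tuple[str, str]:
    """Try providers in order; return (provider_name, text) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # outage, policy block, budget guard, rate limit...
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers for illustration; real SDK calls would go inside functions like these.
def primary(prompt: str) -> str:
    raise ConnectionError("down")


def backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


print(complete_with_fallback("Draft a refund email", [("primary", primary), ("backup", backup)]))
# ('backup', 'backup answer to: Draft a refund email')
```

Returning the provider name alongside the text means your logs show how often plan B is actually carrying the load.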

Providers know burned budgets create churn. Expect more spend guardrails, routing, and cheaper tiers... not because AI failed, but because the first wave of "use it everywhere" did.

What to do this week

Run a ten-minute audit:

  1. List every AI subscription and API line item.
  2. Circle the top two that produced something you shipped or sold last month.
  3. Everything else: downgrade, share one seat, or kill.
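If your line items live in a spreadsheet, the audit can be scripted. Here's a minimal Python sketch that reads "top" as lowest cost per shipped outcome, which is one assumption among several reasonable ones. The `LineItem` fields, tool names, and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    name: str
    monthly_cost_usd: float
    shipped_outcomes: int  # things shipped or sold last month that relied on this tool


def audit(items: list[LineItem], keep: int = 2) -> tuple[list[LineItem], list[LineItem]]:
    """Keep the `keep` productive items with the lowest cost per outcome; flag the rest."""
    productive = [i for i in items if i.shipped_outcomes > 0]
    productive.sort(key=lambda i: i.monthly_cost_usd / i.shipped_outcomes)
    keepers = productive[:keep]
    review = [i for i in items if i not in keepers]  # downgrade, share one seat, or kill
    return keepers, review


items = [
    LineItem("chat-seat", 20.0, 10),     # $2 per outcome
    LineItem("coding-agent", 300.0, 6),  # $50 per outcome
    LineItem("image-tool", 30.0, 0),     # shipped nothing
    LineItem("api-credits", 50.0, 5),    # $10 per outcome
]
keepers, review = audit(items)
print([i.name for i in keepers])  # ['chat-seat', 'api-credits']
print([i.name for i in review])   # ['coding-agent', 'image-tool']
```

Anything with zero shipped outcomes lands in the review pile automatically, no matter how much the team likes it.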

If you're routing across models, Infacto's tools hub collects free utilities (including link builders and planners) so you're not stacking paid dashboards you won't open.

Conclusion

Token maxing is ending where budgets are real: Uber capped usage, Cisco warned on value vs. cost, OpenAI is weighing price cuts, and Google is pitching cheaper models. The next phase is smart routing... same story as cloud optimization, just faster.

You don't win by burning the most tokens. You win by shipping outcomes for less than they're worth. If your team can explain which model they used and why, you're not behind on AI. You're ahead of the bill.

Want prompt starters for everyday business tasks while you tune your stack? Browse the AI Prompt Library.

