The Return of Marginal Cost

Everyone is arguing about whether AI kills software development. The bigger question is whether it has killed the way we pay for it.

Jun 12, 2026

I have agents running at home, on a mini PC under my desk, because I really wanted to understand the token usage and what the cost would be on different models. OpenClaw runs open models on hardware I own, so there is no invoice, only a token meter. I have spent a lot of time watching that token meter, and what it taught me is this: an agent does not work like software. It works like an employee who is only paid overtime, where every minute costs more than the minute before, and the clock does not stop until the job is done.

That is not a metaphor, it is the maths. At every step it takes, an agent re-reads everything that came before, the whole accumulating conversation along with its internal thoughts, before its next move. As the context grows, the cost of each step grows with it. Five steps in, the fifth step costs more than the first did. Ten steps in and it costs more again. The bill does not add up in a straight line, it compounds: if every step re-reads everything before it, the total grows roughly with the square of the number of steps.
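To put rough numbers on that, here is a minimal Python sketch. It assumes a hypothetical agent whose context starts at 2,000 tokens and grows by 1,000 tokens per step, with no prompt caching, and it counts only the input tokens re-read at each step. The figures are illustrative assumptions, not measurements from my meter or from any particular model.

```python
def input_tokens_per_step(steps, base_context=2_000, growth_per_step=1_000):
    """Input tokens read at each step when the whole context is re-read.

    base_context and growth_per_step are hypothetical, illustrative values.
    """
    return [base_context + growth_per_step * i for i in range(steps)]


def total_input_tokens(steps, base_context=2_000, growth_per_step=1_000):
    """Total input tokens across the whole run (no caching assumed)."""
    return sum(input_tokens_per_step(steps, base_context, growth_per_step))


for n in (1, 5, 10, 20):
    per_step = input_tokens_per_step(n)
    print(f"{n:>2} steps: last step reads {per_step[-1]:>6,} tokens, "
          f"run total {total_input_tokens(n):>7,}")
```

Under these assumptions, doubling the run from 10 steps to 20 takes the total from 65,000 input tokens to 230,000, about three and a half times as much rather than twice. Caching and context trimming can soften the curve, but they do not flatten it.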

You cannot see any of that from inside a subscription. Indeed, not seeing it is the whole point of a subscription. And for the last two years, two things have trained us not to look. The first is twenty years of SaaS, which taught every budget holder that software is a fixed monthly cost, per user, per month, predictable and flat. The second is the early hype of AI, which taught everyone that intelligence is cheap, practically free, bolted on to the tools you already have.

Neither is true for agents. Last week I wrote about the frontier labs withdrawing the subsidies that made AI look affordable. This week I want to go further, because the subsidy was hiding something worse than a future price rise. It was hiding a cost model that makes flat pricing structurally impossible.

Look at what Anthropic did with Fable 5. It launched its most capable public model on the ninth of June, included it in the Pro, Max and Team plans, and announced from day one that it would move to usage credits after two weeks. Anthropic knew before the public ever saw the model that it could not survive inside a flat subscription. This week I tested Opus 4.8 and Fable 5 on the Pro plan, and can confirm they went through my allowance faster than the first round of drinks at the pub. After that you pay API rates. The best model the public can buy was never going to stay inside a flat monthly price. Anthropic did not discover this, it designed around it.

Fable 5 is not an outlier, it is the pattern arriving. Per-user, per-month pricing worked for twenty years because software had almost no marginal cost. Once you had built it, one user or a thousand, once a day or fifty times, it cost you nearly the same to serve. The cloud kept the compute cheap and, crucially, predictable. So you priced for access and everyone got used to the flat fee.

Agents change that. Every action burns tokens that cost real, variable money. The harder the task, the longer the context, the more the agent has to think, the more tools it runs, the higher the bill. Inference has put marginal cost back into software, and not the gentle, linear kind. The overtime kind, where the late minutes cost more than the early ones and nobody told the customer. Industry estimates suggest that inference costs dropped 280-fold in two years while total AI spending rose 320% over the same period. Gartner put it plainly in March: do not confuse the deflation of commodity tokens with the democratisation of frontier reasoning.

Now look at what the vendors are actually selling. Most of the AI-powered products that arrived in the last two years are fundamentally orchestration layers sitting on top of the same handful of frontier models, Claude, GPT, Gemini, wrapped in a branded interface and sold at a flat monthly rate. Legal tools, sales tools, recruitment platforms, customer service bots. Different industries, different logos, the same models underneath. Klarna built its customer service assistant on OpenAI and had it handling two-thirds of all customer service chats within a month. Different logo, same model.

Every one of them is now shipping agents. Salesforce calls Agentforce a digital labour platform and pitches its agents as digital employees you hire by the click. Others are quieter about it but doing the same thing, bolting agentic workflows into the seat you already pay for and calling it a feature upgrade. The promise is a tireless new colleague for the price you are already paying.

The maths does not support it. Underneath every flat price is an overtime bill the vendor cannot predict, because on the builder platforms they now sell, you decide what the agent does. You set the task, you choose the complexity, you feed it a hundred-page contract or a ten-thousand-row spreadsheet. You author the workload. The vendor funds it. Neither of you knows the bill until the job is done, and the vendor learned that before you did.

None of this depends on which model the vendor chose. The token bill compounds with every step the agent takes, on any model, at any price point. The model choice changes the rate on the meter. It does not switch the meter off. So the market is splitting, and you can already see it happening. Cursor, the AI code editor, includes generous agent usage on its $20 Pro plan, but only when Cursor picks the model. It can afford to be generous because it routes to cheaper ones. Choose the frontier model yourself and it burns through a $20 credit pool, after which you pay API rates. Same tasks, same agent. The only variable is which model runs it.
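A small sketch of the rate-versus-meter point, in Python. The workload size and both sets of per-million-token prices are hypothetical numbers I have picked for illustration; they are not Cursor's, Anthropic's or anyone else's published rates.

```python
def run_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one agent run at per-million-token prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000


# Hypothetical workload: the same task, the same token count, on either model.
TASK = {"input_tokens": 230_000, "output_tokens": 20_000}

# Hypothetical (input, output) prices in dollars per million tokens.
MODELS = {
    "cheaper routed model": (0.50, 2.00),
    "frontier model": (5.00, 25.00),
}

CREDIT_POOL = 20.0  # dollars, matching the $20 pool described above

for name, (in_price, out_price) in MODELS.items():
    cost = run_cost(TASK["input_tokens"], TASK["output_tokens"],
                    in_price, out_price)
    runs = int(CREDIT_POOL // cost)
    print(f"{name}: ${cost:.2f} per run, {runs} runs from the pool")
```

The token count is identical in both rows. Only the price per token changes, and with these made-up rates that alone is the difference between well over a hundred runs and about a dozen from the same $20.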

And behind all of it, open-weight models are growing fast. The open-source LLM market hit $21 billion in 2025 and is expanding at 34% a year, with on-premises deployment growing at 29% as organisations chase data control and predictable costs. This is not a hobbyist movement, it is a real option for vendors managing their own inference bill and for buyers who want models on infrastructure they control.

The honest picture has three positions, and all of them are legitimate. Frontier capability on a meter, where you get the best model and pay for what you use. A hybrid, with a predictable base and usage charges beyond it. Or a flat fee on a capped tier, where the model may not be the newest but the cost is bounded. The question is not which position is right. It is whether you chose yours, or at least know and are comfortable with the one your vendor chose for you.
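The three positions can be written down as three billing functions. This is a sketch under my own assumptions: the $20 fee, the $20 of included usage and the sample usage figures are hypothetical, and real plans have more moving parts.

```python
def metered(usage_cost):
    """Pure meter: you pay exactly what the inference cost."""
    return usage_cost


def hybrid(usage_cost, base_fee=20.0, included_usage=20.0):
    """Predictable base fee, plus usage charges beyond the included amount."""
    return base_fee + max(0.0, usage_cost - included_usage)


def flat_capped(usage_cost, fee=20.0):
    """Flat fee on a capped tier: the bill never moves.

    Work beyond the cap is throttled or refused rather than billed,
    so the bound is on cost, not on how much you would like to run.
    """
    return fee


# Hypothetical months: light, moderate and heavy agent usage, in dollars.
for usage in (5.0, 20.0, 80.0):
    print(f"usage ${usage:>5.2f} -> metered ${metered(usage):>5.2f}, "
          f"hybrid ${hybrid(usage):>5.2f}, flat ${flat_capped(usage):>5.2f}")
```

In the light month the meter is cheapest, in the heavy month only the capped tier stays at $20, and it does so by limiting what the agent is allowed to do. None of the three is wrong. They just put the risk in different places.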

If you are buying an agentic product today on a flat monthly fee, ask the vendor one question before you sign: how are you handling the rising cost of inference, and what does your pricing look like in twelve months? If they cannot answer that clearly, they either do not know or do not want to tell you, and neither is a reason to sign.

Anyone who was sold an agent as a free colleague is about to get their first honest timesheet.

I write about AI, cybersecurity, and technology every Friday. Subscribe to get it in your inbox.

Sources

1. Anthropic, “Claude Fable 5 and Claude Mythos 5,” 9 June 2026. anthropic.com/news/claude-fable-5-mythos-5

2. Klarna, “Klarna AI assistant handles two-thirds of customer service chats in its first month,” 27 February 2024. klarna.com/international/press/klarna-ai-assistant-handles-two-thirds-of-customer-service-chats-in-its-first-month/

3. Bloomberg, via Entrepreneur, “Klarna Is Hiring Customer Service Agents After AI Couldn’t Cut It on Calls, According to the Company’s CEO,” 9 May 2025. entrepreneur.com/business-news/klarna-ceo-reverses-course-by-hiring-more-humans-not-ai/491396

4. Gartner, “Gartner Predicts That by 2030, Performing Inference on an LLM With 1 Trillion Parameters Will Cost GenAI Providers Over 90% Less Than in 2025,” 25 March 2026. gartner.com/en/newsroom/press-releases/2026-03-25-gartner-predicts-that-by-2030-performing-inference-on-an-llm-with-1-trillion-parameters-will-cost-genai-providers-over-90-percent-less-than-in-2025

5. NxCode, “Cursor AI Pricing 2026: Free vs Pro vs Business,” March 2026 (Cursor pricing and Auto mode routing). nxcode.io/resources/news/cursor-ai-pricing-plans-guide-2026

6. Salesforce, “Agentforce: Digital Labor Platform.” salesforce.com/agentforce

7. Technavio, “Open-source LLM Market Growth Analysis - Size and Forecast 2026-2030,” May 2026. technavio.com/report/open-source-llm-market-industry-analysis

Jonathan Freedman
