AI Is Turning Into Expensive Infrastructure: Where the New Pricing Is Headed

We run several providers at once: ChatGPT, Claude, Chinese models, local models on our own hardware. Not because we want everything at once, but because different jobs call for different tools. And over the past few months, the picture has been shifting faster than the pricing pages get updated.

This week Anthropic shipped Claude Fable 5, its strongest model yet. On the Pro, Max, and Team plans they opened it up at no extra charge, but only through June 22. From the 23rd, subscription access moves over to paid credits. They’ve promised to fold it back into the normal limits once they have enough capacity, but there’s no date attached, and a promise costs nothing.

One pricing change on its own is no big deal. But it’s not the first signal like this, and behind it you can see a broader move we’ve been watching for months now: serious work with the best models increasingly means paying for it separately.

For us this isn’t a complaint about greedy vendors, and it’s not a reason to panic. It’s a normal stage in how a market grows up. The venture playbook (subsidize first to win reach, monetize later) is working exactly the way it’s supposed to. Some people just assumed the free ride would last a lot longer, and it probably won’t. Below we’ll walk through what changed over the past six months, why it isn’t a fluke but a shift in the business model, and what we’re doing about it ourselves.

Six Months Ago vs. Now

Just three or four months back, a $20 subscription was enough for real work: writing, coding, testing ideas. We were closing out our daily tasks on that plan without a second thought. The $100 plan came with limits so generous you’d struggle to hit them. The free tier let you drop in, knock out a couple of emails, sanity-check a thought. Access to strong AI was cheap, and you barely had to count your requests.

Now it looks different — and not only at Western companies. Twenty dollars gets you a taste of the top model: log in, look around, burn through your window. For anything sustained, OpenAI nudges you toward the $100 and $200 plans, and even there the limits make themselves felt. For most providers, the free tier has become a demo where the meter runs out on your first heavy task.

ChatGPT in particular is changing right in front of you. A couple of months ago you could easily grab a few starter months on favorable terms. Now OpenAI has clamped down hard on access: phone-number verification, account linking, new rules for getting a trial. This isn’t random fussiness — it’s a deliberate closing of the free front door.

With Chinese services like GLM, the limits used to be hard to exhaust; now prices have gone up and the five-hour request windows have visibly tightened.

Against that backdrop, DeepSeek stands out as an unusual and cheap option: it sells access by the token, not by subscription, and the per-token price is very low. The situation itself is telling. Six months ago, paying per token felt pointless; now people are genuinely happy about cheap tokens. That’s the paradigm shift in action.

You could chalk it up to greedy vendors or a temporary capacity crunch. But we read it differently: what’s changing isn’t how generous the companies feel — it’s the economics of the entire industry. The era of handing out the best models almost for free, to win reach and gather data, is ending. Now every request has to pay back the server capacity behind it, and that gets baked into the price.

What the Real Shift Is

The models keep getting smarter, while building products on top of them gets more expensive. It used to be that almost anyone could experiment without watching the meter: firing a task at a model just to see what came out cost nothing. Now every request to a flagship model through the API costs real money, and in a subscription it runs straight into a limit.

For us, as a team that builds AI products, this means something simple: cheap access to the top tier is closing off. Without some investment, modest but real, you can no longer code endlessly or churn through tasks on the strongest models. It’s not just the limits that are changing; it’s the underlying economics of working with AI.

And here’s the thing: the price of a token itself is dropping. As of early 2026, the blended price per million tokens fell by roughly two-thirds over the year. But the total bill is going up, because consumption is growing faster. This is the classic Jevons paradox: a resource gets cheaper, people use more of it, and aggregate spending climbs. AI isn’t getting more expensive per unit — it’s getting more expensive to operate.
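To make that arithmetic concrete, here is a minimal sketch. The two-thirds price drop comes from the paragraph above; the starting price, the token volumes, and the fivefold growth are made-up numbers chosen only for illustration.

```python
def monthly_bill(price_per_million: float, tokens: float) -> float:
    """Total spend: price per million tokens times millions of tokens used."""
    return price_per_million * tokens / 1_000_000


def breakeven_usage_multiplier(price_drop: float) -> float:
    """How many times usage must grow for the bill to stay flat after a price drop."""
    return 1 / (1 - price_drop)


# Hypothetical numbers, not real prices or volumes.
old_price = 9.0                # dollars per million tokens, a year ago
new_price = old_price / 3      # after a two-thirds drop
old_tokens = 50_000_000        # tokens per month, a year ago
new_tokens = old_tokens * 5    # assumed fivefold growth in usage

old_bill = monthly_bill(old_price, old_tokens)
new_bill = monthly_bill(new_price, new_tokens)
print(f"bill a year ago: ${old_bill:.2f}, bill now: ${new_bill:.2f}")
```

With a two-thirds price drop, the bill only grows if usage more than triples, which is what breakeven_usage_multiplier(2 / 3) works out to. Anything below that and the total bill would actually fall.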

Why This Isn’t an Accident

If you stop staring at one Anthropic plan and look at the whole picture, it becomes clear: this is neither a vendor conspiracy nor a passing shortage. It’s several tectonic shifts arriving at the same point.

Hardware costs money. Inference at scale needs chips, data centers, power, cooling. In 2026 the hyperscalers are pouring hundreds of billions into infrastructure. Someone has to recoup that, and the bill eventually lands on the user.

Training new models is getting pricier. Fresh, high-quality training data is running short, and each new generation of models takes an order of magnitude more resources than the last. Giving the result away for free stops making economic sense.

AI salaries are climbing. Demand for people who can train and run models outpaces supply. That gets priced into the product.

Demand outstrips capacity. There are more users and requests than there are servers. When a resource is scarce, it gets expensive — that holds in any industry.

The venture model is turning toward profitability. ChatGPT, Claude, and projects like them ran on the classic venture script from day one: subsidize access for the first few years to win reach and collect data, then gradually switch on monetization. What we’re seeing now isn’t a breakdown — it’s the move into phase two. We saw this coming.

Add these up and it’s clear the trend won’t reverse next month. This isn’t a temporary tightening — it’s the industry stepping into a different economic model.

Where This Is Headed

Look at the overall direction rather than the individual price tags, and a few lines come into focus.

The free window has become a sampler. The days of companies handing out cheap access to win reach are ending. From here on, every request to a strong model will hit someone’s bill: either the user pays directly, or the provider pays and folds the cost into the subscription price.

Access is splitting along money lines. Base models and last year’s models stay in the cheap plans as handy assistants. The top tier is increasingly available only for an extra fee: credits, premium plans, the API. This isn’t a temporary inconvenience — it’s a two-tier market taking shape, with mass-market AI and professional AI running on different economics.

A throwaway prompt now costs money. The “I’ll just toss the task in and see if it works” approach stops paying off on flagship models. The tool is no longer a sandbox for free attempts. That changes behavior: you have to think about what a request is spending before you send it.

Orchestrating models becomes a skill. When the top tier costs money and base models are cheap, knowing which model to use for what stops being a technical detail and becomes a direct competitive edge. Don’t crack a nut with a sledgehammer: a cheap base model handles the routine, and the strong one gets pulled in only where it’s genuinely needed. This isn’t penny-pinching — it’s managing the resource intelligently.
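A rough sketch of what that routing could look like in code. Everything here is a placeholder: the model names, the prices, and the list of task kinds that justify the strong model are assumptions for illustration, not our production setup or any provider's real pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    price_per_million: float  # dollars per million tokens (hypothetical)


# Hypothetical tiers; names and prices are placeholders, not real offerings.
BASE = Model("cheap-base-model", 0.5)
FLAGSHIP = Model("flagship-model", 15.0)

# Task kinds assumed hard enough to justify the flagship (illustrative list).
NEEDS_FLAGSHIP = {"architecture_review", "complex_refactor"}


def pick_model(task_kind: str) -> Model:
    """Send routine work to the base model; escalate only the listed task kinds."""
    return FLAGSHIP if task_kind in NEEDS_FLAGSHIP else BASE


def estimated_cost(task_kind: str, tokens: int) -> float:
    """Rough dollar cost of one request, computed before it is sent."""
    return pick_model(task_kind).price_per_million * tokens / 1_000_000


for kind in ("email_draft", "complex_refactor"):
    print(kind, pick_model(kind).name, estimated_cost(kind, 2_000))
```

A static allow-list is the simplest possible rule. In practice the escalation decision could just as well depend on input size, a failed first attempt on the base model, or a manual flag.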

Access to the best models is turning into a line item. From here, the winner isn’t whoever has the strongest model unlocked — it’s whoever understands where it’s actually needed and where a cheap base model will handle the routine.

What We’re Doing About It

For us this isn’t theory. Some of our projects deliberately run on local models and on-prem — not because we’re against the cloud, but because at volume, predictable cost matters more than access to the single strongest model.

Paker, our local knowledge base for employees, and Rubi, our AI solutions for manufacturing companies, both run on the client’s own hardware. The data never leaves the company’s perimeter, and there’s no token bill that grows with every request. That’s an architectural decision made with this exact trend in mind: inference cost at volume eats your savings if you don’t design for it from day one.
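One way to sanity-check that kind of decision is a break-even estimate: at what monthly volume does a fixed on-prem cost beat a pay-per-use bill? The sketch below is deliberately simplified. It assumes the marginal cost of one more on-prem operation is zero, and the dollar figures are hypothetical.

```python
import math


def breakeven_operations(fixed_monthly_cost: float, cloud_cost_per_operation: float) -> int:
    """Smallest monthly operation count at which on-prem costs no more than pay-per-use.

    Assumes on-prem has only a fixed monthly cost (hardware amortization, power)
    and no per-operation cost, which is a simplification.
    """
    if cloud_cost_per_operation <= 0:
        raise ValueError("cloud_cost_per_operation must be positive")
    return math.ceil(fixed_monthly_cost / cloud_cost_per_operation)


# Hypothetical figures: $2,000 per month on-prem, $0.25 per operation in the cloud.
print(breakeven_operations(2000.0, 0.25))
```

Below that volume the pay-per-use bill is smaller; above it, the fixed cost wins and stays flat as usage keeps growing.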

On other projects, we count the cost per operation, not the cost of a subscription: what one parsed tender costs (Torgi), one prepared document set (as-built docs), one processed visualization (Charmonye), and what happens to that cost as volume grows. Without that arithmetic, it’s easy to end up with a cheap prototype, a dazzling demo, and a production bill that eats every bit of the savings. We’ve written about that separately.
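As a sketch of the per-operation arithmetic: one finished operation usually takes several model calls, each with its own input and output tokens. The token counts, the number of calls, and the prices below are invented for illustration and are not measurements from any of the projects named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    input_tokens: int   # tokens sent per model call
    output_tokens: int  # tokens received per model call
    calls: int          # model calls needed to finish one operation


def cost_per_operation(op: Operation, input_price: float, output_price: float) -> float:
    """Dollar cost of one finished operation; prices are per million tokens."""
    per_call = (op.input_tokens * input_price + op.output_tokens * output_price) / 1_000_000
    return per_call * op.calls


# Hypothetical token counts and prices, for illustration only.
tender = Operation("parse_one_tender", input_tokens=40_000, output_tokens=2_000, calls=3)
unit_cost = cost_per_operation(tender, input_price=2.0, output_price=10.0)
monthly_cost = unit_cost * 10_000  # the same operation at 10,000 tenders per month

print(f"one operation: ${unit_cost:.2f}, at volume: ${monthly_cost:.2f} per month")
```

The point of tracking it this way is that the unit cost looks negligible on its own and only becomes a budget question once it is multiplied by real volume.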

The takeaway isn’t that AI got too expensive. It got expensive in a different way. It used to be expensive to get in and try. Now it’s cheap to try and expensive to run at volume. Whoever counts the cost per operation, instead of the cost of a subscription, sees this earlier and builds it into the architecture rather than into an emergency budget.

Instead of a Conclusion

When we first started working with AI in earnest, the question was “which model should we pick.” Now the question has changed: “what will one operation cost at volume, and which model is good enough for it.” This isn’t a temporary blip in the pricing, and it’s not cause for panic. It’s the industry entering a different economic model, and that’s perfectly normal. The cost now has to be accounted for up front, as part of the work rather than background noise.