Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of three topics from last week’s The Pulse issue. Full subscribers received the article below seven days ago. If you’ve been forwarded this email, you can subscribe here.
Last week, we covered the slightly perverse trend of “tokenmaxxing” across the industry: devs running agents with the sole aim of boosting their personal “token stats,” in an effort to rank higher on internal token leaderboards and avoid being seen as a Luddite who doesn’t use AI tools enough compared to peers.
This week, I spoke with a software engineer at a large company and another at a seed-stage startup. Both shared almost identical stories: at their latest all-hands, company leadership expressed concern about the fast-rising cost of tokens. At both places, token spend has increased by ~10x in the last six months – with no signs of slowing down.
I wanted to find out more about this trend, so I talked to devs at 15 businesses. Below is what I learned about what’s happening at workplaces of all sizes. Names are anonymized.
Inside a large SaaS company, most devs use an internal background coding tool. The tool defaults to Claude Sonnet, the cheaper Claude model. Model selection is not persisted, so devs who prefer working with Opus, for instance, must reselect it on every startup.
The tool supports all major frontier models, such as Sonnet, Opus, GPT, and Gemini. The devs I talked to at the company are very heavy users of the tool and have not encountered usage limits.
“The cost in token spend is off the charts – and leadership has shared this trend with us. They have not said anything beyond showing growth in spend, and mentioning that this won’t be sustainable. So, nothing specific yet, but my sense is that something will have to change. Limits or prioritizing cheaper models, cutting back on hiring? Who knows.”
“We’re monitoring but not restricting. We are spot-checking the heaviest users, but we are seeing the business cases working out.
We are offering some guidance on model selection – e.g., turn off the new high-effort setting in Claude. Some users are trying open source models, but open source model usage is a bottom-up initiative, not a top-down one.”
“We have already had to raise our API budget limits multiple times in April. We recently switched to a much higher-effort level for Claude, which significantly increased the cost per PR.
One reason for the cost spike is defaulting to state-of-the-art models regardless of task difficulty. We are using that high-effort setting even for fairly trivial tasks that could have been handled by much cheaper models, or even by lower-effort Claude loops. Despite a few of us pointing this out, leadership has basically said budget is not the concern right now.
I sense that the budget increase was not forecast, and we’re in for a reckoning. I suspect the attitude will change once finance and other cost-conscious parts of the org realize we are spending hundreds of dollars per day, per highly-engaged developer. For now, fear of missing out and of falling behind seems to outweigh cost discipline.”
“What budget increase? It’s very hard to get a budget for AI here! Claude Code is still not rolled out because $200/month/dev is seen as too high a cost. I talk with people at startups where $1,000/month in spending is totally normal, and it’s night and day here.”
“Some developers are now spending $500 a day (!!) on Claude Code. Practically speaking, this means that employee costs have doubled. Productivity has increased, in my view, but now the bottleneck is code reviews. AI can spit out code quite quickly, but we still have human reviews in place. Leadership encourages using AI for code review, but my team will not blindly trust AI.
The push for AI is coming from the top. This year’s performance review had a section on AI, rating devs by how well they used it – which is another reason everyone uses it as much as they can.”
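For a sense of scale, the arithmetic behind the “employee costs have doubled” remark is simple. A quick back-of-the-envelope sketch, assuming ~21 working days per month (the daily figure comes from the quote above; nothing here is an official number):

```python
# Back-of-the-envelope: what $500/day of token spend per heavy user adds up to.
daily_spend = 500              # $ per heavy user, per working day (from the quote)
working_days_per_month = 21    # assumption

monthly = daily_spend * working_days_per_month   # 10,500
yearly = monthly * 12                            # 126,000

print(f"${monthly:,}/month, ${yearly:,}/year per heavy user")
```

At roughly $10.5K/month, token spend for a heavy user is indeed in the same ballpark as a fully-loaded engineering salary in many markets – which is why “costs have doubled” is not hyperbole.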
“Model routing helped slow the growth of our costs. For example, changing the default model reduced cost by 30%. This is our strategy with AI spend, summarized:
Short term: spend, spend, spend! Experiment and use whatever models make sense.
Measure the impact. Measure key outcomes and report on spend, monthly.
When spend and results diverge: adjust. When our spend increases dramatically but outcomes don’t follow, we look at what we can do to close the gap. More spend should mean better outcomes. If not, we are doing something wrong.”
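The routing step in the strategy above can be sketched with a simple heuristic: send short, low-stakes prompts to a cheaper default model and reserve the frontier model for the rest. A minimal sketch, with hypothetical model names, prices, and thresholds (not any vendor’s real rates or this company’s actual setup):

```python
# Minimal sketch of cost-aware model routing. Model names, prices, and the
# length threshold are illustrative assumptions, not real vendor rates.

PRICE_PER_MTOK = {
    "cheap-default": 3.0,    # hypothetical $ per million input tokens
    "frontier": 15.0,
}

def route(prompt: str, high_stakes: bool = False) -> str:
    """Send short, low-stakes prompts to the cheaper default model."""
    if high_stakes or len(prompt) > 4000:
        return "frontier"
    return "cheap-default"

def estimate_cost(model: str, input_tokens: int) -> float:
    """Rough input-token cost estimate for monthly spend reporting."""
    return PRICE_PER_MTOK[model] * input_tokens / 1_000_000
```

Routing on prompt length alone is crude; in practice, teams also classify by task type or let the dev override the default – but even a blunt rule like this is enough to move the default away from the most expensive model.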
“We have Cursor and Claude Desktop, which together have around 800-1,200 users. Token usage is growing faster than expected. Estimates are being adjusted on the fly; the initial plan of strict limits (say, $100 per user) breaks when reality hits and people exhaust them in 3-5 working days.
Using expensive models is a problem. With Cursor, many devs default to the most expensive models without realizing that Opus offers single-digit percentage gains in intelligence over Sonnet, for example, while exhausting their budgets almost immediately.
We are working on blocking or restricting the most expensive models [with Cursor], as running into thousands of dollars per user, per month is not sustainable at our scale. Cursor is a good partner, and we’re working with them to switch to a “pooled spend” model, where heavy users can tap into a pool of extra spend.
Claude is a similar story. We started with a $100 Claude Desktop limit for everyone, but going forward I can see we’ll need to go much higher, especially for business-critical use cases.”
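A rough sketch of how a “pooled spend” model like the one described above could work: each user gets a soft cap, and spend beyond the cap draws from a shared overflow pool until it runs dry. All names and numbers are hypothetical; this is not Cursor’s actual billing logic.

```python
# Illustrative sketch of a "pooled spend" budget: a per-user soft cap plus a
# shared overflow pool for heavy users. Names and numbers are hypothetical.

class PooledBudget:
    def __init__(self, per_user_cap: float, pool: float):
        self.per_user_cap = per_user_cap   # soft cap per user, per month
        self.pool = pool                   # shared extra spend for heavy users
        self.spent: dict[str, float] = {}  # total spend recorded per user

    def charge(self, user: str, amount: float) -> bool:
        """Record spend; anything past the user's cap comes out of the pool."""
        used = self.spent.get(user, 0.0)
        new_total = used + amount
        # How much of *this* charge falls above the per-user cap:
        overflow = max(0.0, new_total - self.per_user_cap) \
                 - max(0.0, used - self.per_user_cap)
        if overflow > self.pool:
            return False                   # cap and shared pool both exhausted
        self.pool -= overflow
        self.spent[user] = new_total
        return True
```

The appeal of this shape is that light users never touch the pool, so the org only pays extra for the minority of heavy users – the same skew the quotes above describe.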
“We haven’t had much of an issue. Most folks police themselves for runaway costs; for example, we had someone hit like $10K in a week because they messed up caching, but it was caught and they corrected their harness.
For the most part, we don’t see our high-end folks spending more than ~$1K/week. Now, to be clear, this is not a small amount! BUT it’s a small subset of the population.
We’re just factoring it into engineering costs at this point: if it’s, say, $2K/month per employee, that’s $24K per year.”