
If Claude disappeared tomorrow, I’d be fine. Slower, maybe. But fine.
I built my own agent harness and threw a pile of real issues from my own project at it, running on local models. The results were genuinely impressive — mostly backend, and including some complex refactors. ComfyUI for images, incredible. Smaller models for the small jobs, useful. None of them are Claude. All of them are good enough.
That’s the thing nobody talks about. Everyone’s racing toward the frontier: bigger model, harder benchmark, the next tier of reasoning. But most of the market isn’t frontier. It’s the 80% of day-to-day work done reliably, and good enough clears that bar today.
Here’s the shape of it.
Picture every problem you’d hand an AI, laid out by how much intelligence it actually needs. It’s a bell curve. A small tail of trivial stuff on the left, a fat middle of ordinary work, a thin tail on the right of the genuinely hard problems that need the best model going. The money isn’t in either tail. It’s in the middle.
Frontier reaches furthest right. Open source trails behind it. Three things are true at the same time, and none of them lean on the others.
The right tail is finite. Most work doesn’t need elite reasoning — there’s a ceiling on how much intelligence a problem takes, and most problems sit well under it.
Open source keeps eating the useful stuff. Each new release covers a bit more of the work that actually matters, and its share keeps growing. Distillation, synthetic data, copied architecture — the trailing line slides right every few months.
Best isn’t usually necessary. People pay for the best when it counts, but you don’t put a Herman Miller in every cubicle. Once a local model clears your problem at zero marginal cost, a better answer from a metered API has to beat free — a far harder sell than this year’s frontier against last year’s.
So the open-source line keeps sliding right, straight through the dense middle of the curve. That’s where most of the problems are, so it eats the market fast. Frontier keeps the shrinking right tail: real, valuable, niche.
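To put rough numbers on that picture, here is a minimal sketch. It assumes problem difficulty follows a standard normal distribution in arbitrary units, which is purely an illustrative assumption, and the function names are made up for this example. A model at a given capability level is treated as covering every problem at or below that level.

```python
from statistics import NormalDist

# Assumption: difficulty is standard normal (mean 0, standard deviation 1).
# The units are arbitrary; only the shape of the curve matters here.
difficulty = NormalDist(mu=0.0, sigma=1.0)


def share_covered(capability: float) -> float:
    """Fraction of all problems at or below the given capability level."""
    return difficulty.cdf(capability)


def share_between(lower: float, upper: float) -> float:
    """Fraction of problems gained by moving capability from lower to upper."""
    return share_covered(upper) - share_covered(lower)


if __name__ == "__main__":
    for capability in (-1.0, 0.0, 1.0, 2.0):
        print(f"capability {capability:+.1f} sd covers {share_covered(capability):.1%}")
    print(f"moving from -1 to +1 sd adds {share_between(-1.0, 1.0):.1%}")
    print(f"moving from +1 to +2 sd adds {share_between(1.0, 2.0):.1%}")
```

Under that assumption, sliding from one standard deviation below the mean to one above picks up about 68% of all problems, while the next step out to two standard deviations adds only about 14%. Everything past that is roughly the last 2%.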
The economics are what break. Per-usage pricing works when you have to call the API. It stops working when the 80% runs free on hardware you already own. The labs can cut prices and chase the mass market, but then they’re an inference utility, competing with AWS and Groq on margin per token instead of on how clever the model is. That’s a worse business than the one they’ve got.
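The break-even is simple arithmetic, sketched below. Every number is a hypothetical placeholder, not a real price or a measurement, and the function names are invented for illustration. Swap in your own token volume, API price, hardware cost and power bill.

```python
def monthly_api_cost(tokens_per_month: float, price_per_million: float) -> float:
    """Metered cost: pay for every token, every month."""
    return tokens_per_month / 1_000_000 * price_per_million


def monthly_local_cost(hardware_cost: float, amortisation_months: int,
                       power_cost_per_month: float) -> float:
    """Local cost: hardware spread over its useful life, plus electricity."""
    return hardware_cost / amortisation_months + power_cost_per_month


def break_even_tokens(price_per_million: float, hardware_cost: float,
                      amortisation_months: int, power_cost_per_month: float) -> float:
    """Monthly token volume above which running locally is cheaper than the API."""
    local = monthly_local_cost(hardware_cost, amortisation_months, power_cost_per_month)
    return local / price_per_million * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: 10 per million tokens, a 3600 machine written off
    # over 36 months, 20 a month in electricity. Currency is whatever you use.
    new_box = break_even_tokens(10.0, 3600.0, 36, 20.0)
    # Hardware you already own: the purchase is sunk, only power remains.
    owned_box = break_even_tokens(10.0, 0.0, 36, 20.0)
    print(f"break-even buying hardware: {new_box:,.0f} tokens/month")
    print(f"break-even on owned hardware: {owned_box:,.0f} tokens/month")
```

With these placeholder inputs the break-even is 12 million tokens a month if you buy the machine, and 2 million if you already own it. A job that hammers away all day passes either threshold quickly, which is the point about metered pricing.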
I care because I like running local. I’ve got tooling that pulls my issues, MRs and activity and synthesises them into summaries and action items. That’s exactly the kind of thing I want hammering away all day, not metered per call. I like having a plan for when the pricing turns out to be unsustainable. And I suspect a fair bit of this lands with regulated, compliance-heavy companies running their own LLM clusters. Plenty of them were never keen to ship their data off to someone else’s API in the first place.
Watch who’s moving. AMD, Intel, Nvidia and a wave of independents are all chasing local hard. Meanwhile the labs are tightening the screws on cost. Both point the same way.
Good enough wins. It just doesn’t get talked about, because good enough doesn’t trend.