One Model Won't Win
Open models are near-frontier and nearly free. The advantage just moved off the model.
Picking the right model used to be an edge. For the first time, it isn't. Open-weight models now land within a few points of the frontier on most benchmarks while costing 50% to 90% less to run, and API prices have fallen more than 90% since 2023. When near-frontier intelligence is available to everyone for pennies, the model stops being the moat. It becomes a commodity input.
When the input commoditizes, value moves to whoever orchestrates it. That is happening in two directions at once, above the model and below it.
Above the model, the aggregator is becoming the intelligence layer. OpenRouter spent two years routing each request to the single best model. This month it launched Fusion, which fans a prompt out to a panel of models, has a judge reconcile their answers, and returns one synthesized result. The panel beats any single model: 69% on the DRACO research benchmark against 65% for the best solo model, and ahead of both GPT-5.5 and Opus 4.8. The routing layer stopped picking the winner and started manufacturing one.
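To make the fan-out-and-judge pattern concrete, here is a minimal sketch of the general shape. It is not OpenRouter's implementation or API: the names fan_out, build_judge_prompt, and fuse are hypothetical, and the panel and judge are offline stubs standing in for real model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical stand-in for a model endpoint: prompt text in, answer text out.
Model = Callable[[str], str]


def fan_out(prompt: str, panel: Dict[str, Model]) -> Dict[str, str]:
    """Send the same prompt to every panel model concurrently."""
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        futures = {name: pool.submit(model, prompt) for name, model in panel.items()}
        return {name: future.result() for name, future in futures.items()}


def build_judge_prompt(prompt: str, drafts: Dict[str, str]) -> str:
    """Pack the question and every draft into one prompt for the judge."""
    sections = [f"[draft from {name}]\n{text}" for name, text in drafts.items()]
    return (
        f"Question:\n{prompt}\n\n"
        + "\n\n".join(sections)
        + "\n\nReconcile the drafts into one answer."
    )


def fuse(prompt: str, panel: Dict[str, Model], judge: Model) -> str:
    """Fan out to the panel, then let the judge return one synthesized answer."""
    drafts = fan_out(prompt, panel)
    return judge(build_judge_prompt(prompt, drafts))


# Stub models so the sketch runs offline; real ones would call an API.
panel = {
    "model_a": lambda p: "Answer A",
    "model_b": lambda p: "Answer B",
    "model_c": lambda p: "Answer C",
}
# Stub judge that only counts the drafts it was shown.
stub_judge = lambda p: f"synthesis of {p.count('[draft from ')} drafts"

print(fuse("What changed?", panel, stub_judge))
```

The point of the shape is that the caller sees one answer and never picks a model. The cost is one extra judge call plus the slowest panel member's latency, which is the trade a production system would have to tune.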
Below the model, serving is becoming its own discipline. A cheap open model is only cheap if you serve it efficiently, and the optimization surface (batching, KV cache, speculative decoding) moves faster than any one team can track. vLLM is the floor, not the ceiling, and the frontier shifts week to week. The edge is no longer which model you run. It is how few GPUs you need to run it.
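A back-of-envelope sketch shows why serving choices translate into GPU count. It uses the standard KV cache sizing rule (keys plus values, per layer, per KV head, per token). Every number below is an illustrative assumption, not a measurement of any particular model or GPU, and the estimate ignores weight memory, compute limits, and allocator overhead.

```python
import math


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int) -> int:
    """KV cache bytes one token occupies: keys and values for every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def sequences_per_gpu(kv_budget_bytes: int, context_len: int,
                      per_token_bytes: int) -> int:
    """How many full-length sequences fit in the memory left for KV cache."""
    return kv_budget_bytes // (context_len * per_token_bytes)


def gpus_needed(target_concurrency: int, per_gpu: int) -> int:
    """GPUs required to hold the target number of concurrent sequences."""
    return math.ceil(target_concurrency / per_gpu)


# Illustrative assumptions: 32 layers, 8 KV heads, head dimension 128,
# 8192-token contexts, 40 GiB per GPU left over for KV cache,
# and 100 concurrent sequences to serve.
GIB = 2**30
for label, bytes_per_elem in [("16-bit cache", 2), ("8-bit cache", 1)]:
    per_token = kv_bytes_per_token(32, 8, 128, bytes_per_elem)
    per_gpu = sequences_per_gpu(40 * GIB, 8192, per_token)
    print(label, per_token, per_gpu, gpus_needed(100, per_gpu))
```

Under these assumptions, halving the cache precision doubles the sequences per GPU and drops the fleet from three GPUs to two for the same load. That is the kind of gain the serving layer competes on, with the model itself unchanged.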
We think this is where the margin goes. Not to whoever trains the best model, but to whoever orchestrates models best on top and serves them cheapest underneath. We’ve said before that open weights commoditize the base layer and the moat moves up-stack. The model was the product for three years. Now it is the raw material, and the companies that matter are the ones that turn it into something cheaper, faster, or smarter than any single model could be alone.
