Eyes Are a Tax
The browser lost. The terminal won. Here's what that means for software.
Google shut down Project Mariner on May 4. It was a bet on the screenshot-driven UX paradigm, in which an AI clicks buttons the way a human would while the human watches through the browser. It lost.
What beat it is API plus CLI.
Anthropic’s Claude Code has onboarded developers faster than any other surface the company has shipped, despite running entirely in a text terminal that looks like it was designed in 1985. OpenAI killed Operator as a standalone product and now ships its computer-use model through the API instead. Anthropic’s Quick Mode for Claude in Chrome runs a stripped-down agent loop that swaps structured JSON for single-character commands and delivers a roughly threefold speedup on real browsing tasks. The public benchmarks we’ve found point in the same direction: agents that work through APIs and command lines are pulling ahead of agents that work through pixels.
The reason is unit economics, and it gets worse for vision every quarter. A browser-based agent pays to tokenize a screenshot and a DOM tree on every step, feeding a model large enough to reason over a megabyte of UI noise just to find one button. A CLI agent pays for a few hundred structured tokens per step and gets a structured answer back. The gap isn’t the cost of one click – it’s the per-step token difference multiplied across every step of every task. Frontier pricing for structured text reasoning is collapsing faster than pricing for vision, so the cost curve is bending against pixels, and the bend is steepening.
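To make the multiplication concrete, here is a back-of-envelope sketch in Python. Every number in it is a hypothetical placeholder, not a measurement: the per-step token counts for each agent design, the steps per task, the daily task volume, and the per-million-token price are assumptions chosen only to show how a per-step gap compounds across a task and then across a fleet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Per-step input cost of one agent design (all figures hypothetical)."""
    name: str
    tokens_per_step: int  # tokens the model must read on each step


def tokens_per_task(profile: AgentProfile, steps: int) -> int:
    # The gap compounds: per-step tokens times the number of steps in a task.
    return profile.tokens_per_step * steps


def cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    # Convert a token count into dollars at a flat per-million-token price.
    return tokens * usd_per_million_tokens / 1_000_000


# Hypothetical placeholders, not benchmarks: a screenshot plus DOM tree
# versus a few hundred tokens of structured CLI/API output.
vision = AgentProfile("vision", tokens_per_step=20_000)
cli = AgentProfile("cli", tokens_per_step=400)

STEPS_PER_TASK = 30      # hypothetical task length
TASKS_PER_DAY = 10_000   # hypothetical fleet workload
PRICE = 3.0              # hypothetical USD per million input tokens

for agent in (vision, cli):
    per_task = tokens_per_task(agent, STEPS_PER_TASK)
    daily = cost_usd(per_task * TASKS_PER_DAY, PRICE)
    print(f"{agent.name}: {per_task:,} tokens/task, ${daily:,.2f}/day")

ratio = vision.tokens_per_step / cli.tokens_per_step
print(f"per-step ratio: {ratio:.0f}x")
```

Under these made-up inputs the per-step ratio carries straight through to the daily bill, which is the point: the ratio, not any single step, is what scales. The sketch uses one flat price for both designs, which understates the argument, since the claim is that text and vision prices are diverging; giving each profile its own price is a one-line change.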
The implication for SaaS is the part most software companies have not yet priced in. If the dominant user of your product over the next decade is a fleet of agents, then the surface that matters for distribution is your schema, not your screen. Salesforce, Notion, Linear: every category leader is about to be evaluated on a single question, namely how cleanly a model can consume the product without rendering a pixel. The companies that ship a real, documented, agent-readable API as a first-class product become rails – Stripe proved the model years ago. The companies that gate their primitives behind a UI become the legacy layer that the next generation of startups quietly automates around.
There is a real counter to this view: the browser is also the universal abstraction over the long tail of legacy software that will never expose a clean API. State government portals, hospital admin systems, niche industry tools with three customers and a 2008 codebase. That long tail, call it 15% of the workflow surface, is a genuine market, and vision agents will serve it for a long time. But 15% is a fallback business. The platform fight is the other 85%.
The bigger compounding effect sits at the infrastructure layer. A human can run one workflow at a time. An agent can run a hundred in parallel, each making thousands of API calls. Our bet is that machine-initiated traffic will come to dwarf human-initiated traffic by an order of magnitude, and the inference cycles, bandwidth, and storage required to support that pattern are nowhere in the current hyperscale capex curve. The most underpriced thing in markets right now is not the model layer or the application layer. It is the silicon, fibre, and power required to carry the load that an agent-native software stack actually generates.
Our view, plainly: the SaaS layer bifurcates. The companies that ship clean, agent-readable APIs become the rails. Everything else becomes either a vision-agent fallback or a wrapper around someone else’s schema. The infrastructure layer captures the spillover, and the spillover is arguably the biggest single capex flywheel in history.
The next decade of software is being built for readers that do not have eyes. Plan for them.
