The Inference Economy
Five venture opportunities in the inference economy
Each week, we share a small collection of ideas that shaped our internal thinking. Inspired by experiments like USV’s Librarian, this series is powered by an AI assistant that helps synthesize recurring themes from our discussions, alongside our own reflections.
Inference will 1000x. Even as AI power users, we’ve been reflecting on how we expect to 1000x our own consumption. Dozens of agents today, thousands tomorrow. Half of them running in physical systems, devices, and robotics that haven’t shipped yet. The data center buildout of the last two years was sized for training — a finite, episodic workload. Inference is continuous and compounding, and the real buildout hasn’t started. The investable question shifts from who trains the models to who serves the tokens.
Space and power are gold. Everyone’s watching Nvidia allocation, but we think the actual bottleneck is two layers upstream. Neoclouds and hyperscalers are fighting for places to deploy clusters. Shells go up in eleven months, clusters come online in twenty-one days, GPUs arrive if you pay — but substations take five years and new generation takes ten. The opportunity is colocating next-gen clusters alongside existing twenty-five to fifty megawatt sites with grid interconnect, and locking energy under fifteen-year PPAs before anyone else does.
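To make the lead-time argument concrete, here is a minimal sketch using the figures above. The scheduling model is our assumption, not a claim about any real project: power work and the shell proceed in parallel, cluster bring-up follows once both are done, and GPU delivery is treated as non-binding. The 30-day month and the names `LEAD_TIME_DAYS` and `days_to_first_token` are illustrative.

```python
# Lead times taken from the paragraph above, converted to days.
DAYS_PER_MONTH = 30   # rough conversion (assumption)
DAYS_PER_YEAR = 365

LEAD_TIME_DAYS = {
    "shell": 11 * DAYS_PER_MONTH,
    "cluster_bringup": 21,
    "substation": 5 * DAYS_PER_YEAR,
    "new_generation": 10 * DAYS_PER_YEAR,
}


def days_to_first_token(power_steps):
    """Days until a cluster can serve, given the power work a site still needs.

    Assumes power steps run in parallel with each other and with the shell,
    and that cluster bring-up starts only after both are finished.
    """
    power = max((LEAD_TIME_DAYS[step] for step in power_steps), default=0)
    return max(power, LEAD_TIME_DAYS["shell"]) + LEAD_TIME_DAYS["cluster_bringup"]


existing_site = days_to_first_token([])                    # grid interconnect already in place
needs_substation = days_to_first_token(["substation"])
needs_generation = days_to_first_token(["substation", "new_generation"])

print(existing_site, needs_substation, needs_generation)
```

Under these assumptions, a site with an existing interconnect is gated by the shell, while a greenfield site is gated almost entirely by power, which is the gap the colocation thesis is trying to capture.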
Sovereign inference. Governments are treating compute like a strategic reserve. In the last two weeks: G42 launched a framework for sovereign AI, Stargate UAE broke ground on a 1-gigawatt OpenAI/Oracle campus, and HUMAIN committed to multi-exaflop capacity with AMD. We think there’s opportunity for new regional neoclouds with local licenses and government relationships — serving sovereign-adjacent customers the big four can’t touch.
Inference silicon is its own market. Inference is projected to be two-thirds of AI compute spending this year, and Nvidia’s training-era architecture isn’t the right answer for serving. That’s why Nvidia paid $20B for Groq and OpenAI just committed $20B to Cerebras, which filed to IPO at $35B last week. The frontier LLM inference chips are already captured. We think the venture opportunity is the next wedge — novel architectures for workloads Nvidia never designed for: transformer-specific ASICs, analog and photonic compute, modality-specific silicon for video, audio, and robotics.
Wall Street is mispricing GPU depreciation. The bears say hyperscalers are overstating profits by $176B through 2028 by depreciating GPUs over longer schedules when the hardware only lasts three years. We think the data says otherwise. H100 spot prices dipped after launch, then climbed above launch prices as workloads pulled demand forward. The real pattern is a value cascade: training in years one and two, inference serving in years three through six. The venture opportunity is the long tail: networks that turn aging enterprise GPUs into productive inference capacity.
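The disagreement comes down to the assumed useful life. As a rough illustration, the sketch below compares straight-line depreciation over three years (the bear case) with six years (the value-cascade case). The $30,000 purchase price is a hypothetical round number, not a figure from our discussions, and straight-line with zero salvage value is a simplifying assumption.

```python
def straight_line_schedule(cost, useful_life_years, horizon_years):
    """Annual straight-line depreciation expense for each year of the horizon."""
    annual = cost / useful_life_years
    return [annual if year < useful_life_years else 0.0
            for year in range(horizon_years)]


GPU_COST = 30_000.0  # hypothetical price per GPU (assumption)
HORIZON = 6          # years one through six, matching the cascade above

bear_case = straight_line_schedule(GPU_COST, 3, HORIZON)     # GPUs last three years
cascade_case = straight_line_schedule(GPU_COST, 6, HORIZON)  # GPUs earn for six years

# Positive values: extra expense the bear case would book in that year.
annual_gap = [bear - cascade for bear, cascade in zip(bear_case, cascade_case)]

print(bear_case)
print(cascade_case)
print(annual_gap)
```

Total expense is identical under both schedules. What changes is timing: the shorter life pulls expense into years one through three, so reported profit in those years depends on which useful life turns out to be right.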
We’ll share another edition next week.
