Diffusion Wins

Low-latency models, full-duplex voice, and open weights are pushing the moat up-stack

Feb 27, 2026

Each week, we share a small collection of ideas that shaped our internal thinking. Inspired by experiments like USV’s Librarian, this series is powered by an AI assistant that helps synthesize recurring themes from our discussions, alongside our own reflections.

Inception AI launched Mercury 2, which it bills as the world’s first reasoning diffusion LLM and as 5x faster than leading speed-optimized LLMs. This matters because if diffusion can reliably deliver low-latency generation at scale, it changes the unit economics of agentic workloads, where speed and cost compound across many calls. We’ve written before about diffusion as a sleeper architecture for language: its latency advantages could finally make real-time, always-on agents practical.

Stefano Ermon@StefanoErmon

Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting

4:57 PM · Feb 24, 2026 · 884K Views

305 Replies · 552 Reposts · 4.01K Likes
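To make the compounding point concrete, here is a back-of-the-envelope sketch of how per-call generation speed adds up across a sequential agent loop. Only the 5x ratio comes from the announcement above; the baseline throughput, call count, and tokens per call are illustrative assumptions, not measurements, and the model ignores network overhead and time to first token.

```python
def sequential_latency(num_calls: int, tokens_per_call: int,
                       tokens_per_second: float) -> float:
    """Total generation time in seconds for an agent whose calls run one after another."""
    return num_calls * tokens_per_call / tokens_per_second


# Illustrative assumptions (not from the announcement):
BASELINE_TPS = 200.0      # assumed tokens/second for a baseline model
NUM_CALLS = 20            # assumed number of sequential model calls in one agent task
TOKENS_PER_CALL = 500     # assumed output tokens per call

# The only sourced number: the claimed 5x speedup.
SPEEDUP = 5.0

baseline = sequential_latency(NUM_CALLS, TOKENS_PER_CALL, BASELINE_TPS)
fast = sequential_latency(NUM_CALLS, TOKENS_PER_CALL, BASELINE_TPS * SPEEDUP)

print(f"baseline: {baseline:.0f}s, 5x faster model: {fast:.0f}s")
```

Under these assumptions the same task drops from 50 seconds to 10 seconds of generation time, which is roughly the difference between a background job and something a user will wait for.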

NVIDIA released PersonaPlex-7B, an open full-duplex voice model that can listen and speak at the same time. This matters because most “voice agents” still feel like walkie-talkies; full-duplex is what makes conversation feel natural. Our take: voice becomes the default UI once latency and interruption handling are solved, and open models like this speed up that transition.

Narrative violation: coding jobs are rising, not falling. Software engineering postings have surged 15%+ since late 2025, despite AI coding tools proliferating. AI isn’t replacing programmers; it’s creating “vibe coders” who ship functional code with AI assistance. Our take: the bottleneck shifts from writing code to reviewing it, creating huge demand for AI tools that manage AI-generated contributions.

David Sacks@DavidSacks

Narrative Violation: “Job Postings For Software Engineers Are Rapidly Rising”

6:25 PM · Feb 26, 2026 · 578K Views

319 Replies · 250 Reposts · 2.96K Likes

Google built MapTrace to teach world models “spatial grammar,” generating 2M synthetic map-path pairs with an automated creator-critic pipeline. This matters because navigation and robotics are bottlenecked less by language and vision than by structured spatial supervision. Our take: synthetic data factories for physical reasoning will be a moat, because they turn limited supervision into a scalable input.

Google Research@GoogleResearch

A critical gap in modern AI isn't language or vision. It's spatial grammar. And it reveals a fundamental data bottleneck. We built MapTrace, a fully automated, generative AI pipeline (models act as creators/critics) to generate 2M high-quality map-path pairs. The result:

Example paths generated by the proposed pipeline. We observed that the generated images tend to render text incorrectly; however, we mostly focus on path quality in this work. We believe that with improvements in image generation models, these artifacts can be easily suppressed in future work.

9:44 PM · Feb 17, 2026 · 131K Views

32 Replies · 215 Reposts · 2.03K Likes
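The creator-critic idea can be illustrated with a toy loop: one component proposes candidates, another filters them, and only accepted pairs enter the dataset. This is a minimal sketch under loud assumptions, not Google's pipeline. MapTrace uses generative models in both roles, while every name here (`make_map`, `creator`, `critic`, `generate_pairs`) is hypothetical and the "map" is just a grid with blocked cells.

```python
import random

Cell = tuple[int, int]


def make_map(rng: random.Random, size: int, wall_prob: float) -> set[Cell]:
    """Toy 'map': the set of blocked cells on a size x size grid (start cell kept free)."""
    return {(r, c) for r in range(size) for c in range(size)
            if (r, c) != (0, 0) and rng.random() < wall_prob}


def creator(rng: random.Random, steps: int) -> list[Cell]:
    """Hypothetical creator: propose a path as a random walk from the top-left corner.
    It does not look at the map, so many proposals will be invalid."""
    path = [(0, 0)]
    for _ in range(steps):
        r, c = path[-1]
        dr, dc = rng.choice([(0, 1), (1, 0), (0, -1), (-1, 0)])
        path.append((r + dr, c + dc))
    return path


def critic(walls: set[Cell], size: int, path: list[Cell]) -> bool:
    """Hypothetical critic: accept only paths that stay on the grid and avoid blocked cells."""
    return all(0 <= r < size and 0 <= c < size and (r, c) not in walls
               for r, c in path)


def generate_pairs(n_candidates: int, seed: int = 0, size: int = 6,
                   steps: int = 5) -> list[tuple[set[Cell], list[Cell]]]:
    """Run the creator, keep only the (map, path) pairs the critic accepts."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_candidates):
        walls = make_map(rng, size, wall_prob=0.15)
        path = creator(rng, steps)
        if critic(walls, size, path):
            accepted.append((walls, path))
    return accepted


pairs = generate_pairs(1000)
print(f"kept {len(pairs)} of 1000 candidates")
```

The design point is that the critic turns cheap, noisy generation into supervised data: volume comes from the creator, quality from the filter.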

Alibaba’s latest open-source LLM, a 27B-parameter model, reportedly outperforms o3, Sonnet 4, Grok 4, and DeepSeek’s 685B. This matters because near-frontier capability is now downloadable under permissive licensing, shifting advantage from model access to product integration and distribution. Our take: open weights commoditize the base model layer, and the moat moves up-stack to great apps, workflows, and feedback loops.

davidad 🎇@davidad

Today you can download a 27B-parameter LLM that is generally smarter than o3, Sonnet 4, Grok 4, or DeepSeek’s 685B. On the one hand, this is terrifying. On the other hand, there’s never been a better time to do some artificial neuroscience and figure out how these things tick!

5:37 PM · Feb 26, 2026 · 27.9K Views

19 Replies · 45 Reposts · 518 Likes

We’ll share another edition next week.
