20VC: Why Foundation Model Performance is Not Diminishing But Models Are Commoditising, Why Nvidia Will Enter the Model Space and Models Will Enter the Chip Space & The Right Business Model for AI Software with David Luan, Co-Founder @ Adept
Most important takeaway
Foundation model performance is not actually plateauing — the easy gains from simply scaling base models are slowing, but a second, largely untapped axis (giving models access to simulated environments to generate their own training data via RL-style self-play) will absorb enormous compute and continue driving capability. The economic winners will be the 5–7 tier-one cloud-backed labs that vertically integrate models with chips, while the most valuable AI applications will be vertically integrated agent products that collapse the talent stack rather than charging “price per work.”
Summary
Actionable insights and tech patterns from David Luan (ex-VP Eng OpenAI, ex-co-lead Google Brain, co-founder Adept):
Career and team-building patterns
- The AI org model has shifted from “bottom-up basic research” (Google Brain 2012–2018: hire brilliant researchers, set no near-term goals, produce papers) to Apollo-style mission teams that pick one unsolved problem (robot hand, Dota, GPT scaling) and throw a large org at it. If you want to do high-leverage work in AI, join (or build) goal-directed teams, not paper-writing federations.
- “Collapsing the talent stack” (credit Scott Belsky): the people who ship fastest are generalists who simultaneously play PM, designer, engineer, and GTM. AI co-pilots will accelerate this — bet your career on becoming a generalist who supervises specialist AI agents rather than a narrow specialist.
- AI engineering feels like gardening, not programming — outputs are emergent and capabilities appear at scale thresholds you can’t predict in advance. Embrace experimentation and unknown unknowns rather than spec-driven building.
Tech patterns and model dynamics
- Scaling law nuance: every incremental GPU has diminishing returns, but every doubling of compute yields predictably consistent gains — capability is roughly logarithmic in compute, not linear. Stop reading “diminishing returns” headlines literally.
- The next axis of scaling is not bigger base models but RL-style self-play in simulated environments (e.g., give the model a theorem prover or Jupyter notebook, let it try, reflect, and generate its own positive/negative examples). This is how reasoning will be solved — and it must be solved at the model-provider layer, not by app developers.
- Chatbots and agents are speciating into two different product categories. Hallucination is a feature for chatbots/creative tools and a bug for agents. Build accordingly.
- Memory: short-term/context-window memory is largely a model-provider problem (e.g., Gemini’s million-token context). Long-term memory (user preferences, org knowledge) is an application-layer problem — app builders should own it.
- Minimum viable capability thresholds: capabilities appear suddenly at certain scales (GPT-2 couldn’t do 3-digit arithmetic until it could). Plan product roadmaps assuming capability cliffs, not smooth curves.
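The scaling-law bullet above can be made concrete with a toy power law. This is an illustrative sketch only: the functional form `loss(C) = a * C**(-alpha)` follows published scaling-law work, but the constants `a` and `alpha` here are made up for demonstration.

```python
# Illustrative only: assume a power-law loss curve, loss(C) = a * C**(-alpha).
# The constants are invented for demonstration, not fit to any real model.
a, alpha = 10.0, 0.05

def loss(compute):
    return a * compute ** (-alpha)

# Marginal gain from one extra unit of compute shrinks as compute grows...
print(loss(1_000) - loss(1_001))        # small improvement
print(loss(100_000) - loss(100_001))    # far smaller improvement

# ...but each *doubling* of compute improves loss by the same constant
# factor, 2**(-alpha), no matter where you are on the curve:
for c in (1_000, 2_000, 4_000, 8_000):
    print(loss(2 * c) / loss(c))        # identical ratio every time
```

This is the “log curve vs straight line” point: per-GPU returns diminish, but per-doubling returns stay constant.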
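The self-play bullet above describes a loop: the model attempts a task in a verifiable environment, the environment scores the attempt, and the labeled attempts become new training data. A toy sketch of that loop, with every name hypothetical and a trivial arithmetic checker standing in for a theorem prover or Jupyter sandbox:

```python
import random

def environment_check(problem, answer):
    """The environment verifies attempts without human labels --
    the role a theorem prover or executable notebook plays at scale."""
    x, y = problem
    return answer == x + y

def model_attempt(problem):
    """Stand-in for sampling an answer from the model; a real system
    would query an LLM here (and fine-tune it on the results)."""
    x, y = problem
    return x + y + random.choice([-1, 0, 1])  # sometimes right, sometimes not

def self_play_round(n_problems=100):
    positives, negatives = [], []
    for _ in range(n_problems):
        problem = (random.randint(0, 99), random.randint(0, 99))
        answer = model_attempt(problem)
        # The environment's verdict turns each attempt into a training example:
        bucket = positives if environment_check(problem, answer) else negatives
        bucket.append((problem, answer))
    return positives, negatives  # fed back into the next round of training

pos, neg = self_play_round()
print(f"generated {len(pos)} positive and {len(neg)} negative examples")
```

The key property is that the environment, not a human annotator, supplies the reward signal — which is why this axis can absorb compute rather than labeling budgets.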
Business model and market structure
- Expect 5–7 long-term frontier model providers max. Every tier-one cloud must win at the model layer because models are becoming the base computing primitive (replacing EC2-style nodes). Whoever controls the model interface controls all downstream compute.
- Strong vertical-integration pressure between chip makers and model builders in both directions: Nvidia will push into models; model builders (Google TPU, others) will push into chips for margin and cost-of-training advantage. Apple’s edge advantage is running small, private, on-device models for free — they will dominate the “small/private/edge” ring but not frontier reasoning.
- Apple–OpenAI deal signals a commoditized future for LLMs as hot-swappable backends behind a controlled interface.
- Independent pure-play model sellers (Anthropic, Mistral) either become first-party arms of a hyperscaler or must build a $5B+ FCF enterprise GTM flywheel fast.
- Pricing: “selling the work” / consumption pricing is overhyped for knowledge work. Co-pilots/teammates that augment creativity will not be priced per output — they’ll be priced like teammates. “Price per work” is a corollary of “AI replaces all jobs,” which Luan doesn’t believe.
- Agents vs RPA: RPA is yellow-line-on-the-factory-floor automation for high-volume, identical tasks. Agents are full-self-driving — they reevaluate at every step. RPA incumbents are structurally disrupted because the agent business model (end-user teaches by demonstration) destroys the implementation services revenue.
- “Every enterprise workflow is an edge case” (Parag Agrawal). The only way to win in agents is vertical integration from UI down to model.
- Enterprise AI adoption is still overwhelmingly experimental-budget. Refuse to sign experimental deals if you want quality revenue. We are overestimating short-term and underestimating long-term adoption.
- AI services/implementation firms are temporarily large because they fill the gap between base models and enterprise use cases — but as those gaps get productized, product companies will be the long-term economic winners, not services firms.
Regulation and open source
- Main near-term risk is regulatory capture by incumbents pulling up the ladder. Open source will always lag closed but is critical for keeping the broader field competitive.
Long-term vision
- Agents in five years will feel like the DOS-to-GUI leap all over again: users interact at the level of goals rather than clicks. The pre-mortem failure mode is walled-garden incumbents preventing cross-app agent reach.
- Start AI product design from the human-computer interaction layer down, not model-up. The “make models smarter, then figure out HCI later” waterfall is wrong.
Chapter Summaries
- Google Brain era and bottom-up research — Why 2012–2018 Google Brain was Bell Labs-era AI: curiosity-driven researchers, no near-term goals, transformer/diffusion/optimization breakthroughs all emerging from the same building.
- The shift to mission-driven AI orgs at OpenAI — Post-transformer, the right structure became Apollo-style: huge teams aimed at a single unsolved problem (robot hand, Dota, GPT scaling).
- ChatGPT as a slow-boiled frog — GPT-2 (2019) and GPT-3 API were already capable; ChatGPT required both minimum viable intelligence and consumer packaging to go viral.
- Compute and diminishing returns — Per-GPU returns diminish, but per-doubling returns are predictable. Base-model scaling is hitting cost walls at $1B–$4B training runs.
- The second scaling axis: self-play / synthetic data — Models generating their own training data in simulated environments (theorem provers, Jupyter) will absorb massive compute and unlock reasoning.
- Chatbots vs agents as different species — Hallucination is a feature for chatbots, a bug for agents. They will diverge into distinct product categories.
- Memory — Short-term context belongs to model providers; long-term user/preference memory belongs to application builders.
- Minimum viable capability thresholds — Capabilities emerge suddenly at scale (e.g., 3-digit arithmetic in GPT-2). Roadmaps must account for capability cliffs.
- Reasoning as the next big unlock — Solved by base models + simulated environments + human feedback. Must happen at the model-provider layer.
- Five-to-seven model providers and vertical integration — Tier-one clouds must win; Nvidia pushes into models, model builders push into chips, Apple dominates the edge.
- Independent foundation labs — Either get absorbed by hyperscalers or build a $5B FCF enterprise flywheel.
- Adept’s positioning — Vertically integrated agent for knowledge work: own UI through model so users can teach agents new workflows.
- Agents vs RPA — RPA = painted yellow line; agents = full self-driving. RPA business model is structurally disrupted.
- Pricing models — Price-per-work assumes commoditized repetitive labor; co-pilots/teammates that augment creativity won’t be priced that way.
- Org structure of the future — “Collapsing the talent stack”: generalist humans supervising AI specialists.
- Enterprise adoption reality — Still mostly experimental budget; long curve ahead. Avoid experimental-budget deals.
- Services vs products in AI — Services firms are temporary gap-fillers; productized capabilities will be the long-term winners.
- Regulation and open source — Real risk is regulatory capture; open source must persist to keep the field competitive.
- AGI, safety, and HCI as the missing ingredient — Design from the human-AI interface down, not model-up.
- Quick-fire — Agents and chatbots will speciate; agents in five years will feel like moving beyond the GUI the way a non-invasive BCI would; the failure mode is walled-garden lock-in.