20VC: Benchmark's Sarah Tavel on Are Foundation Models Commoditising | Why Frontier Models Will Be Closed Source | Why the Value is in the Application Layer | The Future of AI is "Selling the Work" Not the Tools

20VC · Harry Stebbings — Sarah Tavel · May 6, 2024

Most important takeaway

The next wave of AI value is shifting from selling software per seat to “selling the work” — packaging the full outcome an employee would produce, which expands market size by 10-50x because you price against headcount cost rather than productivity gains. Frontier foundation models are likely to remain closed source and oligopolistic given exploding training costs, but the dominant value capture happens in the application layer where startups own the end user, build deep workflow around models, and can absorb model improvements over time.

Summary

Actionable insights and tech patterns from the conversation:

Career and investing wisdom

“Why now” is the single most important question for any founder or investor. A real why-now (technology catalyst, regulatory shift, new behavior) acts like a strong current that pushes a company forward and lets it survive the new mistakes every hyperscaling company must make. Without it, you are paddling against the tide. Authenticity-style why-nows (e.g., BeReal) can be real but get crushed by stronger countervailing currents like short-form algorithmic video.
The best VCs combine hyper-curiosity with hyper-competitiveness (Peter Fenton model). Continuous learning across adjacent fields and high-EQ understanding of what motivates people are the differentiators.
For founders evaluating investors: a great partner improves each decision by 5%, and over years those gains compound materially. Optimize for trust and vulnerability with a board member, not platform services or cheerleading.
Pricing discipline rule from Tavel’s mentor: “If you’re ever happy to take less, don’t do the deal at all.” Don’t use price to talk yourself into conviction — use it as a litmus test of conviction.
On reserves: Benchmark essentially does not reserve. Tavel pushes back on the “pro rata” expectation, calling out “your pro rata”: investors who demand follow-on rights without doing follow-on work create unnecessary dilution and conflict for founders.
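
The compounding point about a great partner can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each decision is 5% better and that the improvements multiply. The function name and the decision counts are hypothetical and do not come from the conversation.

```python
def compounded_uplift(per_decision_gain: float, decisions: int) -> float:
    """Cumulative multiplier after `decisions` gains that multiply together."""
    return (1.0 + per_decision_gain) ** decisions


# Hypothetical decision counts, chosen only to show the shape of the curve.
for n in (5, 10, 20):
    print(n, round(compounded_uplift(0.05, n), 2))
```

Under these assumptions, twenty such decisions multiply the outcome by about 2.65, which is why a small per-decision edge matters over a multi-year board relationship.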

The AI thesis — sustaining vs. disruptive

AI is currently a sustaining technology for incumbents in existing employee workflows (Notion and Adobe can simply bolt on an OpenAI API and capture the value). Startups attacking the same productivity-improvement frame lose to incumbent distribution.
The disruptive opportunity is “selling the work” not the tool: deliver the finished outcome an employee would produce, priced against headcount cost. This unlocks 10-50x larger TAM and avoids the per-seat adoption friction.
Use cases automatable today: HR ops, recruiting, sales, translation (narrow niches that unbundle specific employee work products). Where models fall short, bridge the gap with a human in the loop, preferably the AI vendor’s own employee doing QA rather than the customer’s.
Counter to the “infra is where value lives” trope: Tavel believes the application layer captures most of the value because owning the end user lets you compound value capture over time.
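
The 10-50x market expansion follows from the pricing anchor: revenue per role is set against what the employee costs rather than against a software seat. A minimal sketch, assuming made-up figures (a $1,200 annual seat, a $60,000 fully loaded employee cost, and the work priced at half of that cost). None of these numbers come from the conversation.

```python
def tam_multiple(seat_price: float, employee_cost: float, work_price_share: float) -> float:
    """Revenue per role when selling the work, relative to selling one seat.

    work_price_share is the fraction of the employee's cost charged for
    delivering the finished work (a hypothetical pricing choice).
    """
    work_price = employee_cost * work_price_share
    return work_price / seat_price


# Hypothetical inputs: $1,200 seat, $60,000 employee, work priced at 50% of cost.
print(tam_multiple(seat_price=1_200, employee_cost=60_000, work_price_share=0.5))  # 25.0
```

With these illustrative inputs the multiple is 25x, inside the 10-50x range cited above. The result is sensitive to all three assumptions.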

How to evaluate AI startups

Distribution of value test: of the 100% of value the product provides, how much comes from the underlying foundation model vs. workflow the startup built? Wave-1 wrappers were ~90% OpenAI / 10% startup — terrible defensibility. The new wave builds far more workflow, integration, and proprietary glue.
The Brad Gerstner / Sam Altman question: “Are you excited by a 100x improvement in OpenAI?” If yes, model progress strengthens your product. If you’d be steamrolled, it does not.
Durable value-prop test: from first principles, is the value proposition enduring (e.g., DeepL’s instant human-quality translation)? Then look at early cohort depth-of-usage and the Sean Ellis “how disappointed would you be if this disappeared” signal.
Differentiation is the hardest current question — most AI customer service / sales agent categories have many strong teams. The tiebreaker is the founder: competitive energy, urgency, ambition.

Infrastructure / model landscape

Each successive frontier model is roughly 10x more expensive to train; compute, specialized chips, and power are the binding constraints. Expect an oligopoly of closed-source frontier models. Open source may suffice for non-frontier use cases; Meta’s Llama is the wildcard that could reshape this.
End-customer inference prices keep falling as competition intensifies — application-layer startups benefit.
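
The 10x-per-generation claim implies exponential growth in training cost, which is the core of the oligopoly argument. A rough sketch, assuming a normalized baseline cost of 1 unit (hypothetical, since no absolute figures are given) and a constant 10x multiplier:

```python
def training_cost(base_cost: float, generations_ahead: int, growth: float = 10.0) -> float:
    """Projected training cost after a number of model generations."""
    return base_cost * growth ** generations_ahead


# Baseline of 1.0 is a normalized, hypothetical unit, not a dollar amount.
for g in range(4):
    print(g, training_cost(1.0, g))
```

Three generations out, the cost is 1,000 times the baseline under this assumption, which illustrates why only a handful of well-capitalized players could stay at the frontier.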

Network effects and moats in AI

B2B software has rarely had true network effects; it has had economies of scale (win a use case, expand features, raise prices, achieve positive net revenue retention, and run a more efficient go-to-market). AI doesn’t change this fundamentally for B2B.
Capital itself becomes a moat in some segments (e.g., Cognition raising massive rounds to buy GPUs and train models) — accept upfront dilution only if you believe the capital creates a durable moat in a very large market.

Benchmark’s operating model

Equal partnership, 1-2 new investments per GP per year, no internal consultant team — the partner does the recruiting calls, candidate closes, and weekly work themselves. The product is undelegated partnership, not a platform.
Wins come more often from selection bias (founders who want a board-engaged partner they can be vulnerable with) than from out-competing on terms. Benchmark almost never breaks its model on board seats, but it sometimes loses deals on price or round size.
Pull the future into the present at every board meeting: what does the enduring, independent version of this company look like, and what decisions today get us there?

Personal note

Biggest miss: not acting on the early smart-contracts / blockchain insight despite recognizing its disruptive potential. Lesson: when you have a genuine moment of insight that something is fundamentally disruptive, you must act on it.

Chapter Summaries

Joining Benchmark — How Peter Fenton and Rich Barton recruited Tavel from Greylock; what makes Benchmark’s small, equal, undelegated partnership model “impossible to unsee.”
The Peter Fenton effect — Relentless learner, high-EQ closer of candidates; the best VCs are hyper-curious plus hyper-competitive.
The “why now” lesson — The most important and most underrated question in venture; strong why-nows act as currents that carry companies through inevitable mistakes. BeReal case study on insufficient why-now vs. TikTok’s minutes black hole.
AI: sustaining or disruptive — Sustaining for incumbents in existing workflows; disruptive when startups “sell the work” instead of per-seat software.
Selling the work in practice — HR ops, recruiting, sales agents, DeepL; human-in-the-loop bridges current model limitations; mid-market and SMB adopting fastest.
Application layer vs. infrastructure layer — Why Tavel believes the app layer captures most value; the “what % of value comes from the model” framework.
Differentiation in a crowded field — Many strong teams per category; founder quality (competitive energy, urgency) is the deciding factor.
Pricing and dilution — Adjusting mental models for bigger TAMs that justify higher entry prices; managing FOMO; capital as a moat for compute-heavy plays.
Network effects and B2B moats — AI doesn’t change the B2B playbook of economies of scale, feature expansion, and positive NRR.
Single-modality vs. multi-modal models — Focused specialists (ElevenLabs, DeepL) win in their modality because frontier labs prioritize the core LLM frontier.
Frontier model economics — 10x training cost per generation, compute/power constrained, oligopoly of closed-source frontier models likely; Llama as wildcard.
How Benchmark wins — Undelegated partnership as a product; equal economics drive group effort; selection bias attracts founders who want a true partner.
Boards and being a “cheerleader” — Reject cheerleading; ask the hard, future-pulling questions; 5% better decisions compound.
Losing deals and breaking the model — Almost never break on board seats; sometimes lose on price; price is a litmus test of conviction.
Reserves and pro rata — Benchmark essentially doesn’t reserve; “your pro rata” critique of investors demanding follow-on rights without doing follow-on work.
Quick fire — Longest-tenure board (Chainalysis) as the best teacher; biggest miss was smart contracts; concerns about antisemitism and U.S. politics; parenting changes opportunity cost, not investing process.