20VC: OpenAI's Newest Board Member, Zico Colter on The Biggest Bottlenecks to the Performance of Foundation Models | The Biggest Questions and Concerns in AI Safety | How to Regulate an AI-Centric World

20VC · Harry Stebbings — Zico Colter · September 4, 2024 · Original

Most important take away

The most pressing AI safety issue right now is that models cannot reliably follow specifications - jailbreaks and prompt injection act like an unpatched “buffer overflow” in every LLM, which becomes a force multiplier on every other downstream risk (cyberattacks, misinformation, agentic failures in critical infrastructure). Architectures are largely irrelevant (“we are post-architecture”); data and the capabilities models produce are what matter, and we are nowhere close to data limits when you include multimodal and private data.

Summary

Actionable insights and patterns:

Career/work pattern: Zico uses the largest available frontier models almost exclusively for daily work (especially coding and lecture transcription) because generality wins until a task is repetitive enough to justify a specialized small model. Default to the biggest model, then specialize only after the task is well-understood and high-volume.
Stop training your own LLM from scratch. Zico argues this is no longer economically viable for most companies given strong open-weight options; the “we’ll train our own” default will fade.
Data is not the bottleneck. Public text (~30 TB used) is a tiny fraction of available data. Multimodal (video, audio, sensor) and private/enterprise data reserves are massive; the real constraint is compute and our ability to extract information from existing data (current models do not maximally extract info even from data they have).
Pattern: RAG remains durable even after fine-tuning matures, because it respects existing data access controls. Most enterprise hesitancy about “training on our data” is a misunderstanding - API calls do not retrain the model. Engineers selling AI to enterprises should lead with this clarification.
Architectures don’t matter much anymore. Transformers are not magic; comparable results are likely achievable with many architectures given enough compute. Don’t over-index your career or research bets on “what comes after the transformer” - focus on data and downstream capabilities.
Data curation is overrated relative to the old paradigm. Unsupervised ingestion of raw internet data beats heavy manual labeling for general capability.
Compute scaling has not plateaued; scaling laws still hold. The real question is cost/efficiency, not capability ceilings.
AGI is plausible within Zico’s lifetime (he gives ~14-50 years with wide uncertainty). Definition he uses: a system that can substitute for a close collaborator on a year-long project. Companies that win will redeploy workforces to steer AI, not fire workers to replace them 1:1.
Cybersecurity is a clear-and-present AI risk, more immediate than bio risk in his view. Models can already analyze code and find vulnerabilities; patching is harder than exploiting. If a model can find a vulnerability in any binary/JS/code, open-weight release becomes dangerous.
Open-weight stance: he’s pro open weights at current capability levels (Llama-3 ~405B is fine), but a capability threshold exists (e.g., universal exploit-finding) where open release would be reckless. The current pattern - closed models lead, open weights follow - is a healthy buffer.
Top AI safety problem: spec-following / jailbreak resistance. Until solved, every other downstream risk (bio, cyber, agent misuse) is multiplied. Treat LLMs in agentic pipelines as software with an unpatched buffer overflow when they parse untrusted third-party data.
Misinformation framing: the outcome isn’t that people believe everything - it’s that they believe nothing. AI accelerates the collapse of objective record; expect rising value of trusted media brands and existing-associate trust networks.
Regulation: downstream-use regulation (libel, fraud) with tweaks for velocity/volume is more tractable than regulating architectures. Most architecture-focused regulation will be obsolete within months.
China/US: less a race, more a need for global cooperation on safety standards specifically.
Correlated-failure risk (CrowdStrike-style) is the realistic catastrophic AI scenario, not rogue superintelligence. Be careful about embedding LLMs into critical infrastructure.

Chapter Summaries

Foundations and how LLMs work: Next-token prediction trained on internet text produces genuinely intelligent behavior - one of the most under-appreciated scientific discoveries of the past 20+ years.
Data limits: Public text is mostly consumed, but multimodal and private data reserves are enormous; compute and extraction efficiency are the real limits, not data.
Model scaling and small models: Bigger still wins on hard tasks; small specialized models come after generality is achieved for a specific repetitive task. Perceived plateau is mostly users asking the same easy questions.
Commoditization: Many open-source training runs were vanity projects; expect consolidation. Training your own LLM from scratch will become economically unjustifiable.
Compute: Scaling laws still hold; the conversation is shifting from capability ceilings to inference/training cost trade-offs.
AGI and corporate strategy: AGI (year-long collaborator equivalent) is plausible in his lifetime; winning companies redeploy workers to steer AI rather than replace them.
Enterprise adoption: On-prem hesitancy is partly misunderstanding about training vs. inference; RAG remains the right pattern for respecting access controls.
Misinformation and platforms: AI accelerates loss of shared objective reality; trust will reconcentrate in established institutions and personal networks.
Regulation: Regulate downstream uses, not architectures; existing libel/fraud frameworks plus volume/velocity adjustments cover much of the harm.
AI safety hierarchy: Spec-following failure (jailbreaks/prompt injection) is the top concern because it multiplies every other risk; cyber risk is the most immediate practical threat.
Open vs. closed models: Current open weights are safe to release; future capability thresholds (e.g., universal exploit-finding) should trigger caution. Closed-first, open-follows is a useful natural buffer.
Far-flung scenarios: Rogue AI is worth thinking about but most safety work should focus on practical concerns; correlated failures in critical infrastructure (CrowdStrike-style) are the realistic catastrophe.
Quick fire: Architectures don’t matter (“post-architecture”); data doesn’t need heavy curation; OpenAI board role focuses on AI/safety expertise; safety must be global, not a US-China race.