Sergey Levine - Building LLMs for the Physical World - Invest Like the Best, EP.465

Invest Like the Best · Patrick O'Shaughnessy — Sergey Levine · March 31, 2026

Most Important Takeaway

Physical Intelligence is building robotic foundation models analogous to LLMs: general-purpose AI that can control any robot body to do any task, rather than narrow specialists for individual applications. The key insight driving the company is that generality makes the problem easier, not harder, because a model trained on diverse data develops genuine physical understanding that transfers to new tasks with minimal additional data. This mirrors how language models surpassed domain-specific NLP tools by leveraging broad web-scale training. Levine believes we are approaching the tipping point where robots become useful enough to deploy in the real world and begin collecting their own improvement data at scale.

Summary

Actionable insights and key information about specific companies and careers:

  • Physical Intelligence (where Patrick O’Shaughnessy is an investor) is developing vision-language-action models — LLMs adapted for robotic control. They train first on text, then web images, then diverse robot data. Their models already handle dexterous manipulation, work across different robot embodiments without architecture changes, and completed nearly all tasks from the “Robot Olympics” challenge using their standard task onboarding process.

  • The generality thesis is working. Physical Intelligence found that their models generalize to new robot types (multi-fingered hands, different arm configurations) without being explicitly told what robot they are controlling. This was surprising even to the researchers and validates the approach of building one general model rather than task-specific systems.

  • Chain-of-thought reasoning gives robots common sense. By having the robot “think” before acting (e.g., looking at a messy kitchen and deciding “I should pick up the plate first”), the system leverages web-scale pre-training knowledge for handling edge cases. This is a concrete technique companies building on robotic foundation models should understand.

  • A major recent discovery: robots can improve through verbal coaching alone. Physical Intelligence found that labeling robot experience data with semantic commands (language descriptions) — without adding any new teleoperation demonstrations — improves generalization. This means the bottleneck has shifted from low-level physical capability to mid-level scene interpretation, and someone can literally talk to the robot to make it better.

  • Reinforcement learning enables superhuman speed. By having robots practice tasks and removing human pauses from demonstrations, Physical Intelligence achieved significantly faster task completion (e.g., plugging in cables) than human teleoperation. This combines the generative AI approach (reproducing human capability) with deep RL (exceeding human performance).

  • Robotics hardware costs have dropped dramatically. A PR2 robot cost ~$400K a decade ago; UC Berkeley lab robots were ~$30K; Physical Intelligence’s robot arms are roughly $3K each. This cost reduction, combined with foundation models that compensate for imprecise hardware, opens the door for widespread experimentation.

  • For companies preparing for robotics adoption: Don’t assume you know what data to collect. The type of data needed (teleoperation demonstrations vs. autonomous experience vs. language-labeled coaching) depends on which technical approach wins, and that is still uncertain. The coding tools analogy is the best template — expect robots to augment human workers and increase productivity rather than replace them outright.

  • Career insight from Levine’s path: Key career moments came from people betting on his potential rather than his track record — an Nvidia internship as a sophomore, a robotics postdoc despite having zero robotics experience (he came from computer graphics), and Jeff Dean greenlighting his “arm farm” experiment at Google when he was a junior researcher. Organizations that empower individual researchers to experiment with pet projects (like OpenAI with ChatGPT originating as John Schulman’s side experiment) produce outsized results.

  • The biggest technical risk is handling the breadth of unexpected real-world situations. Controlled environments like hotel rooms and restaurant kitchens are tractable; open-ended home environments where anything can happen remain the hardest challenge. Tasks involving interaction with people (elderly care, childcare) will likely be the very last to be solved.

  • Boston Dynamics is praised for repeatedly showing what people thought was impossible, even with caveats. The new Atlas robot is highlighted for its interesting design choices around joint range of motion. However, the gap between impressive demos and useful products remains a legitimate question for the entire robotics industry.

  • The “bitter lesson” remains controversial in robotics. Many researchers still believe you should encode physics knowledge and engineering principles into robots rather than learning everything from data. Levine argues that for true generality — especially generality in how the system can improve — end-to-end learning from data is essential.

  • Timeline outlook: Levine is optimistic relative to established robotics researchers but pessimistic relative to robotics entrepreneurs. The key uncertainty is the bootstrap challenge: getting robots to a level of usefulness where they can be deployed and begin collecting real-world data at scale. The interaction between technology readiness and public comfort (similar to autonomous vehicles) will also shape timelines.

Chapter Summaries

What is Physical Intelligence and Why Generality?

Physical Intelligence aims to build robotic foundation models that control any body to do any task. The thesis is that full generality is easier than narrow specialization because diverse data builds genuine physical understanding, mirroring how LLMs surpassed domain-specific NLP systems.

The Scarecrow Problem and the Demo Paradox

Robotics has a “scarecrow problem” — amazing physical devices need a brain. The hardest part of building general systems is that generalization doesn’t make for exciting demos; the most impressive demos come from perfectly controlled narrow settings, while true generalization means doing mundane tasks reliably in any environment.

What Success Looks Like: A Cambrian Explosion

Success means enabling a Cambrian explosion of robotics applications, similar to what personal computers did for software. A general foundation model would let anyone experiment with novel robot designs and applications without solving the intelligence problem from scratch.

Humanoids and Form Factor Diversity

Humanoids are one valid form factor among many. The intelligence challenge is the same across all robot types. Future robots might include swarms of quadcopters building houses, surgical micro-robots, or ceiling-mounted arms — the foundation model should be agnostic to embodiment.

The History of Robotic Learning

End-to-end robotic learning dates to the late 1980s (the ALVINN autonomous driving system). Key milestones include deep reinforcement learning in the early 2010s and the recent advent of multimodal LLMs that can provide common-sense knowledge for robotic control. The challenge has always been building systems that are cost-effective to train, handle edge cases, and remain robust and fast.

Combining Generative AI with Reinforcement Learning

The two great achievements of modern AI — generative models (reproducing human-level output) and deep RL (surpassing human performance, as in AlphaGo) — need to be combined. Physical Intelligence is pursuing both: chain-of-thought reasoning for common sense and reinforcement learning for continuous improvement through practice.

Vision-Language-Action Models

Physical Intelligence’s core technology is a vision-language-action model: an LLM adapted for robotic control through sequential training on text, web images, and diverse robot data. Chain-of-thought reasoning unlocks web-scale knowledge for edge cases, while RL enables practice-based improvement (demonstrated with espresso-making).
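The reason-then-act pattern described above can be sketched in a few lines. This is a purely illustrative toy, not Physical Intelligence's implementation: `reason` stands in for a vision-language model emitting a chain-of-thought plan, and `act` stands in for a low-level policy conditioned on that plan. All names and the string-based "observations" are assumptions for the sketch.

```python
# Hypothetical sketch of a "reason then act" control loop: a high-level model
# first emits a short textual plan, then the low-level policy conditions its
# action on that plan. Function names and logic are illustrative only.

def reason(observation: str) -> str:
    """Stand-in for a VLM producing a chain-of-thought plan from the scene."""
    if "plate" in observation:
        return "pick up the plate first"
    return "survey the scene"

def act(observation: str, plan: str) -> str:
    """Stand-in for a low-level policy that executes the textual plan."""
    return f"executing: {plan}"

def control_step(observation: str) -> str:
    plan = reason(observation)     # common sense from web-scale pre-training
    return act(observation, plan)  # motor command generation

print(control_step("messy kitchen with a plate and a cup"))
```

The design point is the separation of concerns: edge cases are handled at the planning level, where pre-trained knowledge lives, while the action model only has to execute well-specified sub-tasks.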

Data, Sensors, and the Bootstrap Challenge

The robot platform uses just three cameras and no touch or force sensors. The key insight is that good learning compensates for sparse sensing. On data needs, Levine argues that quantifying the exact data requirement is less important than getting systems useful enough to deploy and self-collect data, similar to Tesla’s approach.

Surprising Progress on Dexterity and Embodiment Transfer

The most surprising finding has been how well general models handle dexterous tasks and transfer across different robot types without architecture changes or explicit embodiment information. The system also passed nearly all Robot Olympics challenges using its standard onboarding process.

Moravec’s Paradox and the Shifting Difficulty Landscape

Things easy for humans (picking up cups) are hard for robots, and vice versa. Machine learning is changing this equation: tasks where data collection is straightforward are becoming easy regardless of physical intricacy. The remaining hard problems require common sense, multi-level reasoning, and connecting physical skills to web knowledge.

Real vs. Simulated Data Controversy

A major open question is the split between real-world data and simulation. Humanoid locomotion succeeds with heavy simulation and zero real data, while robotic manipulation succeeds with large real-world datasets and foundation models. Whether these approaches converge or one wins out is unknown.

The Robot Olympics and Cool vs. Useful

Physical Intelligence solved nearly all everyday tasks from the Robot Olympics challenge (opening doors, washing greasy pans, using plastic bags) using their standard task onboarding process, validating the power of generality. Their strategy: subject to being useful, make it as cool as possible.

Superhuman Robot Capabilities

Robots can exceed human speed by removing processing pauses from demonstrations. More broadly, robots can be made very large or very small, potentially enabling applications like microsurgery that humans physically cannot perform.
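One simple way to realize the pause-removal idea above is to filter near-stationary timesteps out of demonstrations before training, so the learned policy never imitates the demonstrator's hesitations. This is a minimal sketch under assumed data formats and thresholds, not the actual pipeline; real systems would operate on joint velocities or end-effector deltas rather than a scalar speed.

```python
# Hypothetical sketch of trimming human pauses out of teleoperation
# demonstrations. The (speed, frame) tuple format and the threshold value
# are illustrative assumptions.

def remove_pauses(trajectory, speed_threshold=0.01):
    """Drop timesteps where the commanded motion is effectively stationary."""
    return [frame for speed, frame in trajectory if speed > speed_threshold]

demo = [(0.0, "idle"), (0.5, "reach"), (0.005, "hesitate"), (0.7, "insert")]
print(remove_pauses(demo))
```

Training on the filtered trajectories pushes the policy toward the demonstrator's active motion only, which is how imitation-based systems can end up faster than the humans who taught them.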

Research Culture and What Makes Great Researchers

Great researchers have no single personality type but share passion from diverse sources. The critical skill is knowing when to persist on a problem versus pivoting — an instinct that some researchers develop. Organizations that empower individual experimentation (like early OpenAI or Google Brain) produce outsized breakthroughs.

Preparing for the Robotics Future

Companies should not make assumptions about data collection strategies, as the optimal approach depends on unresolved technical questions. The coding tools analogy is the best model: expect augmentation of human workers, not replacement. The dance between human and robot productivity will evolve similarly to how coding agents evolved from code completion.

What Remains Most Uncertain

Timeline uncertainty is the biggest unknown, driven by the bootstrap challenge of reaching deployment-level usefulness. The representation of internal reasoning for embodied systems may differ fundamentally from LLM text representations, and figuring out the right structure for spatial and semantic thinking is a current research priority.