Tobi Lütke Made a 20-Year-Old Codebase 53% Faster Overnight. Here's How.

AI News & Strategy Daily · Nate B Jones · March 25, 2026 · Original

Most important takeaway

There are at least four distinct “species” of AI agents — coding harnesses, dark factories, auto research, and orchestration frameworks — and using the wrong type for your problem is one of the most common and costly mistakes teams make. Understanding whether your problem is “software-shaped” or “metric-shaped,” and how much human involvement belongs in the middle of the process, is the key to choosing the right agent architecture.

Chapter Summaries

Introduction: Why “Agent” Is Too Vague

The term “agent” (LLM + tools + loop) is too simplistic. There are four distinct agent species, and confusing them leads to picking the wrong approach for your work.
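As a point of reference, the baseline "LLM + tools + loop" pattern the episode calls too simplistic can be sketched in a few lines. Everything here is a hypothetical stand-in: the model is stubbed so the example runs without any API, and the tool registry is illustrative.

```python
# Minimal sketch of the "LLM + tools + loop" agent pattern.
# stub_model stands in for a real LLM call; it first requests a tool,
# then returns a final answer once a tool result is in the transcript.

def stub_model(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "word_count", "args": {"text": messages[0]["content"]}}
    return {"final": f"Counted {messages[-1]['content']} words."}

TOOLS = {"word_count": lambda text: str(len(text.split()))}  # the "tools"

def run_agent(task, model=stub_model, max_steps=5):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):                      # the "loop"
        reply = model(messages)                     # the "LLM"
        if "final" in reply:
            return reply["final"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    return "step budget exhausted"

print(run_agent("summarize the four agent species"))
```

The point of the sketch is how little this definition constrains: all four species below fit inside it, which is why the episode argues it fails to distinguish them.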

Species 1: Coding Harnesses

The simplest agent type — a single LLM agent acting as a developer stand-in (e.g., Claude Code, Codex). Developers manage multiple single-threaded agents, decomposing work into well-defined tasks. At project scale, this evolves into multi-agent setups with planner and executor agents (as Cursor demonstrated), but simplicity remains key to scaling.
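The planner-plus-executor shape described for project-scale coding harnesses can be sketched as below. Both "agents" are plain functions here; in a real system each would wrap an LLM call, and the function names are purely illustrative.

```python
# Hedged sketch of the planner + executor pattern: one planner decomposes
# a spec into well-defined, agent-sized tasks; single-threaded executors
# each complete one task.

def planner(spec):
    """Stand-in for a planner agent that decomposes a spec."""
    return [f"{spec}: step {i}" for i in range(1, 4)]

def executor(task):
    """Stand-in for a single-threaded executor agent."""
    return f"done({task})"

def run_project(spec):
    tasks = planner(spec)                 # decomposition is the hard part
    return [executor(t) for t in tasks]   # executors stay simple

results = run_project("add rate limiting")
```

Note that the structure stays flat: one planning layer over simple executors, matching the point that simplicity is what lets this scale.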

Species 2: Dark Factories

Fully autonomous pipelines where a spec goes in and tested software comes out with minimal human involvement in the middle. Humans focus on specification quality at the start and eval/review at the end. Best suited for teams with strong evals and high confidence in their agent pipelines, though most enterprises still have a human review code before production.
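The dark-factory shape, spec in, tested artifact out, can be sketched as follows. The generator is a stub (a real pipeline would be an agent producing code), and the eval gate is deliberately trivial; the names are assumptions for illustration.

```python
# Sketch of a dark factory: human effort is concentrated in the spec
# (input) and the eval gate (output); the middle is autonomous.

def generate(spec):
    """Stub for the autonomous build step; a real one would be an agent."""
    return f"def handler():\n    return '{spec}'"

def evals_pass(artifact):
    """The eval gate that substitutes for line-by-line human review."""
    return artifact.startswith("def ") and "return" in artifact

def dark_factory(spec):
    artifact = generate(spec)
    if not evals_pass(artifact):
        raise ValueError("artifact failed evals; escalate to a human")
    return artifact
```

The hybrid variant mentioned later in the summary simply adds a human check after `evals_pass` instead of shipping directly.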

Species 3: Auto Research

Descended from classical ML, this species optimizes against a metric rather than producing software. Examples include Tobi Lütke optimizing Shopify’s Liquid framework for runtime performance and Andrej Karpathy’s auto-research package for LLM tuning. Applicable to any domain with a measurable objective (conversion rates, code performance, model quality).
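The core loop of this species is propose, measure, keep-if-better. A minimal sketch, with a toy benchmark standing in for a real metric such as runtime (this is not how the Shopify or Karpathy systems are actually built, just the shape of the loop):

```python
# Auto-research loop sketch: propose a variant, measure it against the
# metric, keep it only if the metric improves. The "system" is a single
# tunable parameter and the benchmark is a toy function (best at param=7).

import random

def benchmark(param):
    """Toy stand-in for 'runtime in ms': lower is better."""
    return (param - 7) ** 2 + 10

def auto_optimize(start, steps=200, seed=0):
    rng = random.Random(seed)
    best, best_score = start, benchmark(start)
    for _ in range(steps):
        candidate = best + rng.choice([-1, 1])   # propose a small change
        score = benchmark(candidate)             # measure the metric
        if score < best_score:                   # keep only improvements
            best, best_score = candidate, score
    return best, best_score

param, ms = auto_optimize(start=40)
```

The human's job in this species is choosing the metric and the benchmark, not reviewing each proposed change.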

Species 4: Orchestration Frameworks

Multiple specialized agents with defined roles handing off work to each other (e.g., LangGraph, CrewAI). Requires significant effort to manage handoffs, context, and prompts. Best justified at high scale (thousands or millions of tasks), not for small volumes.
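The core idea, specialized roles plus explicit handoffs, can be sketched as below. Frameworks like LangGraph or CrewAI add state, retries, and context management on top of this; everything here is a hypothetical stand-in, not any framework's actual API.

```python
# Orchestration sketch: a router decides which specialized agent gets the
# work, then hands it off. Each "agent" is a plain function standing in
# for an LLM-backed role.

def triage(ticket):
    """Routing decision: pick a role based on the work item."""
    return "billing" if "invoice" in ticket else "support"

AGENTS = {
    "billing": lambda t: f"billing handled: {t}",
    "support": lambda t: f"support handled: {t}",
}

def orchestrate(ticket):
    role = triage(ticket)        # routing
    return AGENTS[role](ticket)  # handoff to the specialized agent

print(orchestrate("invoice is wrong"))
```

Even this toy version hints at the cost: every new role multiplies the routing rules, prompts, and context that have to be maintained, which is why the episode reserves this species for high task volumes.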

Cheat Sheet: Choosing the Right Agent

A decision framework: use coding harnesses when your judgment is the quality gate, dark factories when you trust evals over human review, auto research when optimizing a metric, and orchestration when routing across specialized workflows.
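The cheat sheet reduces to a small decision function. This is just the framework above restated as code, with illustrative argument names:

```python
# The episode's decision framework as a function. Inputs are assumptions
# about your situation; the return value is the recommended species.

def choose_agent(problem_shape, trust_evals=False, routed_roles=False):
    if problem_shape == "metric":
        return "auto research"           # optimizing a measurable objective
    if routed_roles:
        return "orchestration framework" # work routed across specialized roles
    if trust_evals:
        return "dark factory"            # evals, not humans, are the gate
    return "coding harness"              # your judgment is the quality gate
```

For example, a "software-shaped" problem where a human still reviews every change resolves to a coding harness.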

Summary

Actionable insights for working with AI agents in 2026:

  • Match agent type to problem shape. Before building anything, determine if your problem is “software-shaped” (build a coding harness or dark factory) or “metric-shaped” (use auto research). If it involves routing work across specialized roles, consider orchestration. Using the wrong species wastes time and produces poor results.

  • Master decomposition as a career skill. The ability to break large, messy problems into well-defined, agent-sized tasks is now one of the most valuable developer skills. Good decomposition is what makes single-threaded coding harnesses effective and is the foundation of scaling to multi-agent project work.

  • Shift your mindset from human-centered to agent-centered engineering. Instead of asking “how do I speed up each developer with AI assistants,” ask “how do I make it easy for agents to do the work.” At project scale, framing work around agent capabilities (planner + executor patterns) unlocks more than simply giving each engineer multiple assistants.

  • Keep agent architectures simple. Cursor found that adding three levels of management to their multi-agent system made things worse. Simple configurations scale; complicated ones do not. A planner agent plus executor agents is often sufficient even for million-line codebases.

  • Invest in evals and specifications, not just agents. Dark factories only work if your evals are strong and your specifications are precise. The human value shifts to the beginning (clear intent, good specs, non-functional requirements) and the end (reviewing outputs, monitoring production). This is a career-relevant shift — specification writing and eval design are becoming core engineering competencies.

  • Be honest about scale before choosing orchestration. Orchestration frameworks require heavy upfront investment in prompt engineering, context management, and handoff design. Ask whether the volume of work (thousands or millions of tasks) justifies that investment. For smaller volumes, simpler approaches are better.

  • Hybrid approaches are practical and recommended. You do not have to go fully autonomous. Running a mostly-dark-factory pipeline with a human reviewing evals at the end captures most of the speed benefit while maintaining accountability — especially important in enterprise settings where AI-generated production incidents are a real concern.

  • Career advice: become a manager of agents. The developer role is shifting from individual contributor to managerial. Whether you are managing five single-threaded coding agents or overseeing a planner-executor system, the skill set is moving toward decomposition, specification, quality judgment, and understanding which agent architecture fits which problem.