AIE Europe Debrief + Agent Labs Thesis: Unsupervised Learning x Latent Space Crossover Special (2026)
Most important takeaway
The 2026 thesis is that “coding agents are breaking containment” — what coding agents did in 2025, they will now do for everything else, because software eats the world and coding agents eat software. The agent stack (LLMs + tools + filesystem + skills) has stabilized enough that builders should stop hedging on infra and instead invest in being a “capability explorer” living at the spending edge of the frontier models, where the next valuable workflows are discovered first.
Summary
Actionable insights
- Become a capability explorer, not an efficiency seeker. Right now the market rewards spending more tokens, not less. A $200/month Claude Code or Codex Pro subscription is a cheap option on discovering the next breakout workflow before competitors do. People who refuse to participate because “it’s slop” will lose 2026 to people who “bend it the right way.”
- Treat agents as your primary customer. 60% of traffic to Vercel’s admin/config app is now bots. Rule of thumb: if your product doesn’t exist as an API a coding agent can call (CLI, MCP, etc.), it doesn’t exist. The discipline is the same as good developer experience: clean docs, stateless APIs, progressive disclosure, search.
- Get into the short list of model recommendations. When users ask an LLM “give me an X provider” with no context, it returns ~3 names — Resend wins ~70% of email-provider recommendations from Claude. Optimize for being one of those 3 via “combo” content (“use X with Vercel and …”) that gets ingested into training corpora. Frequency-of-mentions is the current AEO/GEO signal; memory/personalization will replace it in 3–4 years.
- Adopt the “agent lab playbook.” Bootstrap on frontier models, accumulate high-quality user workload data, then distill/RL into your own domain-specific model for cost, latency, and a marketing halo. Tooling like Thinking Machines’ Tinker is making this materially easier.
- Reconsider RL post-training even if it depreciates in 3 months. App companies already throw out work every 3 months as models improve — RL’d quality gains are just another version of that. Keep the data, throw out the weights, re-run later. Look into long-trajectory training, synthetic rubrics, and Dr. GRPO — RL is going much more multi-turn (hundreds of turns) than people realize.
- Pre-stage for the post-2T-parameter (and eventually >10T-parameter) model era. Larger clusters are coming online over the next 3–5 years. Expect rationing now; don’t architect around it long-term. Memory and context length, not parameter count, will be the binding constraint (Gemini has had 1M context for two years and almost nobody uses it).
- Push on “zero human review” as the next coding frontier. Five months ago, “zero human-written code” was crazy; now it’s normal. The next step (which OpenAI is exploring) is checking in unreviewed agent-generated code, which forces investment in tests and automated verification — things you should have been doing anyway. This unlocks “dark factory” software production at much higher quantity, which then enables quality through volume.
- For SaaS buyers: start replacing low-NPS SaaS with custom builds, but bring your team along. Swyx’s own company pays $200K/yr for an event-management SaaS that he estimates he could rebuild for $2K. The blocker is org adoption, not feasibility. The widening “AI psychosis gap” inside companies is a real risk in both directions (laggards underestimate disruption; vibe-coders ship 80% solutions that dump cleanup on teammates).
- Bet on alternative inference hardware. Cerebras, Groq, and others are delivering thousands of tokens/sec vs. <100 on Nvidia. Every 10x speedup unlocks new product patterns. Cognition runs on Cerebras; OpenAI also uses it. Don’t dismiss the multi-year investment cycle in non-Nvidia silicon.
Career advice
- You are always behind in AI — that’s the job. Both hosts say everyone in AI feels behind; nobody has caught up. Stop using that as an excuse and start shipping experiments at the edge of the frontier.
- Worst case at a startup is an acquihire into a frontier lab. Swyx’s view: even if your AI startup fails commercially, executing competently is essentially a job interview into Anthropic/OpenAI/etc. The downside floor is high, so the EV of starting something is good — especially for very small teams. Mid-sized startups (especially in LLMOps/observability/inference) face real consolidation risk (e.g., Langfuse → ClickHouse).
- For builders: optimize for being a “capability explorer.” The people who discover the next billion-dollar workflow are the ones spending $10K/day on tokens (e.g., Ryan Lopopolo at Imbue) and trying things that don’t yet work. If you only do things that work, you’ll never find the next thing.
- For PMs / engineers: invest in agent experience (AX), which is just good DX. Markdown skills files + scripts is now the consensus minimal format for agent integrations. Get good at writing them.
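To make “markdown skills files + scripts” concrete, here is a minimal sketch of what such a skill file could look like. The frontmatter-plus-instructions shape follows Anthropic’s Agent Skills convention (a `SKILL.md` with `name` and `description` fields), but the specific skill, file names, and scripts below are hypothetical examples, not from the episode:

```markdown
---
name: deploy-preview
description: Build the site and publish a preview deploy. Use when the user asks to preview their changes.
---

# Deploy preview

1. Run `scripts/build.sh`; it should produce a `dist/` directory.
2. Run `scripts/preview.sh dist/` and report the preview URL it prints.
3. If the build fails, show the last 20 lines of the build log and stop —
   do not attempt to publish a broken build.
```

The point of the format is that it needs no SDK: the agent reads the markdown as instructions and shells out to the referenced scripts, which is why the hosts treat writing these files as a core AX/DX skill.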
Investments and stocks mentioned (notable)
- Anthropic — Claude Code is ~$2.5B ARR after one year; one of “the most valuable companies in the world” thanks to coding hitting at the right time. Bundled/restricted-access strategy.
- OpenAI — Estimated ~$2B ARR on Codex; reportedly valued ~$1.2T. Pursuing a “super app” strategy that may distract from coding focus.
- Cursor — Rumored ~$2B ARR.
- xAI — Mentioned as a major cap player (“don’t underestimate them”).
- Cognition — Swyx is involved; training their own models on Cerebras inference.
- Cerebras, Groq, Tenstorrent, AMD (MI series) — Alternative inference silicon getting bid up; cited as the rare non-Nvidia winners. Swyx is bullish on multi-year custom-chip cycle.
- Nvidia — Tries harder on Neoclouds (e.g., CoreWeave) than on Neolabs; harder to “will” a new lab into existence than a new cloud.
- Vercel — Cited as an example of agent-first traffic (60% bots).
- Resend — Example of a young (2023) company winning the LLM-recommendation lottery (~70% of Claude email-provider recs).
- Supabase, Convex, Postgres, MongoDB — Discussion of an “AI-native system of record” gap; Convex mentioned as a candidate.
- Fireworks, Together — “Crushing it” as the open-model inference and fine-tuning derivatives market grows.
- LangChain (Harrison Chase) — Cited as the canonical reinvention case (LangChain → LangGraph → LangSmith → Agents).
- Microsoft / GitHub — Sleeping giant in coding; if they “wake up” beyond Copilot it’s the most likely market structure changer.
- Meta (MSL), Zhipu (GLM) — Both pushing into the coding agent space; status quo is two-player (Anthropic + OpenAI) but watch for shifts.
- Sierra (Bret Taylor), Decagon (Max Junestrand) — Cited as application-layer companies that survive by being willing to throw out work every 3 months.
- Bridge, Decagon (Legora) — Vertical “outsourced AI team” plays; durable across AI trend shifts.
Key thesis to internalize
- 2025 = year of coding agents. 2026 = year coding agents break containment to do everything else. Software eats the world; coding agents eat software; therefore coding agents eat the world.
- The two underexplored frontiers Swyx is watching: memory/personalization and world models (read Fei-Fei Li’s spatial intelligence essay — she may not have the answer but has the right problem statement).
Chapter Summaries
1. AIE Europe takeaways and the current zeitgeist
Swyx returns from AI Engineer Europe in London. Top conference topics: OpenAI Codex / Claude Code, harness engineering, and context engineering, with evergreen tracks on evals, observability, GPUs, and multimodal/generative media.
2. Has agent infrastructure stabilized?
Discussion of Harrison Chase’s claim that AI infra has finally stabilized. Swyx agrees the agent stack (LLMs + tools-in-a-loop + filesystem + skills as markdown+scripts) has converged, but argues you shouldn’t stake your reputation on the thesis — just keep adapting. Memory, sub-agents, and real-time remain in flux.
3. Vertical vs. horizontal AI startups; in-house AI teams
Vertical “outsourced AI team” companies (Bridge, Decagon-style) are robust because customers won’t hire in-house. Horizontal infra plays mostly succeed when they’re cloud reinventions (sandboxes = compute). The “agent lab playbook”: bootstrap on frontier models, then specialize/distill once you have workload and data.
4. Custom inference silicon
Cerebras, Groq, and others delivering 1000s of tokens/sec change product design. Cognition runs on Cerebras; OpenAI also uses it. Multi-year cycle — don’t dismiss.
5. Selling to agents (AEO/GEO)
60% of Vercel admin traffic is bots. Build for agents = build for developers (good docs, stateless APIs, discoverability). Resend wins 70% of Claude email-provider recommendations. Optimize for being one of ~3 names in model recommendation lists; semantic-association combo content matters. Memory/personalization will eventually displace mention-frequency as the ranking signal.
6. The state of the AI coding wars
Claude Code ~$2.5B ARR, Codex ~$2B, Cursor ~$2B — markets created in a single year. Token-maxing leaderboards are the current employer-signaling game. Anthropic uses scarcity/premium positioning; OpenAI uses subsidized open access. Status quo of two big players + long tail seems stable barring a Microsoft/GitHub awakening or Chinese lab breakthrough (Zhipu GLM, Meta MSL).
7. Stickiness, first-mover advantage, and “consumer AI plateau”
ChatGPT’s first-mover stickiness held in consumer AI but the category itself has plateaued in DAU growth. Claude Code’s similar stickiness in coding is a positive signal that being first to a magical experience matters more than expected — which is also bullish for whoever ships the next breakout product (e.g., Operator/computer-use hybrids).
8. Foundation models eating startups; valuations
Swyx isn’t worried about small startups (worst case = acquihire into a lab) but is worried about mid-sized LLMOps/inference players (Langfuse → ClickHouse). Valuations of frontier labs (OpenAI ~$1.2T, etc.) defy traditional venture banding because every 3 months is potentially existential, up or down.
9. SaaS vs. AI-native rebuilds
Swyx pays $200K/yr for event-management SaaS he could rebuild for $2K, but team adoption is the blocker. The “AI psychosis gap” inside companies is a real risk both ways. Opportunity for an AI-native system of record (Convex-adjacent) before falling back to Postgres/Mongo.
10. Bio-safety vs. security; private models
Anecdote about an Anthropic dinner — Swyx flagged bio-safety; the CISO flagged security. Anthropic’s marketing of Claude as “private” is overstated when it’s deployed across 40 enterprises with 10K employees each.
11. Model size, scaling, and rationing
We’re entering the >10T parameter era. Swyx’s theory: Google Gemini Ultra exists but is held back as a teacher model for distillation. Context length is the slowest-scaling axis (4K to 1M took ~3 years). Memory will be the binding constraint going forward.
12. What Swyx changed his mind on
Was bearish on open models (5% market share and declining, per Ankur Goyal); now thinks open models are growing. The top 20% of AI builders increasingly use open models, fine-tuners (Fireworks, Together) are crushing it, and RL post-training is worth doing even if it depreciates in 3 months.
13. Dark factories and zero human review
“Dark factory” coding: zero human-written code is now normal; zero human review is the next frontier (OpenAI exploring). Forces investment in testing/verification. RL going multi-turn (hundreds of turns) — search terms: long trajectory, synthetic rubrics, Dr. GRPO.
14. The next frontiers: memory and world models
Memory + personalization is one frontier. World models — beyond robotics and 3D walkthroughs — are about giving AI a real conception of physics and matter. Swyx invokes the Good Will Hunting “you’ve never been to the Sistine Chapel” scene as the gap between book-smart LLMs and grounded intelligence. Read Fei-Fei Li’s spatial intelligence essay.