SAP: Bringing the 'Operating System' of a Company into the AI Era with CTO Philipp Herzig

No Priors · Sarah Guo — Philipp Herzig (CTO of SAP) · April 23, 2026

Most important takeaway

SAP’s CTO argues that AI is a business-model transition, not just a tech transition: winners will be defined by outcomes delivered to customers, not by the underlying models. The hard part of enterprise AI is not building flashy POCs but handling scale (20,000 APIs, multi-country master data, security, verifiability) and bridging the structured/unstructured data divide. Developers and business users alike need to shift their mindset toward writing evals and describing boundary conditions so agents can be trusted to produce verifiable outcomes.

Chapter Summaries

  • What SAP actually does: SAP is the “operating system” of ~400,000 enterprise customers, running finance, HR, supply chain, manufacturing, sales, and procurement end-to-end.
  • Why SAP endures: Founded in 1972 by ex-IBM engineers who saw that reimplementing finance systems per customer didn’t scale. The concept of “standard software” has survived mainframe, client/server, internet, mobile, and now AI because customers buy outcomes, not tech.
  • CTO priorities: All-in on AI. Three layers are being re-engineered: (1) UI becomes generative/proactive, (2) business processes blend structured and unstructured work via agents, (3) data layer unifies SAP + external data into one semantic model.
  • Biggest engineering challenge: Scale. POCs with 10 docs or 10 APIs are easy; production with 20,000 APIs, per-country master data, and enterprise security is the real problem.
  • Evals and the developer mindset shift: Agentic coding works because compilers and unit tests verify output. Enterprise AI needs similar evals — developers must describe boundary conditions (security, privacy, code quality) and verifiable outcomes. TDD is effectively coming back, but driven by AI rather than discipline.
  • Agent mining and the data flywheel: Capture “tribal knowledge” from human-in-the-loop clarifications; elevate good deviations to new standard operating procedures, flag bad ones as anomalies. Each trace feeds new evals.
  • Computer use vs tool calling: Herzig leans toward tool calling / APIs as the dominant pattern, with computer use as fallback for legacy systems.
  • Where AI lands first: Unstructured domains (support, services, sales, knowledge work like consulting — their Joule for Consulting cut consulting effort ~30%). Structured/tabular is harder.
  • Predictive/tabular models (beyond LLMs): LLMs are token predictors; forecasting demand, cash flow, payment delays still needs XGBoost-style ML, which doesn’t democratize. SAP built RPT-1 (Relational Pre-trained Transformer) — a transformer trained for tabular classification/regression needing only small amounts of context data. Published at NeurIPS and other venues.
  • Adoption gap (“innovation race vs outcome race”): Biggest blockers are fragmented data, scale, and security. Customers who did data-integration homework win.
  • Future of finance/HR/supply chain roles: Like junior devs with Claude Code — operators get pushed up a level, reviewing/supervising agent output and focusing on strategic scenarios.
  • Pricing shift: Moving from seat-based to hybrid consumption models, eventually toward outcome-based pricing, but customers still want predictability.
  • Why SAP wins: Make the tech disappear; focus on customer outcomes; don’t over-index on any one model partner; keep architecture flexible so value ships fast.
  • How the CTO spends his day: Reviewing progress across DB-to-UI layers, prototyping personally (CLI instances running Claude Code), and talking to customers who “keep you honest.”
  • Personal curiosity: Quantum computing, specifically new algorithms for optimization problems (logistics, TSP, knapsack) that could reduce emissions and cost.
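The orchestration problem flagged above (a 10-API POC vs 20,000 APIs in production) can be sketched as a retrieve-then-call pattern: instead of stuffing every tool schema into the context window, a cheap index narrows the catalog to a few candidates before the model disambiguates. This is a hypothetical illustration, not SAP's architecture; all tool names are invented, and simple lexical overlap stands in for a real embedding index.

```python
# Sketch: tool retrieval before tool calling, so a huge API catalog
# never has to fit in the model's context window. Hypothetical names.

from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str

CATALOG = [
    Tool("get_open_invoices", "List unpaid invoices for a supplier"),
    Tool("post_goods_receipt", "Record receipt of goods against a purchase order"),
    Tool("forecast_cash_flow", "Project cash position over a planning horizon"),
    # ... imagine ~20,000 entries in production
]

def retrieve_tools(query: str, k: int = 5) -> list[Tool]:
    """Cheap lexical scoring stands in for an embedding index."""
    q = set(query.lower().split())
    scored = [(len(q & set(t.description.lower().split())), t) for t in CATALOG]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:k] if score > 0]

# Only the retrieved handful is handed to the model for tool calling:
candidates = retrieve_tools("which supplier invoices are still unpaid?")
```

In production the scoring function would be an embedding search plus user-context filters (country, role, authorizations) — the disambiguation layer Herzig says MCP alone doesn't provide at scale.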

Summary

Actionable insights:

  1. Write evals before writing code. The real mindset shift for AI-era developers isn’t just prompting — it’s defining verifiable outcomes and boundary conditions (security, privacy, code quality, data constraints) so agents can self-validate. TDD is effectively back, but because AI needs it, not because people liked writing it.
  2. Plan for scale from day one. A RAG POC over 10 docs is trivial; 20,000 APIs, multi-country data, and user-context disambiguation is a systems problem. If you’re building enterprise AI, assume context-window blowout, master-data dependencies, and orchestration/disambiguation are the real work.
  3. Bridge structured and unstructured data. Most enterprise value lives in tables (finance, inventory, supply chain). Knowledge graphs (SAP’s approach) or NL-to-SQL layers are the glue between LLMs and tabular systems.
  4. Don’t use LLMs for prediction. For demand forecasting, cash-flow, classification, regression — LLMs are the wrong tool. Classical ML works but doesn’t democratize (you need a data scientist per use case, per country). Watch purpose-built tabular transformers like SAP’s RPT-1.
  5. Capture tribal knowledge via “agent mining.” Every human-in-the-loop clarification is a data source. Record decision traces; use them to either flag anomalies or promote better behavior into the standard operating procedure. This is how agents compound in enterprise.
  6. Tool calling over computer use for most enterprise workflows — faster, more structured, more reliable. Computer use fills the legacy/API-less gaps.
  7. Pricing will move to consumption → outcome-based. Build products assuming this; but customers need predictability today, so hybrid models win in the transition.
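Point 1 above (“write evals before writing code”) can be made concrete: boundary conditions like privacy and verifiability become executable checks that gate agent output. This is a minimal sketch under my own assumptions — the check names, record shape, and thresholds are illustrative, not SAP APIs.

```python
# Hedged sketch of eval-first development: boundary conditions
# (privacy, verifiability) expressed as executable checks that every
# agent output must pass before it ships. All names are illustrative.

import re

def no_pii_leak(text: str) -> bool:
    """Boundary condition: output must not echo anything email-shaped."""
    return not re.search(r"[\w.]+@[\w.]+", text)

def cites_source(result: dict) -> bool:
    """Boundary condition: every claim must carry a verifiable source id."""
    return bool(result.get("source_ids"))

EVALS = [
    ("no_pii_leak", lambda r: no_pii_leak(r["answer"])),
    ("cites_source", cites_source),
]

def run_evals(result: dict) -> dict[str, bool]:
    """Run every eval; the agent output only ships if all pass."""
    return {name: check(result) for name, check in EVALS}

verdict = run_evals({"answer": "Q3 revenue grew 4%.",
                     "source_ids": ["FIN-2024-Q3"]})
```

The point is the workflow, not the checks: the evals exist before the agent does, so “done” means “passes the evals,” exactly the way unit tests define “done” for agentic coding.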

Career advice explicit and implicit:

  • The junior-developer-with-Claude-Code analogy generalizes: every role gets “up-leveled.” Finance analysts, HR ops, supply chain planners won’t disappear; they’ll supervise agents, run more scenarios, think more strategically. Invest now in supervisory, review, and scenario-thinking skills.
  • When selling tech (even as a technologist to executives), do not pitch the technology. Start with “what’s top of mind for your business?” and work backward to tech. Herzig calls this the mistake he himself made most often.
  • “Make the technology disappear” — a career-long north star for engineers inside enterprise: shipping outcomes, not elegance.
  • Keep prototyping personally. The CTO of a 400k-customer company still has multiple CLI instances running Claude Code during the day.

Tech patterns mentioned:

  • Generative UI — UIs dynamically generated based on analytical questions and proactive overnight agent runs.
  • Agent mining / process mining 2.0 — record decision traces and human-in-the-loop inputs to generate new evals and refine SOPs.
  • Knowledge graph as NL-to-structured-data bridge — SAP Knowledge Graph glues language models to tabular business data.
  • RPT-1 (Relational Pre-trained Transformer) — transformer architecture adapted for tabular prediction (classification, regression, time series) with small-context learning, to avoid training hundreds of bespoke ML models per country/product.
  • MCP at scale — viable for ~10 APIs, breaks down with hundreds without orchestration/disambiguation layers.
  • Hybrid pricing (seat + consumption → outcome) — a migration pattern for SaaS companies in the AI era.
  • LiteLLM vulnerability cited as a caution — security review is now inseparable from AI adoption; don’t just git pull the hot open-source thing.
  • Quantum for combinatorial optimization — TSP, knapsack, logistics routing as candidate problems; SAP staying hardware-agnostic and focusing on algorithmic work.
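The agent-mining pattern in the list above can be sketched as a small promote-or-flag loop: record human-in-the-loop decision traces, promote a deviation seen often enough into the standard operating procedure, and flag rare ones as anomalies. The record shape and promotion threshold are my assumptions for illustration, not SAP's design.

```python
# Sketch of "agent mining": decision traces either update the SOP
# (frequent, good deviations = captured tribal knowledge) or get
# flagged for review (rare deviations = anomalies). Assumed shapes.

from collections import Counter

SOP = {"approve_invoice": "match PO, then pay"}
PROMOTE_AFTER = 3  # a deviation seen this often becomes the new standard

def mine_traces(traces: list[dict]) -> tuple[dict, list[dict]]:
    """Split observed deviations into SOP updates vs anomalies."""
    counts = Counter((t["step"], t["action"]) for t in traces if t["deviates"])
    sop, anomalies = dict(SOP), []
    for t in traces:
        if not t["deviates"]:
            continue
        if counts[(t["step"], t["action"])] >= PROMOTE_AFTER:
            sop[t["step"]] = t["action"]   # tribal knowledge, standardized
        else:
            anomalies.append(t)            # rare deviation: needs human review
    return sop, anomalies
```

Each promoted or flagged trace is also raw material for a new eval — which is how, per Herzig, the data flywheel compounds.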