The discipline stack

24 May 2026

In February 2026, Andrej Karpathy renamed the thing he invented. “Vibe coding,” coined twelve months earlier, became “agentic engineering.”

The reason is one that most people building with AI in 2025 didn’t want to hear. The first 80 per cent of a software project flows from intent. The last 20 per cent is edge cases, failure recovery, security, audit. That last 20 per cent needs an engineer who has actually thought through the system.

Worth pausing on what the rename means, and what it still leaves out.

What the rename means

Karpathy’s distinction, cleanly:

Vibe coding: describe what you want, accept what comes back. Iterate, paste errors back, sometimes ship things that work, sometimes ship things that don’t.
Agentic engineering: design the system, specify the constraints, use AI to accelerate implementation you’ve already reasoned through.

Karpathy isn’t a journalist describing a trend. He coined the original term, watched it land, watched the failure modes, and renamed his own concept. That is a confident reversal from the highest-credibility voice in the space.

The word “engineering” carries decades of professional weight: specification, design review, testing, monitoring, accountability. The word “vibing” carries none of that, and was honest about it. Karpathy changed the name and, with it, the standard by which AI-built software is judged.

The ninety per cent from the other side

The Ninety Percent, which I published last month, makes the same argument from the operational direction. The build is maybe ten per cent of what it takes to run an agent inside a business. The other ninety per cent is security and permissions, evaluation, change and cost, handover, and ownership. None of it is fun. All of it determines whether the system still works in six months when nobody is watching.

Karpathy’s renamed framework covers part of that ninety per cent. The design-the-system, specify-the-constraints work is the engineering discipline that prevents the most obvious failure modes. The CI/CD review pipeline, the human-approval node, the test suite, the version control, the audit trail. All of it lives inside agentic engineering as he now defines it.

He started from “AI accelerates implementation” and arrived at “implementation needs engineering thought.” I started from “demo isn’t a system” and arrived at “operating it needs ownership.” Different doors into the same room.

The conversation has moved past “does AI work in production?” and toward “how do you do it responsibly?” That second conversation is more useful.

Necessary, not sufficient

Here is the move I want to make. Agentic engineering covers development discipline. It does not fully address what happens after the system ships.

Four things Karpathy’s framing names lightly or not at all.

Someone has to operate it

An engineered system still needs an owner. Someone watching outputs for drift. Someone reading the bill. Someone deciding when to update a dependency versus when to freeze it. Engineering discipline produces a deployable artefact. Operational discipline keeps it working over time.

The enthusiast can engineer. The enthusiast usually cannot operate. Different skills, different time horizons, different reward profiles.

Organisational context is not in the code

I wrote about this in The Gap. The connective tissue between AI capability and business value is not model quality. It is context.

An agentically-engineered system that does not know the customer has been with you for fifteen years and always pays early will follow the rule and lose the customer. Engineering does not capture organisational context. Organisations capture organisational context, and most do not do it deliberately.

You can ship the best-engineered AI system in the world into an organisation that has not done the context work, and it will still produce technically correct, contextually wrong outputs. The engineering did not fail. The context was not there.

Evaluation is a separate discipline

The model upstream gets retrained. The API contract changes silently. The token bill triples overnight because a prompt went into a loop. The agent gradually gets worse, or gets weird in specific ways, and nobody catches it until a client notices.
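To make the runaway-bill case concrete, here is a minimal sketch of a per-run budget guard. It assumes each model call reports a token count; the names RunBudget, BudgetExceeded and run_agent are hypothetical, and step_costs stands in for real model calls. It is one possible shape for the idea, not a reference to any particular framework.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run goes past its token or step allowance."""


@dataclass
class RunBudget:
    max_tokens: int       # hypothetical per-run token ceiling
    max_steps: int        # hypothetical cap on agent iterations
    tokens_used: int = 0
    steps_taken: int = 0

    def charge(self, tokens: int) -> None:
        """Record one agent step and raise if either limit is exceeded."""
        self.steps_taken += 1
        self.tokens_used += tokens
        if self.steps_taken > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(f"token limit {self.max_tokens} exceeded")


def run_agent(step_costs, budget):
    """Consume step costs until finished or until the budget halts the run.

    Each item in step_costs is the token count one (simulated) call reported.
    """
    for cost in step_costs:
        try:
            budget.charge(cost)
        except BudgetExceeded as exc:
            # A halted run is a loud failure somebody can investigate.
            return f"halted: {exc}"
    return "completed"
```

A looping prompt then stops at the cap instead of running all night: run_agent([50] * 100, RunBudget(max_tokens=1000, max_steps=5)) halts on the sixth step. The limits themselves are an operational decision, which is rather the point.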

Agentic engineering builds the system. Evaluation discipline keeps it honest. These are different practices, and the second one is much less fun than the first.

A system without evaluation discipline does not fail loudly. It fails by being slightly wrong, more and more often, until trust quietly evaporates. The bad failure mode for AI in production is not a crash. It is degraded output that looks right enough to deploy. That is the failure mode Determinism Wins is built around: stochastic systems silently doing the wrong thing under conditions that worked yesterday.
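One way to catch “slightly wrong, more and more often” is to re-run a fixed set of known-good cases on a schedule and compare the pass rate with a stored baseline. The sketch below assumes exact-match scoring, which is a simplification, and a tolerance of five percentage points, which is an arbitrary illustration. The function names and the golden-set format are hypothetical.

```python
def pass_rate(cases, system):
    """Fraction of golden cases where the system's output equals the expected answer.

    cases is a list of (prompt, expected) pairs; system is any callable
    that maps a prompt to an output.
    """
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(1 for prompt, expected in cases if system(prompt) == expected)
    return passed / len(cases)


def check_regression(baseline_rate, current_rate, tolerance=0.05):
    """Return (ok, drop); ok is False when the drop exceeds the tolerance."""
    drop = baseline_rate - current_rate
    return drop <= tolerance, drop


# Stand-in systems: dictionary lookups instead of real model calls.
golden = [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4")]
last_month = {"a": "1", "b": "2", "c": "3", "d": "4"}.get
this_month = {"a": "1", "b": "2", "c": "3", "d": "5"}.get

ok, drop = check_regression(pass_rate(golden, last_month),
                            pass_rate(golden, this_month))
if not ok:
    print(f"regression: pass rate dropped by {drop:.0%}")
```

Nothing crashed between the two runs. One answer changed, and only the comparison against last month’s baseline makes that visible.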

Handover, or it dies with the builder

If the engineer who built it leaves, gets tired, or moves on, can anyone else operate it? Documentation. Runbooks. Reproducibility. Naming conventions a stranger can decode.

The engineer who builds does not usually want to write the handover doc. Whoever inherits the system pays for it. Often in incidents.

The full stack of disciplined AI

What disciplined AI actually requires:

Agentic engineering for development. Specify constraints. Engineer the system. Do not ship vibe-coded experiments.
Operational ownership for deployment. Someone watching outputs, reading bills, auditing access, deciding updates.
Organisational context for relevance. The institutional memory that lets the engineered system make good calls inside a real business.
Measurement discipline for durability. Evaluation, drift detection, regression catching, contract testing.

None alone is enough.

The first without the others ships engineered systems that nobody operates.
The second without the first ships disciplined operations on top of fragile builds.
The third without the others gives you a context-rich organisation with no AI capability.
The fourth without the others gives you accurate metrics on systems that are quietly degrading, with nobody responsible for acting on them.

The businesses that get value from AI over the next several years are the ones treating all four as first-class disciplines. The ones treating any of them as optional, or as “we’ll add it later,” produce demos that never make it into production, or production systems that quietly stop working.

This is also where The Messy Middle lives. None of the four disciplines look like AI work. They look like operational work. Which is why most AI conversations skip them and most AI deployments suffer for it.

Where this lands

For organisations adopting AI now. Stop treating “we’re using AI” as the achievement. Start treating “we have an AI system that ships value reliably six months from now” as the achievement. Different goals. Different work behind them.

For MSPs and consultancies. This is where the work lives. Not in the model layer. Not in the demos. In the discipline stack: engineering, operations, context, measurement. The market for disciplined AI delivery is meaningfully larger than the market for vibe-coded experiments, and the half-life is meaningfully longer.

For Karpathy’s framework. Agentic engineering is a real improvement on vibe coding. The conversation should keep moving. Next conversation: agentic operations.

Renaming is half the work. The other half is admitting that engineering discipline alone does not keep a system useful inside a real business. Models keep getting better. The systems we build around them have to keep getting more disciplined, because the same capability that makes them powerful also makes them confidently wrong when they are wrong.

The unglamorous questions are where the value lives. Ask them early. Ask them again in six months. Then again.