Tim Neilen

The ninety per cent

Written with AI

OpenClaw has collected over a quarter of a million GitHub stars since it was published last November. Its creator - Peter Steinberger, who describes himself openly as a vibe coder - just joined OpenAI. The awesome-openclaw-agents repository lists 162 production-ready templates across 19 categories, shared openly for anyone to install. You can wire up an agent to your email, your Signal account, your calendar, and your file system inside an afternoon.

This is genuinely impressive. It is also a problem.

In February 2026, Snyk published a study called ToxicSkills. They scanned 3,984 skills on ClawHub - the OpenClaw skills marketplace. 1,467 of them had at least one security flaw. 534 had critical-level issues. 76 were confirmed malicious - credential stealers, backdoor installers, data exfiltration payloads. A single threat actor was responsible for 386 of the malicious packages. 335 more skills shared the same command-and-control infrastructure.

More than one in three skills in the open marketplace had a security flaw. Roughly one in seven had a critical one.
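The ratios follow directly from the counts quoted above. A quick check, using only the figures already cited from the study:

```python
# Counts as quoted above from the ToxicSkills study.
scanned = 3984
flawed = 1467    # at least one security flaw
critical = 534   # critical-level issues

flawed_share = flawed / scanned
critical_share = critical / scanned

# Print each share as a percentage and as a "one in N" ratio.
print(f"any flaw: {flawed_share:.1%} (one in {scanned / flawed:.1f})")
print(f"critical: {critical_share:.1%} (one in {scanned / critical:.1f})")
```

That works out to about 36.8 per cent with at least one flaw and about 13.4 per cent with a critical one.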

These are the components people are plugging into their email, their calendars, their customer records, their business workflows. They plug them in anyway, because the build is the fun part.

The demo isn’t the system

Building an agent is easy. Getting one that works for a specific task is satisfying in a way that spreadsheets and back-office software have never been. You describe what you want, you iterate a few times, and the thing responds. It feels like progress.

What you have built is a demo. That is not a criticism. It is a description.

A demo and a system have different failure modes. A demo fails when the author is watching. A system has to fail safely when nobody is watching - when the data drifts, when the model upstream is retrained, when someone pastes in a customer’s payment details, when the token bill triples overnight because a prompt went into a loop. A system has to handle what the demo never encountered, because the demo was tested by the person who wrote it, under the conditions they thought to check.

The build is maybe ten per cent of what it takes to operate an agent inside a business. The other ninety per cent is what the enthusiast did not think about on a weekend.

What is in the ninety per cent

Security and permissions. Who can see what data? What gets logged, and where does the log live? What happens when a user accidentally pastes in something sensitive? How do you revoke an agent’s access to an upstream system when you need to? How do you audit who ran what and when? The ClawHub supply chain mess answers most of these questions with “the attacker did.”

Evaluation. How do you know the output is correct today? How do you know it is still correct in six months? When the answer gets worse, how do you find out - before or after a client notices? What is your rollback path when a model provider changes how the underlying model behaves, which happens without notice more often than the marketing admits?
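One way to make "is it still correct" a question with an answer is a small golden set that is re-run on a schedule and compared against a recorded baseline. The sketch below is illustrative only: `pass_rate`, `has_regressed`, the exact-match comparison, and the two-point tolerance are all assumptions, not part of OpenClaw or any vendor's tooling. Real agent output usually needs a looser comparison than string equality.

```python
from typing import Callable

# A golden case is (input given to the agent, output we expect back).
GoldenCase = tuple[str, str]


def pass_rate(agent: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Fraction of golden cases where the agent's output matches exactly."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(1 for prompt, expected in cases if agent(prompt).strip() == expected)
    return passed / len(cases)


def has_regressed(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    """True when the current pass rate has dropped below baseline minus tolerance."""
    return current < baseline - tolerance
```

Run against the same cases after every model or dependency change, this at least turns "the answers got worse" into something you find out before a client does.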

Change and cost. What happens when the upstream model is retrained and the tone of your agent’s responses shifts? When an API contract changes silently and outputs break without an error? Who is watching the token bill when one user’s experimental prompt triggers a thousand API calls? Who decides when to update a dependency versus when to freeze it?
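The token bill question has a simple first answer: refuse the call before the limit is crossed, not after the invoice arrives. A minimal sketch, assuming a per-user daily allowance; `TokenBudget`, `BudgetExceeded`, and the limit value are hypothetical names and numbers for illustration, and a real deployment would persist usage and reset it each day.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a call would take a user past their daily token allowance."""


@dataclass
class TokenBudget:
    daily_limit: int  # hypothetical per-user allowance, in tokens
    used: dict[str, int] = field(default_factory=dict)

    def charge(self, user: str, tokens: int) -> int:
        """Record usage and return what is left; refuse if the limit would be crossed."""
        spent = self.used.get(user, 0)
        if spent + tokens > self.daily_limit:
            raise BudgetExceeded(
                f"{user} would reach {spent + tokens} of {self.daily_limit} tokens"
            )
        self.used[user] = spent + tokens
        return self.daily_limit - self.used[user]


budget = TokenBudget(daily_limit=10_000)
budget.charge("alice", 4_000)  # allowed, 6,000 tokens left
budget.charge("alice", 4_000)  # allowed, 2,000 tokens left
# A third 4,000-token call would raise BudgetExceeded instead of running.
```

The point of raising rather than logging is that a looping prompt stops at the limit instead of at the end of the month.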

Handover. If the person who built it leaves, gets tired, or just moves on to something else, can anyone else operate it? Is it documented in a way that is not just the builder’s memory? What are the ten things that could go wrong, and what is the runbook for each?

None of this is fun. It is almost always work that comes up after the weekend build is already in production.

This is no longer an enthusiast problem

OpenClaw - and the broader agentic platform category it sits inside - has made it possible for people without a systems background to deploy things that behave like production systems. That is an enormous capability unlock. It is also a significant shift in the operational risk profile of small and mid-sized businesses.

The enthusiast building something on a weekend used to be a small local event. Now they can wire up a production workflow that touches real customer data inside a day, pulling components from an open marketplace with a documented malware problem. The blast radius has changed.

The sales pitch from platform vendors is that the agent will run itself. The security data from Snyk says more than a third of the components it scanned have security flaws. Both can be true. One is being marketed loudly. The other is not.

The enterprise answer is half the answer

The Snyk findings are acute enough that vendors have started responding. At GTC 2026, NVIDIA announced NemoClaw - an enterprise distribution of OpenClaw with built-in security controls, a walled runtime called OpenShell, policy enforcement, and pre-integrated access to their Nemotron models. It installs in a single command. Salesforce, Cisco, Google, Adobe, and CrowdStrike are already partners.

This is a real step forward. Security and permissions are the part of the ninety per cent that can actually be productised, and NemoClaw does that work. If the choice is raw OpenClaw pulling skills from a compromised marketplace versus NemoClaw inside a managed runtime, the enterprise version wins.

The other half of the ninety per cent - evaluation, and the long tail of exceptions that define real business - is not in the box.

The tell is in the marketing. NVIDIA’s launch material cites companies automating up to ninety-five per cent of their accounts receivable work with agents of this kind. Every operations team that has touched AR knows the last five per cent is where almost all the time goes: the disputed invoices, the partial payments, the customer in administration, the currency variance, the duplicate payment applied to the wrong period. Those cases are the job. They are not edge cases; they are the work.

An agent that handles the happy path and falls over on exceptions is worse than no agent, because now the exceptions come mixed in with plausible-looking wrong outputs. “It is doing ninety-five per cent of the work” often translates to “I still have to review a hundred per cent of it, plus the new thing it got wrong.”

Real businesses are mostly exceptions once you scratch the surface. That is not solved by better guardrails. It is solved by people who understand the shape of the work, who can design the parts that need human sign-off, and who build evaluation and correction loops into the deployment itself. Operational work. Not in the box.

What ownership actually means

Agents need owners. That is not a role title; it is a set of responsibilities:

  • Someone asking “what happens when...?” before the agent runs
  • Someone watching outputs for drift and edge cases
  • Someone auditing what the agent has access to, and why
  • Someone documenting enough that the system outlives the builder
  • Someone deciding what gets installed from open marketplaces, and verifying before it runs
  • Someone reading the bill
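The marketplace item in that list can be partly mechanised. One common approach is an allow-list that pins each approved skill to the SHA-256 hash of the exact file that was reviewed, so a changed or unreviewed package is refused. This is a sketch under assumptions: `is_approved`, the `approved` mapping, and the skill name are hypothetical, and nothing here reflects how ClawHub or OpenClaw actually distribute or verify skills.

```python
import hashlib


def sha256_hex(payload: bytes) -> str:
    """Hex-encoded SHA-256 digest of a downloaded skill file."""
    return hashlib.sha256(payload).hexdigest()


def is_approved(name: str, payload: bytes, approved: dict[str, str]) -> bool:
    """True only if the skill is on the allow-list and its bytes match the pinned hash."""
    pinned = approved.get(name)
    return pinned is not None and pinned == sha256_hex(payload)


# Hypothetical example: the owner reviews a file once and pins its hash.
reviewed = b"print('hello from a reviewed skill')"
approved = {"invoice-helper": sha256_hex(reviewed)}

is_approved("invoice-helper", reviewed, approved)         # same bytes as reviewed
is_approved("invoice-helper", reviewed + b"#", approved)  # altered after review
is_approved("unknown-skill", reviewed, approved)          # never reviewed
```

A hash pin does not tell you a skill is safe. It only guarantees that what runs is what somebody actually looked at, which is the part the owner is signing up for.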

That can be an internal role - a new brief inside an existing operations or IT function. It can be an external partner who brings the operational lens. It cannot be nobody. And it cannot be the enthusiast on top of their actual job, because they wanted to build, not to babysit.

I wrote recently about the gap between AI capability and AI value - how the models already exceed what most organisations can absorb, and the real work is the connective tissue that makes them useful inside real businesses. Context is the connective tissue that lets an agent make a good decision. Operational ownership is the connective tissue that lets it keep making them.

The capability revolution is real. So is the ninety per cent. The businesses that get value out of AI over the next few years will be the ones that treat the operational questions as first-class, not as something the enthusiast can figure out on the side.

The unglamorous questions are where the value lives. Ask them early.