A licence is not a deployment

18 Jun 2026

The pitch for Microsoft 365 Copilot is consistent. Microsoft writes it, and it gets repeated down the chain to every small and mid-sized firm. It sits inside your tenant. Your data never leaves. That is the security story. Almost every conversation I have with a firm that has been sold Copilot opens with that line.

It is true. It is also not a deployment. It is a licence, sold as if it were a deployment. If you are the IT Manager inside one of those firms, this is the conversation you are about to have with your directors, whether you start it or they do.

Data not crossing a tenant boundary is perimeter security. It answers where the bytes live at rest and in transit, who Microsoft shares them with, and whether they are used to train public models. Those questions are real, and Microsoft handles them well. They are not the questions that matter once an AI agent is operating inside the firm.

The questions that actually matter are different. What data can the agent read, and on behalf of which user? What can it write, when, and where? What gets logged? What gets redacted? How does the response change with the role of the person asking? Which workflows is it wired into, and which is it deliberately kept out of? None of those have anything to do with tenant boundaries. Together they make up the actual intelligence layer of the firm.

The pitch sells perimeter security for an interior problem. And it sells a licence as if it were a deployment. A licence is a procurement decision. A deployment is a design decision. Almost everything that determines whether a firm gets value from AI happens after the licence is provisioned, and none of it is work the licence does.

This is the same territory The Ninety Percent covered, entered through a different door. The build is ten per cent. The other ninety per cent is operational discipline. The Copilot playbook sells the ten per cent and calls it the deployment.

Why this pitch keeps working

The reason it lands is older than Copilot. Microsoft is the safe choice. No director ever got fired for buying Microsoft. The tenant-boundary promise is the same reassurance that worked for Exchange, SharePoint, and Teams, all of which trained twenty years of trust into the procurement reflex. Copilot inherits that trust by default.

The problem is that AI is not Exchange. Email is a delivery surface. AI is a decision surface. Trusting Microsoft to host an inbox is a different question to trusting Microsoft to design what an AI agent can read, write, and act on inside the firm. The first question they have answered. The second one they have not been asked, because the procurement reflex skips straight to the licence.

That reflex is the actual product. The licence is only the visible artefact. The reflex is what makes the decision for a director who has a budget cycle, a board meeting, and no time to evaluate a category they have never bought before.

What “stays in your tenant” actually gets you

Consider a compliance question landing in the inbox of a 100-seat firm that has rolled out Copilot: “What is everyone asking AI to do for us? Pull the prompts.”

The audit log shows the user, the timestamp, the apps Copilot was used inside, the files Copilot read to answer each prompt, whether it touched the public web, whether anyone tried to jailbreak it, and which sensitivity labels were involved. The actual prompt and response? Not in the audit log. They sit in a hidden folder inside the user’s mailbox, retrievable through eDiscovery, one search at a time. If you want ongoing scanning of what people are typing into Copilot, that is the new Purview Suite for Business Premium add-on at $10 per user per month, on top of the $30 per user per month Copilot licence, on top of Business Premium itself.

The audit pipeline is mature. The eDiscovery flow works. The SMB-priced Purview Suite finally puts communication compliance scanning below the E5 line. But notice what is happening. The tenant-boundary promise is intact. The data has not left. And you still cannot answer the compliance question without standing up a sequence of searches, cross-referencing three reports, and pivoting through two admin centres.

That is exactly the kind of work that does not get done at a firm this size, because nobody owns it end to end. The tenant boundary held. The intelligence layer was never built.

What the licence doesn’t build

The standard Copilot rollout follows the same pattern every time: a readiness audit, the licence, a one-hour training session, a monthly invoice. That is not an AI deployment. It is a software resell with an audit on the front and a workshop on the back, the whole thing dressed up in the language of security.

The actual intelligence layer of a firm is everything between “user types a prompt” and “answer arrives”. Which model handles which kind of question. Which corporate data the answer is grounded in, and through which access path. Where redaction happens. What gets logged, and who can see it. How the response changes when the user is a partner versus an admin versus a contractor. Which workflows the AI is wired into, and which it is deliberately kept out of.

The grounding question alone tells you most of what you need to know. Out of the box, Copilot grounds into the Microsoft 365 graph: SharePoint, OneDrive, Exchange, Teams. That is it. It does not see on-premises file shares. It does not see the practice management system the firm runs the work in. It does not see the accounting platform. It does not see the bespoke database the partners commissioned in 2017. For a professional services firm this is the centre of the problem. The work data lives in practice management. The financial data lives in accounting. SharePoint and email are the periphery, not the centre. Copilot grounds into the periphery and calls it ready.

The APIs are half the point

Grounding is only the first half of the gap. The second is reach - and it is the half that gets ignored, because it is the half that needs developers.

Copilot and its agents are a chat surface. You ask, it answers, and with Copilot Studio you can wire an agent to take a few actions. That is genuinely useful, and it is also one narrow shape of AI. A lot of the highest-value work is not a person typing a prompt at all. It is a model reading a job record the moment it changes, checking it against the last three like it, and flagging the variance before anyone asks. It is an invoice landing and getting matched, coded, and queued without a human in the loop. It is two systems that have never spoken being made to talk through a model in the middle. None of that is a chat. None of it runs through Copilot.
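To make the non-chat shape concrete, here is a minimal sketch of the job-record check described above. It assumes a hypothetical `JobRecord` with a job type and a cost, and a history list ordered oldest to newest. In a real build the record and its three predecessors would go to a model for judgement. Here a fixed percentage threshold stands in for that step so the example runs on its own, and nothing in it is the API of any particular practice management system.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class JobRecord:
    # Hypothetical record shape; real practice management systems differ.
    job_id: str
    job_type: str
    cost: float


def flag_variance(changed: JobRecord, history: List[JobRecord],
                  threshold: float = 0.25) -> Optional[str]:
    """Check a just-changed job against the last three jobs of the same type.

    `history` is assumed to be ordered oldest to newest. A fixed percentage
    threshold stands in for the model-based judgement described in the text.
    """
    similar = [j for j in history
               if j.job_type == changed.job_type and j.job_id != changed.job_id]
    last_three = similar[-3:]
    if len(last_three) < 3:
        return None  # not enough comparable jobs to judge against
    baseline = mean(j.cost for j in last_three)
    if baseline == 0:
        return None  # avoid dividing by zero on a zero-cost baseline
    deviation = (changed.cost - baseline) / baseline
    if abs(deviation) <= threshold:
        return None
    return (f"{changed.job_id}: cost {changed.cost:.0f} is {deviation:+.0%} "
            f"against the last three {changed.job_type} jobs")


# In a real build this would fire from a change event (webhook or polling),
# not from someone typing a prompt.
history = [
    JobRecord("J1", "audit", 1000.0),
    JobRecord("J2", "audit", 1100.0),
    JobRecord("J3", "tax", 500.0),
    JobRecord("J4", "audit", 900.0),
]
print(flag_variance(JobRecord("J5", "audit", 1500.0), history))
# prints: J5: cost 1500 is +50% against the last three audit jobs
```

The point is the trigger. Nobody asked a question, and the flag arrives before anyone thinks to.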

That work lives in the APIs. Reaching into the practice management system, the accounting platform, the line-of-business database - reading from them, writing back under tight constraints, orchestrating across them - is integration work. It is code. It is MCP servers and connectors and a gateway you control, not a licence you switch on.

This is the piece the licence-resell model and an IT team without developers structurally cannot deliver. You can audit readiness, provision seats, and run a training session without writing a line of code. You cannot build the integration layer that way. So the firm gets the chat surface, the chat surface grounds into the periphery, and the systems where the work actually lives are never connected. The APIs were half the point, and they were the half nobody was equipped to build.

To be fair to Microsoft, deeper grounding is possible inside Copilot Studio: the connector model, per-agent grounding, access controls. The building blocks exist. They are just not what gets used for firms this size. Copilot Studio is consulting work too, and the consulting layer for firms this size is not on anyone’s org chart. The outcome is the same whichever stack the firm started with: the licence got bought, and the layer above it did not get built.

I rarely see this work done for firms this size. Not because it cannot be, but because the licence resell is a cleaner business model: the integration layer is consulting work, and consulting work scales differently. The result is firms paying $30 per user per month for a Copilot deployment with no intelligence layer underneath it, then wondering why adoption stalls at “drafted an email a bit faster”.

What designing the intelligence layer actually looks like

A working pattern I keep coming back to is staged phases, each one a separate decision the firm has to sign off on, each one reversible.

Phase 1 - Chat only. A sanctioned AI chat surface authenticated against the firm’s identity. No access to firm data yet. Replaces staff using ChatGPT on personal accounts and drafting email in consumer tools. Net read: same risk, now monitored.
Phase 2 - Read corporate data. The same chat can now answer questions against the firm’s core systems. Read-only. Each user sees only what they could already see in those systems. Net read: no new data exposure, just faster access.
Phase 3 - Narrow writes. A small, named set of write actions, each one a separate decision. Constraints baked in (own data only, current period only, capped impact, logged and reversible). Each new write tool is a fresh sign-off.
Phase 4 - Workflows. Cross-system orchestration. Only after phase 3 has run incident-free for a defined period.
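The phase 3 constraints are concrete enough to encode. What follows is a minimal sketch of a write guard, assuming the agent proposes each write as a hypothetical `WriteRequest` and the firm keeps a hypothetical `WritePolicy` holding the signed-off phase, the signed-off tool names and a per-write cap. Tool names such as `update_time_entry` are illustrative only. The sketch shows one way to enforce "own data only, current period only, capped impact, logged and reversible". It is not a description of any product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class WriteRequest:
    # Hypothetical write the agent proposes on a user's behalf.
    tool: str
    user: str
    record_owner: str
    record_period: str  # "YYYY-MM"
    old_value: float
    new_value: float


@dataclass
class WritePolicy:
    phase: int                   # current signed-off phase (1-4)
    approved_tools: Set[str]     # each write tool is its own sign-off
    max_change: float            # capped impact per write
    audit_log: List[Dict] = field(default_factory=list)

    def _violation(self, req: WriteRequest, today: date) -> Optional[str]:
        if self.phase < 3:
            return "writes are not enabled before phase 3"
        if req.tool not in self.approved_tools:
            return f"{req.tool} has not been signed off"
        if req.user != req.record_owner:
            return "own data only"
        if req.record_period != today.strftime("%Y-%m"):
            return "current period only"
        if abs(req.new_value - req.old_value) > self.max_change:
            return "change exceeds cap"
        return None

    def check(self, req: WriteRequest, today: date) -> Tuple[bool, str]:
        reason = self._violation(req, today)
        # Every attempt is logged, allowed or not. The old value is kept
        # so that an allowed write can be reversed later.
        self.audit_log.append({
            "tool": req.tool, "user": req.user,
            "old": req.old_value, "new": req.new_value,
            "allowed": reason is None, "reason": reason or "allowed",
        })
        return reason is None, reason or "allowed"


policy = WritePolicy(phase=3, approved_tools={"update_time_entry"}, max_change=8.0)
today = date(2026, 6, 18)
print(policy.check(WriteRequest("update_time_entry", "alice", "alice", "2026-06", 6.0, 7.5), today))
print(policy.check(WriteRequest("update_time_entry", "alice", "bob", "2026-06", 6.0, 7.0), today))
```

Adding a new write tool means changing `approved_tools`. That maps onto the fresh sign-off each new write requires. Rolling back to phase 2 is one field.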

Each phase is a decision. Nothing rolls forward without sign-off. You can stop at any phase. The intelligence layer is the design of those phases, the constraints inside them, and the observability around them. None of it is purchased. All of it is consulting work. It maps cleanly onto the four disciplines The Discipline Stack names: engineering for the build, ownership for the operation, context for what the agent grounds into, measurement for whether it is still working in six months.

The plumbing I keep landing on

Stack choice is downstream of the intelligence layer. Pick whichever set of tools lets you design the layer cleanly. When the choice is mine to make, I keep landing on the same open-source shape, for three specific reasons.

Observability. A gateway you own gives you every prompt, every response, every token, every model decision in your own database, in a format you defined. The audit answer becomes “here is the table, what do you want to query?”, not “let me build an eDiscovery case”.
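Here is a minimal sketch of what "here is the table" can mean, using SQLite for brevity. It assumes the gateway hands each finished interaction to a logging function. LiteLLM supports custom logging callbacks, but that wiring is not shown. The table layout, the redaction patterns and the names `log_interaction` and `prompts_by_user` are illustrative assumptions, not LiteLLM's own schema.

```python
import re
import sqlite3
from typing import List, Tuple

# Illustrative redaction rules, applied before anything touches disk.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b"), "[ID]"),  # hypothetical 9-digit identifier
]


def redact(text: str) -> str:
    for pattern, label in REDACTIONS:
        text = pattern.sub(label, text)
    return text


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS interactions (
               id INTEGER PRIMARY KEY,
               ts TEXT NOT NULL,
               user_id TEXT NOT NULL,
               model TEXT NOT NULL,
               prompt TEXT NOT NULL,
               response TEXT NOT NULL,
               prompt_tokens INTEGER NOT NULL,
               completion_tokens INTEGER NOT NULL
           )"""
    )
    return conn


def log_interaction(conn: sqlite3.Connection, ts: str, user_id: str, model: str,
                    prompt: str, response: str,
                    prompt_tokens: int, completion_tokens: int) -> None:
    # Redaction happens here, by your rules, before the row is stored.
    conn.execute(
        "INSERT INTO interactions (ts, user_id, model, prompt, response,"
        " prompt_tokens, completion_tokens) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (ts, user_id, model, redact(prompt), redact(response),
         prompt_tokens, completion_tokens),
    )
    conn.commit()


def prompts_by_user(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    # The compliance question, answered as a query rather than an eDiscovery case.
    return conn.execute(
        "SELECT user_id, COUNT(*) FROM interactions GROUP BY user_id ORDER BY user_id"
    ).fetchall()
```

A production gateway would write to a proper database and forward rows to the SIEM. The point is the same either way: the question "what is everyone asking?" becomes a `SELECT`.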

Fit. A 50- to 200-seat firm does not have generic needs. It has specific workflows, specific data, specific risks. A stack you own bends around the firm. A vendor stack pulls the firm toward the vendor’s defaults.

Reach. LiteLLM in front of any model provider. MCP servers in front of any internal system. Switching models is a config change, not a contract negotiation. The stack stays useful as the model landscape shifts, which it does monthly.
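"Switching models is a config change" can be taken literally. This is a minimal sketch of a routing table kept as configuration, assuming the firm routes by use case. The `provider/model` prefixes follow LiteLLM's naming convention. The model names are placeholders, and the use-case keys and function names are hypothetical. The string `pick_model` returns is what you would pass as `model=` to `litellm.completion(...)` or list in a LiteLLM proxy config. That call is not made here.

```python
import json
from typing import Dict

# Hypothetical routing config. The "provider/model" prefixes follow
# LiteLLM's naming convention; the model names are placeholders.
CONFIG = """
{
  "routes": {
    "drafting": "openrouter/example-large-model",
    "client_data": "ollama/example-local-model",
    "default": "bedrock/example-small-model"
  }
}
"""


def load_routes(raw: str) -> Dict[str, str]:
    routes = json.loads(raw)["routes"]
    if "default" not in routes:
        raise ValueError("routing config needs a 'default' route")
    return routes


def pick_model(use_case: str, routes: Dict[str, str]) -> str:
    # Unknown use cases fall back to the default route.
    return routes.get(use_case, routes["default"])


routes = load_routes(CONFIG)
print(pick_model("client_data", routes))  # ollama/example-local-model
```

Moving drafting from one provider to another means editing one line of `CONFIG`. Nothing in the calling code changes. Keeping client data on a local model is a routing decision you can see and review, rather than a vendor default.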

The shape:

Entra SSO at the front door, so identity is the same identity that owns everything else.
Open WebUI as the chat surface. The UX users already expect from ChatGPT, pointed at your own infrastructure.
LiteLLM as the gateway. Every interaction passes through one process you control. Logged to your own database. Fed into your own SIEM. Redacted by your own rules. Retained on your own timeline. This is where the observability lives.
A model provider of your choice behind it. OpenRouter, Bedrock, Azure OpenAI, a local model.
MCP servers to ground the model into corporate data, with access controls you wrote, not access controls you inherited.
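"Access controls you wrote" can start as small as this. Here is a minimal sketch of the body of a read-only tool that an MCP server could expose. It assumes the SSO identity and role reach the server through the gateway. How that happens depends on your transport and is not shown. `Matter`, `MATTERS`, `ROLE_FIELDS` and `list_matters` are hypothetical stand-ins for a practice management system. The official MCP Python SDK can register a function like this as a tool (for example via `FastMCP` and its `@mcp.tool()` decorator), but that wrapper is omitted so the example runs without the SDK.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Matter:
    matter_id: str
    client: str
    team: FrozenSet[str]   # who can already see this matter in the source system
    fee_estimate: float


# Hypothetical stand-in for the practice management system.
MATTERS = [
    Matter("M-101", "Acme", frozenset({"alice", "bob"}), 12000.0),
    Matter("M-102", "Globex", frozenset({"carol"}), 8000.0),
]

# Which fields each role may see; the roles mirror partner/admin/contractor.
ROLE_FIELDS: Dict[str, Set[str]] = {
    "partner": {"matter_id", "client", "fee_estimate"},
    "admin": {"matter_id", "client"},
    "contractor": {"matter_id"},
}


def list_matters(authenticated_user: str, role: str) -> List[Dict[str, object]]:
    """Read-only tool body. `authenticated_user` and `role` come from the SSO
    session, never from arguments the model supplies."""
    fields = ROLE_FIELDS.get(role)
    if not fields:
        return []  # unknown role: deny by default
    results = []
    for m in MATTERS:
        if authenticated_user not in m.team:
            continue  # no new exposure: only what the source system already allows
        record = {"matter_id": m.matter_id, "client": m.client,
                  "fee_estimate": m.fee_estimate}
        results.append({k: v for k, v in record.items() if k in fields})
    return results
```

The key design choice is that identity comes from the session, not from the prompt. The model can ask for matters. It cannot choose whose matters it sees.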

None of these components is novel. They are mature enough that a competent IT lead with a developer alongside can stand them up in a week. The novel part is that the barrier to entry has finally dropped to a level where the small end of town can play.

The honest tradeoff

This is more work than turning Copilot on. Someone has to own it. Updates. Monitoring. The gateway upgrade that breaks a callback. The day the model provider has a regional outage. A real cost in attention.

In exchange, every interaction with any model lives in a database you defined. Shadow AI shrinks, because the in-house tool is good enough to displace the ChatGPT habit. Per-user, per-prompt cost is visible from day one, rather than a flat $30 per user per month whether the user touches it or not. The audit answer is a query against your own table, not a case built across two admin centres. Model selection becomes tactical per use case, not procurement per tenant. And the phase structure above is something you can actually build. With the Copilot licence alone, phases 3 and 4 are not on offer.
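Per-user cost visibility is plain arithmetic once token counts are logged. Here is a minimal sketch, assuming usage rows of `(user, model, input_tokens, output_tokens)` for one month, pulled from the gateway's own log. The per-million-token prices and model names are hypothetical placeholders, because real rates vary by provider and change often. The $30 flat figure is the one quoted above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical prices in USD per million tokens; real rates vary by model and provider.
PRICES = {
    "example-large": {"input": 3.00, "output": 15.00},
    "example-small": {"input": 0.20, "output": 0.80},
}
FLAT_SEAT_PRICE = 30.0  # the flat per-user monthly figure quoted in the text


def monthly_cost_by_user(usage: Iterable[Tuple[str, str, int, int]]) -> Dict[str, float]:
    """usage rows: (user, model, input_tokens, output_tokens) for one month."""
    totals: Dict[str, float] = defaultdict(float)
    for user, model, tokens_in, tokens_out in usage:
        price = PRICES[model]
        totals[user] += (tokens_in * price["input"]
                         + tokens_out * price["output"]) / 1_000_000
    return dict(totals)


def compare_to_flat(costs: Dict[str, float], seats: int) -> Tuple[float, float]:
    """Metered model spend versus a flat per-seat licence for the same seats.
    Excludes the cost of running the gateway and the staff time to own it."""
    return round(sum(costs.values()), 2), seats * FLAT_SEAT_PRICE


usage = [
    ("alice", "example-large", 1_000_000, 200_000),
    ("alice", "example-small", 500_000, 100_000),
    ("bob", "example-small", 0, 0),
]
print(compare_to_flat(monthly_cost_by_user(usage), seats=3))  # (6.18, 90.0)
```

The comparison deliberately leaves out hosting and ownership time. Those belong in the honest tradeoff, not hidden in the metered number.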

The convenience of the bundle is real. So is the cost, and most of that cost is paid in the intelligence layer that never gets built.

The tenant boundary is the floor of the security conversation. It is not the answer. The layer above the licence is where the value lives - the grounding, the APIs, the constraints, the observability - and almost nobody is building it for firms this size.

A licence is a procurement decision. A deployment is a design decision. Don’t mistake the one you bought for the one you needed.