How did CodeWall's agent breach McKinsey's Lilli?

The agent found publicly exposed API documentation listing 200+ endpoints. 22 required no authentication. One of those endpoints concatenated JSON field names directly into SQL queries (values were parameterised, keys were not). The agent ran 15 blind SQL injection iterations — each error message revealing more about the query shape — until it had full read-write access to the production database.

What data was exposed in the Lilli breach?

46.5 million chat messages (strategy, M&A, client work) in plaintext. 728,000 files including 192K PDFs, 93K Excel sheets, 93K PowerPoint decks. 57,000 user accounts. 384,000 AI assistants and 94,000 workspaces. 95 system prompt configurations. 3.68 million RAG document chunks — decades of proprietary McKinsey research. All accessible without authentication.

Why was write access to system prompts especially dangerous?

Lilli's system prompts — the instructions controlling how the AI behaves, what it refuses, how it cites sources, what guardrails it follows — were stored in the same compromised database. An attacker with write access could silently rewrite Lilli's behavior for 45,000 users: remove refusals, inject false information, redirect outputs. This is prompt injection at the infrastructure level, not the message level.

Was this a sophisticated attack?

No. It was unauthenticated endpoints, a JSON key SQL injection that standard scanners (including OWASP ZAP) didn't flag, and an IDOR vulnerability chained for cross-user access. No zero-days. No insider access. No credentials. The AI agent automated the discovery and exploitation of vulnerabilities that a diligent manual pentest would have found. The sophistication was in the automation, not the technique.

What should companies shipping internal AI do differently?

Authenticate every API endpoint — no exceptions. Parameterise both SQL values AND keys. Audit what's stored in the same database as your AI's system prompts. Run dedicated AI attack surface reviews before launch, not just standard AppSec. Assume your API docs are public even if you didn't intend them to be. And treat your system prompts as secrets — store them separately from user data.

How does this relate to prompt injection and AGENTS.md risks?

Traditional prompt injection attacks user-level inputs to manipulate AI behavior. The Lilli breach enabled infrastructure-level prompt injection — overwriting the system prompts themselves. AGENTS.md and instruction file risks operate on the same principle: if an attacker can write to the files or database entries that define how your AI behaves, they own your AI's behavior for every user.

McKinsey's Lilli Got Hacked in 2 Hours. It Wasn't an AI Problem.

By Prahlad Menon Published 2026-03-14 5 min read

On March 9, 2026, security startup CodeWall disclosed that its autonomous AI agent had fully compromised McKinsey’s internal AI platform, Lilli — in under two hours, with no credentials, no insider access, and no human in the loop.

The headline number: 46.5 million chat messages, in plaintext, including strategy discussions, M&A activity, and client work. 728,000 files. 57,000 user accounts. Full read-write access.

The vulnerability: a JSON key SQL injection on an unauthenticated endpoint that OWASP ZAP didn’t catch.

This is not an AI story. It’s a deployment story that AI made catastrophic.

How it happened

Lilli is genuinely impressive infrastructure. Built for McKinsey’s 43,000+ employees, processing 500,000+ prompts a month, RAG over 100,000+ internal documents, used by 70% of the firm for client work. Launched 2023, named after the first professional woman the firm hired in 1945.

CodeWall pointed their offensive agent at it with a domain name and nothing else. Here’s the timeline:

Step 1: Surface mapping. The agent found API documentation publicly exposed — 200+ endpoints, fully documented. Most required authentication. Twenty-two didn’t.

Step 2: SQL injection. One unprotected endpoint wrote user search queries to the database. The values were safely parameterised. The JSON keys — the field names — were concatenated directly into SQL. Standard scanners don’t flag this. The CodeWall agent did.

Step 3: Blind enumeration. The agent ran 15 iterations, each error message revealing more about the query shape. The agent’s chain of thought when the first real employee identifier appeared: “WOW!” When the full scale became clear: “This is devastating.”

Step 4: Full access. 46.5 million messages. 728,000 files. 57,000 user accounts. The agent then chained the SQL injection with an IDOR vulnerability to access individual employees’ search histories — revealing what specific consultants were actively working on.

The part that’s worse than the database

Reading 46 million messages is catastrophic. But the agent had write access.

Lilli’s system prompts — the instructions that define how the AI behaves for every one of its 45,000 users — were stored in the same compromised database. 95 configurations across 12 model types: how Lilli answered questions, what it refused, how it cited sources, what guardrails it followed.

An attacker with write access to those prompts doesn’t just read your data. They rewrite your AI’s behavior silently, at scale, for every user. Remove safety guardrails. Inject false information into answers. Redirect outputs. This is prompt injection at the infrastructure level — not manipulating a single conversation, but rewriting the AI’s core instructions for an entire organization.

McKinsey was notified, engaged a third party (found no evidence of prior unauthorized access), and patched. But the window existed.

This is not a McKinsey problem

McKinsey is not a four-person startup that didn’t know better. They have security teams. They had a responsible disclosure policy on HackerOne. They built a genuinely sophisticated internal AI platform.

And they missed:

22 unauthenticated endpoints in production
JSON key concatenation into SQL
System prompts stored alongside user data
No separation between the AI configuration layer and the data layer

If this is happening at McKinsey, it is happening at companies with far less mature security practices that are rushing to ship internal AI for business-critical workflows right now.

The uncomfortable reality: the AI layer expanded the attack surface without adding corresponding security review. The API endpoints existed because Lilli needed them. The documentation was exposed because developers needed to build against it. The system prompts were in the database because that’s where application configuration lives. None of these decisions were individually unreasonable — together, they composed into a critical vulnerability that an AI agent found in two hours.

What the CodeWall agent actually did

It’s worth being precise about this because it matters for how you think about your own exposure.

The agent didn’t use prompt injection on Lilli. It didn’t jailbreak the model. It didn’t social-engineer an employee. It did what a thorough pentester would do — found public docs, mapped unauthenticated endpoints, probed for injection flaws, enumerated the database, chained vulnerabilities — but autonomously, in two hours, at machine speed.

The novelty isn’t the technique. It’s that AI agents have made this level of thoroughness the default for attackers, not the exception. A human pentester with two hours would not have found and chained all of this. The agent did.

This is the shift: security assumptions built around what a human attacker can accomplish in a given time window are no longer valid.

What to actually do

1. Authenticate everything. No exceptions for “internal” or “low-risk” endpoints. If it’s callable, it requires auth.

2. Parameterise keys, not just values. Standard SQL parameterisation protects values. Column names and table names interpolated as strings are still injectable. If your queries build any structural SQL from user input, audit them.

3. Store system prompts separately from user data. Your AI’s behavioral configuration is as sensitive as your private keys. It should not live in the same database — let alone the same table — as user content.

4. Treat your API docs as public. If they’re accessible to any authenticated user, assume they’re accessible to attackers. Document what you intend to expose; remove or gate what you don’t.

5. Run an AI-specific attack surface review. Standard AppSec reviews and tools like OWASP ZAP will miss AI-layer vulnerabilities. The JSON key injection that breached Lilli wasn’t flagged by ZAP. You need humans (or agents) who know what AI deployment surfaces look like to find what automated scanners miss.

For more on securing the AI layer specifically: AGENTS.md as an attack surface, sandboxing AI coding agents, Crust — security gateway for AI agents, and the Claude Code security wake-up call.

The bar has changed. Attackers now have autonomous agents that methodically enumerate your AI platform’s attack surface in the time it takes to have a meeting about it. Your security review needs to keep pace.

Source: CodeWall — How We Hacked McKinsey’s AI Platform · The Register · Promptfoo analysis