Executive Research · AI Security · May 2026

Hacked for just $20.

46 million messages. 2 hours. One AI agent.

Chander Dhall Builder • Leader • Speaker

An AI agent gained attacker-level access to McKinsey's Lilli. This is not just a SQL-injection story. It is an identity, permissions, and production-readiness story.


$20

Attacker spend

What Was Exposed

46M

Chat messages exposed
Source: CodeWall, May 2026

The attacker got read-write access to the production database.

The exposed system included confidential files, user accounts, authentication tokens, and writable prompts. The dangerous part was not only data access. It was authority over the rules the agent followed.

Why It Matters

Writable prompts mean poisoned advice at consultant scale.

CodeWall stopped at disclosure. A motivated adversary would not need to stop at stealing data. They could reshape what the AI system recommends.

Exposed

46.5M

Plaintext client conversations across engagements.

Writable

95

System prompts that controlled agent behavior.

Scale

43K

Consultants could receive altered guidance.

Risk

Trust

The breach moved from confidentiality to decision integrity.

The Real Failure

22 endpoints shipped without authentication.

The report describes 200+ documented API endpoints, including 22 that required no authentication. Some write paths were exposed too.
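One way to turn this finding into a repeatable check is to audit a route inventory for endpoints that skip authentication, and to flag write paths separately. The sketch below is illustrative only: the `Route` type, the `audit_routes` helper, and the example paths are hypothetical and are not taken from the report. In practice the inventory would be generated from the service's router or API specification rather than typed by hand.

```python
from dataclasses import dataclass

# HTTP methods treated as write paths in this sketch.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


@dataclass(frozen=True)
class Route:
    """One entry in a (hypothetical) route inventory."""
    method: str
    path: str
    requires_auth: bool


def audit_routes(routes):
    """Return (unauthenticated routes, unauthenticated write routes)."""
    open_routes = [r for r in routes if not r.requires_auth]
    open_writes = [r for r in open_routes if r.method.upper() in WRITE_METHODS]
    return open_routes, open_writes


# Hypothetical inventory, for illustration only.
INVENTORY = [
    Route("GET", "/health", requires_auth=False),
    Route("GET", "/api/conversations", requires_auth=True),
    Route("POST", "/api/search", requires_auth=False),
    Route("PUT", "/api/prompts/{id}", requires_auth=True),
]

open_routes, open_writes = audit_routes(INVENTORY)
for route in open_routes:
    label = "OPEN WRITE" if route in open_writes else "OPEN READ"
    print(f"{label}: {route.method} {route.path}")
```

A check like this is most useful when it fails the build: any unauthenticated route that is not on an explicitly reviewed exception list blocks the release.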

Open surface

22

Unauthenticated endpoints

Production routes were reachable without the identity checks leaders assume exist.

Direct query

SQL

Keys entered the database path

Queries were built from request values, turning weak access control into data reach.
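The usual defense against queries built from request input has two parts: bind values as parameters, and check any request-supplied field names against a fixed allow-list before they reach the SQL text. The sketch below shows both, using Python's built-in `sqlite3` as a stand-in. The table, the column names, and the `find_messages` helper are assumptions for illustration; the report summarized here does not describe the actual schema or database driver.

```python
import sqlite3

# Hypothetical allow-list: the only field names a caller may filter on.
ALLOWED_FILTERS = {"user_id", "workspace"}


def find_messages(conn, filters):
    """Query messages using allow-listed keys and bound values only."""
    clauses, params = [], []
    for key, value in filters.items():
        if key not in ALLOWED_FILTERS:
            # Field names cannot be bound as parameters, so reject unknown ones.
            raise ValueError(f"unsupported filter: {key!r}")
        clauses.append(f"{key} = ?")
        params.append(value)  # values are bound, never concatenated
    sql = "SELECT id, body FROM messages"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY id"
    return conn.execute(sql, params).fetchall()


def demo_connection():
    """Build a small in-memory database for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY, user_id TEXT, workspace TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO messages (user_id, workspace, body) VALUES (?, ?, ?)",
        [("u1", "w1", "hello"), ("u2", "w1", "secret")],
    )
    return conn
```

With this shape, a hostile value such as `u1' OR '1'='1` is compared as a literal string and matches nothing, and a hostile key is rejected before any SQL is built.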

Executive lesson

Proof

Security claims need operational evidence

Ask what is enforced in production, not only what the platform supports in theory.

Architecture Shift

SaaS was built for humans. Agents play by different rules.

The old browser screen was a practical permissions layer. Agentic systems call APIs directly, which moves trust into code, tokens, scopes, and runtime policy.

Decision area	SaaS era	Agent era	Board question
Primary actor	Humans click screens	Agents call APIs and tools directly	Can the system identify agent actors?
Permission boundary	Screen, role, workflow	Code, scopes, tokens, policies	Can defaults bypass review?
Operational proof	Vendor claim plus demo	Trace, audit, gates, revocation	Can a reviewer replay what happened?
Review timing	Architecture after purchase	Architecture before commitment	Who validates viability before signing?

Market Signal

Six vendors. One signal: the model is not enough.

The market is moving toward implementation support, persistent context, governed tool catalogs, and API-native business systems.

Vendor	Move	Why it matters	Report read
Anthropic	Enterprise AI JV	Applied AI engineers embedded with customers	Deployment depth
OpenAI	The Development Company	Closer to enterprise deployment reality	Implementation ownership
SAP + WalkMe	Persistent enterprise AI	Real-time AI layer over business data	Runtime context
Pinecone Nexus	Compiled knowledge	Persistent context across agent sessions	Memory governance
Salesforce	Headless 360	Full CRM through API, not browser UI	Agent permissions
ServiceNow	MCP registry	Governed and auditable agent tool catalog	Tool control plane

The Inversion

If the agent cannot authenticate, the strategy fails.

Agent identity, permission boundaries, and auditability are not technical cleanup. They are business conditions for using AI safely.

01 · Reframe

Permissions

Access is a business decision

Identity and audit belong on the strategy table, not the IT backlog.

02 · Validate

Proof

Test before you sign

Implementation viability must be proven before purchase, not after.

03 · Include

Review

Architects in the room

Technical reviewers belong in procurement from day one.

Board Readiness

Two questions separate theory from production control.

These questions expose the gap between vendor capability and what actually happens when teams are rushed, defaults remain unchanged, and agents gain tool access.

Question 01 · Actor identity

Does your platform know the difference between a human and an agent?

Why it matters · Blast radius

Agents need narrower, task-scoped access than humans.

Question 02 · Pressure test

What happens when the team is under delivery pressure?

Why it matters · Defaults

The gap appears when configuration is never revisited.

Readiness Review

A production AI agent should pass these checks before scale.

If the answers are vague, split across owners, or dependent on human memory, the deployment needs more control before it touches sensitive workflows.

Identity

Who

Separate human and agent actors

Use narrower access, task-scoped permissions, and real-time revocation.
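The identity check above can be sketched as a small grant store: every credential records whether the actor is a human or an agent, carries only the scopes needed for one task, expires on its own, and can be revoked immediately. This is a minimal in-memory illustration, not a production design. `GrantStore`, the scope strings, and the actor names are all hypothetical; a real deployment would rely on the organization's identity provider.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    actor_id: str
    actor_type: str      # "human" or "agent"
    scopes: frozenset    # task-scoped permissions only
    expires_at: float    # Unix timestamp


class GrantStore:
    """In-memory grant store (illustrative only)."""

    def __init__(self):
        self._grants = {}

    def issue(self, actor_id, actor_type, scopes, ttl_seconds):
        if actor_type not in ("human", "agent"):
            raise ValueError("actor_type must be 'human' or 'agent'")
        token = secrets.token_urlsafe(32)
        self._grants[token] = Grant(
            actor_id, actor_type, frozenset(scopes), time.time() + ttl_seconds
        )
        return token

    def revoke(self, token):
        # Takes effect on the very next check.
        self._grants.pop(token, None)

    def is_allowed(self, token, scope, now=None):
        grant = self._grants.get(token)
        if grant is None:
            return False  # unknown or revoked token
        now = time.time() if now is None else now
        if now >= grant.expires_at:
            return False  # expired
        return scope in grant.scopes


store = GrantStore()
agent_token = store.issue(
    "agent-7", "agent", {"documents:read"}, ttl_seconds=300
)
print(store.is_allowed(agent_token, "documents:read"))  # in scope
print(store.is_allowed(agent_token, "prompts:write"))   # out of scope
store.revoke(agent_token)
print(store.is_allowed(agent_token, "documents:read"))  # revoked
```

The design choice worth noting is the short time-to-live: an agent credential that expires in minutes limits the blast radius even when revocation is missed.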

Runtime

How

Replay the decision path

Capture traces, tool calls, policy checks, and environment context.
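As a rough illustration of what "replayable" means, the sketch below appends one structured record per policy check or tool call and reads a run back in order. The `TraceLog` class and its field names are hypothetical; a real system would write to durable, tamper-evident storage instead of a Python list.

```python
import json
from datetime import datetime, timezone


class TraceLog:
    """Append-only trace of agent activity (in-memory, illustrative only)."""

    def __init__(self, environment):
        self._environment = environment
        self._lines = []

    def record(self, run_id, actor, kind, detail):
        event = {
            "seq": len(self._lines),  # position in the log
            "ts": datetime.now(timezone.utc).isoformat(),
            "env": self._environment,
            "run_id": run_id,
            "actor": actor,
            "kind": kind,             # e.g. "policy_check" or "tool_call"
            "detail": detail,
        }
        # Serialize immediately so later mutation cannot rewrite history.
        self._lines.append(json.dumps(event, sort_keys=True))

    def replay(self, run_id):
        """Return the events of one run in the order they were recorded."""
        events = (json.loads(line) for line in self._lines)
        return [e for e in events if e["run_id"] == run_id]


log = TraceLog(environment="production")
log.record("run-1", "agent:research-bot", "policy_check",
           {"scope": "documents:read", "allowed": True})
log.record("run-1", "agent:research-bot", "tool_call",
           {"tool": "search_documents", "args": {"q": "pricing"}})
log.record("run-2", "human:alice", "tool_call",
           {"tool": "open_document", "args": {"id": 42}})

for event in log.replay("run-1"):
    print(event["seq"], event["kind"], event["detail"])
```

A reviewer who can call something like `replay` for any run can answer the board question directly: what the agent did, under which policy decision, and in which environment.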

Release

When

Gate deployment on proof

Require explicit evidence before production rollout and vendor commitment.
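A release gate of this kind can be as simple as a checklist that must be fully and explicitly satisfied before rollout. The sketch below assumes a hypothetical set of evidence keys drawn from the checks in this review; the names and the `release_gate` function are illustrative, not a standard.

```python
# Hypothetical evidence items; each must be explicitly True to pass.
REQUIRED_EVIDENCE = (
    "all_routes_authenticated",
    "agent_scopes_reviewed",
    "revocation_tested",
    "trace_replay_tested",
)


def release_gate(evidence):
    """Return (passed, missing). Absent or non-True items count as missing."""
    missing = [key for key in REQUIRED_EVIDENCE if evidence.get(key) is not True]
    return (not missing, missing)


passed, missing = release_gate({
    "all_routes_authenticated": True,
    "agent_scopes_reviewed": True,
    "revocation_tested": False,  # tested and failed, or not yet run
})
print("release allowed" if passed else f"blocked, missing: {missing}")
```

Treating a missing answer the same as a failed one is deliberate: it keeps vague or unowned items from passing by default.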

The Operating Question

The $20 breach was preventable.
So is yours.

The organizations that avoid the next Lilli ask the identity, permission, and pressure-test questions before they deploy.