AI Infrastructure · Agent Memory · May 2026

The memory era of AI infrastructure.

Production agents broke the old retrieval playbook.

Chander Dhall Builder • Leader • Speaker

Classic vector search solved a narrower chatbot problem. Production agents need richer memory systems that deliver the right shape of knowledge with provenance, permissions, and enough structure to complete multi-step work.

4 knowledge shapes

Executive Snapshot

Most agent memory problems collapse into four knowledge shapes.

The vendor list keeps changing. The underlying shapes do not: prose, documents, tables, and graphs.

Shape 01 Prose

Wiki pages, notes, and fuzzy narrative text where semantic retrieval still works well.

Shape 02 Documents

Contracts, filings, and policies where hierarchy and section role carry meaning.

Shape 03 Tables

ERP, CRM, metrics, and governed business data where the tabular form matters.

Shape 04 Graphs

Relationships and dependencies where paths and neighbors matter more than isolated records.

The Core Failure

Agents need bundles, not snippets.

Classic RAG was good at narrow questions. Production agents need customer records, policies, tables, workflow state, and access rules together before they can do useful work.

Old unit

Snippet

Good for narrow answers

Semantic chunks still work for FAQ-style questions where the answer sits in one or two nearby passages.

Real need

Bundle

Better for agent work

Multi-step work needs assembled records, policies, metrics, and workflow context in one evidence package.

Design rule

Shape

Pick retrieval by job

The right first question is what form of knowledge the task requires, not which retrieval engine is fashionable.

Category Signal

Even Pinecone now treats vector search as only one primitive.

Pinecone positions Nexus as a knowledge engine that compiles knowledge once and serves structured answers later. Its companion, KnowQL, adds a query vocabulary for scope, grounding, filters, and output shape.

Compile once

Upstream

Move work out of inference

Fetching, assembling, and structuring happen when knowledge changes, not on every agent call.

Serve typed answers

Structured

Give agents usable outputs

Typed responses with citations are closer to operating context than ranked chunks.

Vendor claims

Check

Read the numbers as product claims

Pinecone reports large speedups and token-cost reductions. Treat those figures as vendor claims; the bigger takeaway is the direction of the architecture.
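
The compile-once idea can be sketched in a few lines of Python. This is not Pinecone's API: CompiledKnowledge, TypedAnswer, on_change, serve, and compile_refund_policy are hypothetical names invented for illustration, and the "compile" step is a trivial stand-in for real fetching, assembling, and structuring. The point is only that the expensive work runs when knowledge changes, while each agent call is a cheap lookup that returns a typed value with citations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedAnswer:
    """A structured answer plus the source ids it was built from."""
    value: str
    citations: tuple


class CompiledKnowledge:
    """Hypothetical store: compile on change, serve from the compiled result."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn
        self._compiled = {}      # key -> TypedAnswer
        self.compile_calls = 0   # counts how often the expensive step ran

    def on_change(self, key, raw_sources):
        # Runs upstream, when the underlying knowledge changes.
        self.compile_calls += 1
        self._compiled[key] = self._compile_fn(raw_sources)

    def serve(self, key):
        # Runs on every agent call; no fetching or assembling happens here.
        return self._compiled[key]


def compile_refund_policy(raw_sources):
    # Stand-in for real structuring: merge texts in a stable order.
    ids = sorted(raw_sources)
    merged = " ".join(raw_sources[i].strip() for i in ids)
    return TypedAnswer(value=merged, citations=tuple(ids))


store = CompiledKnowledge(compile_refund_policy)
store.on_change("refund", {
    "policy.md#3": "Refunds within 30 days.",
    "faq.md#1": "Contact support.",
})
first = store.serve("refund")
second = store.serve("refund")  # served again without recompiling
```

Two agent calls trigger one compile. In a real system the trade-off is staleness: the compile step must be re-run whenever a source changes.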

PageIndex Lesson

The wrong retrieval unit breaks good models.

PageIndex's public argument is simple: some professional documents lose meaning when flattened into chunks. The retrieval unit must match the work.

Work to be done	Wrong default	Better unit
FAQ and short answer	Overengineer with structure the task does not need	Chunk or paragraph
Contract or filing analysis	Flatten sections into interchangeable text	Section tree with hierarchy
Financial or operational analysis	Convert tables into prose and hope the model reconstructs them	Native table or governed metric view
Dependency or root-cause reasoning	Retrieve disconnected text and ask the model to guess the path	Graph neighborhood
Repeated workflow	Rebuild the same context every run	Compiled brief or reusable bundle
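
As a minimal sketch, the table above can be written down as an explicit lookup so that the retrieval unit is chosen by job rather than by habit. The job labels and the pick_unit helper are hypothetical; only the pairings come from the table.

```python
from enum import Enum


class RetrievalUnit(Enum):
    CHUNK = "chunk or paragraph"
    SECTION_TREE = "section tree with hierarchy"
    NATIVE_TABLE = "native table or governed metric view"
    GRAPH_NEIGHBORHOOD = "graph neighborhood"
    COMPILED_BRIEF = "compiled brief or reusable bundle"


# Hypothetical job labels; each pairing mirrors one row of the table.
UNIT_BY_JOB = {
    "faq": RetrievalUnit.CHUNK,
    "contract_analysis": RetrievalUnit.SECTION_TREE,
    "financial_analysis": RetrievalUnit.NATIVE_TABLE,
    "root_cause": RetrievalUnit.GRAPH_NEIGHBORHOOD,
    "repeated_workflow": RetrievalUnit.COMPILED_BRIEF,
}


def pick_unit(job):
    """Return the retrieval unit for a job, refusing to guess for unknown jobs."""
    try:
        return UNIT_BY_JOB[job]
    except KeyError:
        raise ValueError(
            f"unknown job {job!r}: classify the work before choosing a unit"
        ) from None
```

Raising on an unknown job is a deliberate choice in this sketch: a silent fallback to chunks would reintroduce the wrong default the table warns about.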

Structured Data

SAP is betting that enterprise memory is mostly structured data.

The Dremio and Prior Labs acquisitions point to lakehouse architecture, semantic layers, lineage, access control, and tabular models as first-class memory primitives for agents.

Dremio

Open data layer

Lakehouse and federation

SAP positions Dremio as the open foundation for SAP and non-SAP data with federated analytical reach.

Catalog

Meaning + lineage

Semantic layer and access rights

SAP says the shared catalog will carry meaning, relationships, access rights, and lineage as part of the memory layer.

Prior Labs

EUR 1B+

Tabular models matter

SAP committed more than EUR 1 billion over four years to scale Prior Labs around structured data and tabular foundation models.

Broader Memory Stack

Retrieval is only one layer of memory.

The market is also making runtime state, long-term memory, and relationship memory explicit. That expands the design space beyond search.

Microsoft GraphRAG

Relationship memory

GraphRAG builds knowledge graphs, community hierarchies, and summaries for complex reasoning over private data. Later work like DRIFT and LazyGraphRAG focuses on cost and query quality.

Cloudflare Agents

Runtime state

Cloudflare treats agents as stateful Durable Objects with SQL state, scheduling, and memory APIs so the agent does not reconstruct itself on every request.

Google Memory Bank

Long-term memory

Google Cloud's Memory Bank now exposes memory extraction, consolidation, asynchronous generation, and identity-scoped persistence across sessions.

Context Windows

A larger context window still needs memory design.

More context helps. It still does not decide what belongs in context, which source is authoritative, or how to prevent noise from overwhelming the model.

Missing function: Authority

A large window does not tell the agent which source is the source of truth.

Missing function: Permissions

It does not enforce access policy or identity-scoped memory boundaries.

Missing function: Hierarchy

It does not preserve section structure, table logic, or governed metric meaning by itself.

Real failure: Noise

More raw context can still produce slower, noisier, and less reliable task performance.
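
One way to picture the missing functions is an admission step that runs before anything enters the window. The sketch below is illustrative only: Candidate, its integer authority scale, the role-based permission check, and admit are all assumptions, and it covers permissions, authority, and noise but not hierarchy, which depends on the retrieval unit rather than on filtering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    source_id: str
    text: str
    authority: int            # hypothetical scale: higher is closer to source of truth
    allowed_roles: frozenset  # roles permitted to read this item
    tokens: int               # estimated token cost of including it


def admit(candidates, role, token_budget):
    """Select context items: enforce permissions, prefer authority, cap noise."""
    # Permissions: drop anything this role may not see.
    permitted = [c for c in candidates if role in c.allowed_roles]
    # Authority: consider the most authoritative sources first.
    ranked = sorted(permitted, key=lambda c: c.authority, reverse=True)
    # Noise: stop adding material once the budget is spent.
    admitted, used = [], 0
    for c in ranked:
        if used + c.tokens <= token_budget:
            admitted.append(c)
            used += c.tokens
    return admitted


candidates = [
    Candidate("wiki/refunds", "Old refund notes", 1, frozenset({"support"}), 300),
    Candidate("policy/refunds", "Official policy", 3, frozenset({"support"}), 400),
    Candidate("finance/ledger", "Ledger rows", 3, frozenset({"finance"}), 200),
]
selected = admit(candidates, role="support", token_budget=500)
```

With a 500-token budget and the support role, only the official policy is admitted: the ledger is not permitted, and the lower-authority wiki page no longer fits.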

Build Sequence

Do not pick a database first.

The highest-leverage move is to define the agent's contract with data. Once the bundle is explicit, the memory stack usually becomes much clearer and much smaller.

1

Define the contract

Write down the work the agent must do and the exact evidence it must receive to do that work reliably.

2

Write the bundle

Name the required records, clauses, tables, metrics, workflow states, and policy fragments instead of asking for generic relevant context.

3

Choose primitives

Use the fewest memory primitives that can deliver that bundle: vector retrieval, document trees, semantic layers, graphs, or compiled briefs.
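
The three steps above can be sketched as a small, checkable contract. Everything named here is hypothetical (BundleContract, missing_evidence, primitives_needed, the refund task, and the item-to-primitive mapping); the sketch only shows that once the bundle is written down, gaps in evidence and the set of required primitives both become simple lookups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BundleContract:
    task: str            # step 1: the work the agent must do
    required: frozenset  # step 2: the named evidence items it must receive


def missing_evidence(contract, bundle):
    """Return required items that are absent (or None) in an assembled bundle."""
    return sorted(n for n in contract.required if bundle.get(n) is None)


def primitives_needed(contract, primitive_for):
    """Step 3: the distinct primitives that cover every required item."""
    return sorted({primitive_for[n] for n in contract.required})


# Hypothetical contract for an illustrative refund-approval task.
contract = BundleContract(
    task="approve_refund",
    required=frozenset({
        "customer_record", "order_table_row",
        "refund_policy_clause", "workflow_state",
    }),
)

# Assumed mapping from evidence item to the primitive that serves it.
PRIMITIVE_FOR = {
    "customer_record": "semantic layer",
    "order_table_row": "semantic layer",
    "refund_policy_clause": "document tree",
    "workflow_state": "compiled brief",
}

partial_bundle = {"customer_record": {"id": "C-1"}, "refund_policy_clause": "Clause 3"}
gaps = missing_evidence(contract, partial_bundle)
stack = primitives_needed(contract, PRIMITIVE_FOR)
```

Here four evidence items collapse to three primitives, and neither a vector index nor a graph is needed for this task.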

Diagnostics

The memory problem shows up first in agent logs.

If a team wants to know whether its memory architecture is working, it should inspect agent traces before it draws more architecture diagrams.

Log question	What it reveals
How many retrieval calls happen before useful work begins?	Whether the agent is solving the task or only warming up its context.
How often does the agent reopen the same sources?	Whether knowledge is being reused or rediscovered every run.
How much token budget is spent rebuilding raw context?	Whether the memory layer is reducing cost or merely moving it around.
How often does the agent ask for information the system already has?	Whether memory is actually operational or just stored nearby.
How often does one run rediscover what a previous run learned?	Whether the system has durable memory or expensive amnesia.
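
Several of these log questions can be answered with a single pass over trace events. The sketch below assumes a hypothetical trace format: a time-ordered list of dicts with run_id and kind ("retrieval" or "action"), plus source_id and tokens on retrieval events. Real trace schemas will differ. It covers warm-up retrievals, reopened sources, retrieval token spend, and cross-run rediscovery; detecting requests for information the system already has would need knowledge of what is stored, so it is left out.

```python
from collections import Counter


def diagnose(events):
    """Summarize memory behavior from time-ordered trace events."""
    warmup = {}            # run_id -> retrieval calls before the first action
    acted = set()          # runs that have already taken an action
    opens = Counter()      # (run_id, source_id) -> times opened in that run
    runs_per_source = {}   # source_id -> set of runs that retrieved it
    retrieval_tokens = 0
    for e in events:
        run = e["run_id"]
        warmup.setdefault(run, 0)
        if e["kind"] == "action":
            acted.add(run)
            continue
        if run not in acted:
            warmup[run] += 1
        opens[(run, e["source_id"])] += 1
        runs_per_source.setdefault(e["source_id"], set()).add(run)
        retrieval_tokens += e["tokens"]
    return {
        "warmup_retrievals_per_run": warmup,
        "reopens_within_run": sum(n - 1 for n in opens.values()),
        "retrieval_tokens": retrieval_tokens,
        "sources_rediscovered_across_runs": sorted(
            s for s, runs in runs_per_source.items() if len(runs) > 1
        ),
    }


events = [
    {"run_id": "r1", "kind": "retrieval", "source_id": "a", "tokens": 100},
    {"run_id": "r1", "kind": "retrieval", "source_id": "a", "tokens": 100},
    {"run_id": "r1", "kind": "retrieval", "source_id": "b", "tokens": 50},
    {"run_id": "r1", "kind": "action"},
    {"run_id": "r1", "kind": "retrieval", "source_id": "c", "tokens": 20},
    {"run_id": "r2", "kind": "retrieval", "source_id": "a", "tokens": 100},
    {"run_id": "r2", "kind": "action"},
]
report = diagnose(events)
```

In this toy trace, run r1 makes three retrieval calls before its first action, reopens source a once, and run r2 retrieves source a again from scratch.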

Sources

Source notes.

These are the public sources that anchor the report. Product-performance numbers are treated as vendor claims unless independently benchmarked.

Pinecone Nexus product page and launch posts. Public descriptions of compiled knowledge, typed answers, field-level citations, and vendor-reported performance claims.
Pinecone KnowQL public design material. Public query concepts including scope, filters, grounded output, and output-shape constraints.
PageIndex docs and GitHub repository. Public argument for tree-structured, reasoning-based retrieval without chunking for long professional documents.
SAP News Center, May 4, 2026. Dremio and Prior Labs acquisition announcements, including semantic-layer and tabular-model positioning.
Microsoft Research GraphRAG material. GraphRAG, DRIFT Search, LazyGraphRAG, and BenchmarkQED posts on graph-enabled retrieval quality and cost.
Cloudflare Agents and Durable Objects docs. Public documentation on stateful agent execution, Session memory, and durable SQL-backed runtime state.
Google Cloud Memory Bank docs. Public documentation on long-term memory extraction, consolidation, isolation, and persistence across sessions.
Chander Dhall Methodworks analysis. Used as the framing thesis, then validated against current public sources.

Full references appear in the review version of the report.

Final Takeaway

Start with the work.
Then choose the memory stack.

Teams that win this cycle will not be the ones chasing the trendiest retrieval tool. They will be the ones that define the bundle clearly, respect the shape of the knowledge, and use the fewest primitives needed to deliver it.