
The Memory Era of AI Infrastructure: Executive Brief

Executive Research · AI Infrastructure · May 2026


Production agents exposed the limit of classic vector search. The serious design question is no longer which database to buy first. It is what shape of knowledge an agent needs, how that knowledge should be assembled, and how to deliver it with provenance, permissions, and the right amount of context.

Chander Dhall · Builder, Leader, Speaker
Published May 13, 2026 · Research Brief · Agent Memory

  • 4 major knowledge shapes show up repeatedly in production agent work.
  • 3 practical build steps matter more than vendor-first shopping.
  • 6 public product signals in this brief point to the same memory problem.
  • 1 wrong place to start: picking a database before defining the bundle.

Executive Summary · What this brief argues

  • RAG and vector search are not the same thing. Retrieval-augmented generation is a broad loop. Vector search is one retrieval primitive inside that loop, and it is often not enough for production agents.
  • Agents need bundles, not snippets. Multi-step work often requires customer records, policies, prior tickets, entitlements, metric definitions, workflow state, and access rules together, not a few semantically similar paragraphs.
  • Pinecone's Nexus launch is a category signal. Pinecone now frames retrieval as compiled, structured, governed knowledge for agents, not only semantic similarity over chunks.
  • PageIndex makes a stronger challenge to chunking. Its public docs argue that long professional documents should preserve hierarchy and section structure rather than being flattened into interchangeable text chunks.
  • SAP is betting on structured enterprise memory. SAP's planned acquisitions of Dremio and Prior Labs point toward semantic layers, governed data, and tabular reasoning as first-class ingredients for agent memory.
  • The memory problem is broader than retrieval. Microsoft GraphRAG, Cloudflare Agents, and Google Cloud Memory Bank show that memory also includes relationships, runtime state, and long-term memory across sessions.
  • The practical sequence is simple. Define the agent's contract with data, write down the bundle it needs, then choose the smallest set of primitives that delivers that bundle reliably.
01 Retrieval Limits

Why classic RAG is no longer enough.

Classic RAG worked well for narrow question answering. Production agents do something harder. They complete multi-step work, which means they need complete operating context, not only nearby text.

The first distinction to keep clear is simple. RAG is the loop. A system retrieves information and supplies it to a model for context. Vector search is one retrieval method inside that loop. It is extremely useful for fuzzy prose and semantic similarity. It is not a universal answer to agent memory.
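To make the distinction concrete, here is a minimal Python sketch of the loop with two interchangeable retrieval primitives. It assumes a toy bag-of-words cosine similarity in place of a real embedding model, and every name in it (`Evidence`, `make_vector_retriever`, `make_record_retriever`, `rag_loop`, the sample IDs) is hypothetical. Vector search is one retriever plugged into the loop, an exact record lookup sits beside it, and the model call is stubbed out.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt
from typing import Callable


@dataclass
class Evidence:
    source: str  # provenance: where this item came from
    text: str


Retriever = Callable[[str], list[Evidence]]


def _bow(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def make_vector_retriever(docs: dict[str, str], k: int = 1) -> Retriever:
    """Similarity search over prose: one retrieval primitive, not the whole loop."""
    def retrieve(query: str) -> list[Evidence]:
        q = _bow(query)
        ranked = sorted(docs.items(), key=lambda kv: _cosine(q, _bow(kv[1])), reverse=True)
        return [Evidence(src, text) for src, text in ranked[:k]]
    return retrieve


def make_record_retriever(records: dict[str, dict]) -> Retriever:
    """Exact lookup of structured records whose ID appears in the query."""
    def retrieve(query: str) -> list[Evidence]:
        return [Evidence(f"crm:{rid}", repr(rec)) for rid, rec in records.items() if rid in query]
    return retrieve


def rag_loop(query: str, retrievers: list[Retriever], generate: Callable[[str, str], str]) -> str:
    # RAG is the loop: call every retrieval primitive, assemble context, then generate.
    evidence = [e for retrieve in retrievers for e in retrieve(query)]
    context = "\n".join(f"[{e.source}] {e.text}" for e in evidence)
    return generate(query, context)


def show_context(query: str, context: str) -> str:
    return context  # stand-in for a model call


docs = {
    "wiki:refunds": "refunds are issued within 14 days of purchase",
    "wiki:outages": "the status page lists current outages",
}
records = {"ACME-42": {"plan": "enterprise", "refund_eligible": True}}
retrievers = [make_vector_retriever(docs), make_record_retriever(records)]
print(rag_loop("can ACME-42 get refunds", retrievers, show_context))
```

Adding or removing a retriever changes what the loop can see without changing the loop itself, which is the sense in which vector search is a primitive rather than the architecture.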

That distinction matters because many production failures are not search failures in the narrow sense. The agent found some text. It still did not have the full bundle it needed to act. Support agents often need the customer record, entitlement rules, previous tickets, policy excerpts, and the current workflow state together. Finance agents often need governed metric definitions, source tables, exception logic, and reporting schedules together. Legal and procurement agents often need section hierarchy, definitions, schedules, and overrides together. A few semantically similar paragraphs do not solve those jobs.

This is why production agents waste so much effort reconstructing context from scratch on every run. They reopen the same documents, refetch the same records, re-ask known questions, and spend expensive tokens rebuilding context before useful work begins. The core memory problem is not only finding text. It is delivering the right shape of knowledge for the work being done.

The four shapes that keep showing up

  • Shape 01, Prose. Fuzzy narrative text, support notes, wiki pages, and conversational content where semantic retrieval still works well.
  • Shape 02, Docs. Long structured documents such as contracts, filings, policies, and manuals where hierarchy and section role carry meaning.
  • Shape 03, Tables. Business data in ERP, CRM, governed metrics, and operational tables where the native structure matters more than prose conversion.
  • Shape 04, Graphs. Relationships, dependencies, shared causes, and network structure where connections matter more than isolated records.

The Working Principle

The right retrieval unit must match the job. A chunk may be fine for an FAQ. A section tree may be right for a contract. A table may be right for financial work. A graph neighborhood may be right for dependency reasoning. A compiled brief may be right for repeated workflows.

02 Pinecone

From retrieval to compiled knowledge.

The most important signal in Pinecone's Nexus launch is not that it introduced one more retrieval product. It is that a company built on vector databases now says agents need something broader.

Pinecone's public Nexus product page now describes Nexus as a knowledge engine that compiles enterprise data into trusted knowledge and serves structured answers instead of ranked chunks. Pinecone's companion launch material makes the same argument more directly: the problem in agent systems is not only inference quality, it is the repeated cost of fetching, assembling, and reasoning over raw data at query time.

That is a material change in emphasis. Pinecone is not abandoning vector search. Its own launch language says vector primitives remain foundational. But Pinecone is clearly repositioning vector search as one primitive inside a broader memory layer. The new system compiles artifacts when source knowledge changes, then serves typed outputs to the agent when the task runs. That is much closer to operating context than classic similarity search.
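The compile-on-change pattern can be sketched in a few lines of Python. This is not Pinecone's implementation: `CompiledBriefStore` and `compile_refund_brief` are hypothetical, the sketch assumes the platform emits a change event carrying the current source contents, and it uses a SHA-256 fingerprint to skip changes that leave content identical. Task-time serving is then a plain lookup with no fetching or assembly.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


def fingerprint(sources: dict[str, str]) -> str:
    # Stable content hash over all sources, independent of dict insertion order.
    h = hashlib.sha256()
    for name in sorted(sources):
        h.update(name.encode() + b"\0" + sources[name].encode() + b"\0")
    return h.hexdigest()


@dataclass
class CompiledBriefStore:
    compile_fn: Callable[[dict[str, str]], str]
    briefs: dict[str, str] = field(default_factory=dict)
    fingerprints: dict[str, str] = field(default_factory=dict)
    compiles: int = 0

    def on_source_change(self, workflow: str, sources: dict[str, str]) -> None:
        """Compile-time path: runs when knowledge changes, not when the agent runs."""
        fp = fingerprint(sources)
        if self.fingerprints.get(workflow) == fp:
            return  # change event, but content is identical: skip recompilation
        self.briefs[workflow] = self.compile_fn(sources)
        self.fingerprints[workflow] = fp
        self.compiles += 1

    def serve(self, workflow: str) -> str:
        """Task-time path: a lookup, with no fetching or assembly."""
        return self.briefs[workflow]


def compile_refund_brief(sources: dict[str, str]) -> str:
    # Hypothetical compiler; a real one might summarize, type, and cite fields.
    return " | ".join(f"{name}: {text}" for name, text in sorted(sources.items()))


store = CompiledBriefStore(compile_refund_brief)
sources = {"policy": "refunds within 14 days", "sla": "P1 response in 1h"}
store.on_source_change("refunds", sources)
for _ in range(3):
    store.serve("refunds")  # three task runs reuse one compile
store.on_source_change("refunds", dict(sources))  # identical content: no recompile
store.on_source_change("refunds", {**sources, "policy": "refunds within 30 days"})
print(store.compiles, store.serve("refunds"))
```

The cost of assembly moves to the moment a source changes, so repeated runs of the same workflow pay it once.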

KnowQL pushes the same idea at the query surface. Pinecone frames it as a shared vocabulary for structured, grounded knowledge in a single call. Pinecone's public design material shows query concepts such as scope, filters, grounded output with citations, and output-shape constraints. That is the right direction if the goal is to deliver bundles that match the job rather than dumping chunks into context and hoping the model can sort them out.
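A structured query of this kind can be sketched as a typed request plus an executor. The sketch below is hypothetical and is not KnowQL syntax or Pinecone's API: `Query`, `Fact`, and `run_query` are illustrative names, and the record format (an `_id` and a `_scope` on every record) is assumed. It shows the ideas the design material names, a scope, filters, and an output-shape constraint, with a citation attached to every returned field.

```python
from dataclasses import dataclass


@dataclass
class Query:
    scope: str                   # which knowledge domain to search
    filters: dict[str, object]   # exact-match constraints on record fields
    fields: tuple[str, ...]      # output-shape constraint: exactly these fields come back


@dataclass(frozen=True)
class Fact:
    value: object
    citation: str                # field-level provenance: "<record id>#<field>"


def run_query(q: Query, records: list[dict]) -> list[dict[str, Fact]]:
    results = []
    for rec in records:
        if rec["_scope"] != q.scope:
            continue
        if any(rec.get(k) != v for k, v in q.filters.items()):
            continue
        missing = [f for f in q.fields if f not in rec]
        if missing:  # refuse to return a partial shape silently
            raise KeyError(f"{rec['_id']} lacks required fields {missing}")
        results.append({f: Fact(rec[f], f"{rec['_id']}#{f}") for f in q.fields})
    return results


records = [
    {"_id": "acct/17", "_scope": "accounts", "region": "EU", "tier": "gold", "owner": "kim"},
    {"_id": "acct/18", "_scope": "accounts", "region": "US", "tier": "gold", "owner": "lee"},
    {"_id": "policy/3", "_scope": "policies", "region": "EU", "text": "EU refund terms"},
]
rows = run_query(Query("accounts", {"region": "EU"}, ("tier", "owner")), records)
print(rows)
```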

The key Pinecone signal is not that vector search stopped mattering. It is that Pinecone now treats vector search as one component of a bigger memory system.

Pinecone also puts numbers on the table. Its Nexus product page claims 30x faster time-to-completion, task completion above 90 percent, and up to 90 percent lower token consumption than traditional agentic retrieval. Those are vendor claims, not neutral benchmarks, but they are still useful as a market signal. They show what the category itself now believes the problem is: too much effort is being spent on retrieval loops and too little on the actual task.

Read The Signal, Not Only The Product

Even if you never buy Nexus, the launch matters. It tells you that the vector-database category itself now accepts that agents need pre-assembled, permission-aware, grounded memory rather than repeated brute-force retrieval.

03 PageIndex

PageIndex and the wrong retrieval unit.

PageIndex pushes the argument further. Its public docs reject chunking for many long professional documents and instead preserve tree structure for reasoning over sections, subsections, and document hierarchy.

PageIndex describes itself as a vectorless, reasoning-based RAG framework. Its public documentation says it transforms documents into a tree-structured index, lets the model reason over that structure, and avoids both vector databases and chunking for long complex documents. Whether or not one adopts PageIndex itself, the underlying argument is strong: some documents lose meaning when broken into semantically similar chunks.

This matters most in contracts, policies, financial filings, compliance manuals, and technical documentation. A section title can change the meaning of a clause. Definitions can govern language many pages later. Schedules can override general terms. Tables are not interchangeable with surrounding prose. The retrieval unit should respect that reality.
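A minimal Python sketch of the section-tree idea follows. It is not PageIndex's code or API: a keyword match stands in for the model-driven reasoning PageIndex describes, and `Section`, `find_path`, `retrieve_with_structure`, and the sample agreement are hypothetical. The point is what comes back: the matching clause together with its heading path and the document's definitions, rather than an isolated chunk.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    text: str = ""
    children: list[Section] = field(default_factory=list)


def find_path(node: Section, keyword: str, path: tuple = ()) -> list[Section] | None:
    """Depth-first walk returning root-to-match, so ancestors travel with the hit."""
    path = path + (node,)
    if keyword.lower() in f"{node.title} {node.text}".lower():
        return list(path)
    for child in node.children:
        hit = find_path(child, keyword, path)
        if hit:
            return hit
    return None


def retrieve_with_structure(doc: Section, keyword: str) -> str:
    path = find_path(doc, keyword)
    if path is None:
        return ""
    breadcrumb = " > ".join(s.title for s in path)
    parts = [f"[{breadcrumb}] {path[-1].text}"]
    # Definitions govern language many pages later, so they ride along with the clause.
    for sec in doc.children:
        if "definitions" in sec.title.lower() and sec is not path[-1]:
            parts.append(f"[{sec.title}] {sec.text}")
    return "\n".join(parts)


msa = Section("Master Services Agreement", children=[
    Section("1 Definitions", '"Fees" means amounts payable under Schedule B.'),
    Section("8 Liability", children=[
        Section("8.2 Cap", "Liability is capped at the Fees paid in the prior 12 months."),
    ]),
])
print(retrieve_with_structure(msa, "capped"))
```

A chunk containing only the 8.2 sentence would leave the model to guess what "Fees" means; the tree keeps the definition and the section role attached.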

Which retrieval unit fits which job (work to be done → better retrieval unit)

  • FAQ or short answer → chunk or paragraph. The answer usually lives in one or two nearby text regions, so semantic retrieval is often good enough.
  • Contract review or filing analysis → section tree. Meaning depends on hierarchy, cross-references, and section roles, so preserve the structure of the document and reason through it.
  • Financial or operational analysis → table. The answer depends on columns, rows, and metric logic, so keep the tabular shape rather than flattening it into prose.
  • Support or account workflow → record bundle. The work depends on multiple related fields across systems, so assemble the case, policy, entitlement, and recent activity together.
  • Dependency or root-cause reasoning → graph neighborhood. The answer depends on relationships and paths, so retrieve entities and edges around the issue, not only text.
  • Repeated workflow → compiled brief. The same task needs the same evidence package every time, so build it once when knowledge changes, then reuse it.

What Better Embeddings Cannot Fix

Better embeddings only improve matching inside the retrieval format you already chose. They do not rescue a workflow built on the wrong unit of retrieval.

04 SAP

SAP's bet on tables, catalogs, and semantic layers.

SAP's recent acquisitions make the enterprise-memory argument concrete. Much of the knowledge that matters inside a business lives in governed tables, catalogs, and semantic layers, not only in PDFs and prose.

In May 2026, SAP announced plans to acquire both Dremio and Prior Labs. The two deals fit together cleanly. Dremio gives SAP a lakehouse and open catalog story across SAP and non-SAP data. Prior Labs gives SAP a stronger tabular-model story for reasoning over structured business data.

  • SAP + Dremio (May 4, 2026). SAP says Dremio will help turn SAP Business Data Cloud into an Apache Iceberg-native enterprise lakehouse with federated analytical reach across SAP and non-SAP systems. SAP also says the resulting catalog will act as both the discovery and semantic layer, carrying meaning, relationships, access rights, and lineage.
  • SAP + Prior Labs (May 4, 2026). SAP says large language models still struggle with tables, numbers, and statistics, while tabular foundation models are purpose-built for structured business data. SAP committed to invest more than EUR 1 billion over four years to scale Prior Labs into a frontier AI lab for structured data.

This is the clearest enterprise statement in the market. If the work is based on ERP, CRM, procurement, finance, or operations data, then the correct memory layer is often not a chunked document store. It is a governed data system with access control, lineage, semantic definitions, and reasoning that respects table structure.

The useful design rule is direct. Give the agent the knowledge in the same shape the business uses it. Documents when the source is document-based. Tables when the source is tabular. Metric definitions when governance matters. Workflow state when action depends on process state. Flattening everything into prose may simplify the pipeline, but it usually degrades the work.
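Here is a minimal Python sketch of giving the agent a governed metric in its native shape rather than a prose summary of it. The `MetricDefinition` type, the `finance.orders` table name, and the `fp&a` owner label are hypothetical; the assumption is that the metric's definition and lineage live in a semantic layer, the agent computes over rows, and the number is always returned with the definition it was computed under.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    definition: str                          # the governed, human-readable meaning
    source_table: str                        # lineage: where the rows come from
    owner: str                               # who is accountable for the definition
    compute: Callable[[list[dict]], float]


def _gross_margin(rows: list[dict]) -> float:
    revenue = sum(r["revenue"] for r in rows)
    cogs = sum(r["cogs"] for r in rows)
    return (revenue - cogs) / revenue if revenue else 0.0


GROSS_MARGIN = MetricDefinition(
    name="gross_margin",
    definition="(revenue - cogs) / revenue over booked orders",
    source_table="finance.orders",
    owner="fp&a",
    compute=_gross_margin,
)


def answer_metric(metric: MetricDefinition, rows: list[dict]) -> dict:
    # The number never travels without its definition and lineage.
    return {
        "metric": metric.name,
        "value": round(metric.compute(rows), 4),
        "definition": metric.definition,
        "source": metric.source_table,
        "owner": metric.owner,
    }


rows = [{"revenue": 100.0, "cogs": 60.0}, {"revenue": 50.0, "cogs": 20.0}]
print(answer_metric(GROSS_MARGIN, rows))
```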

05 Beyond Retrieval

Graphs, runtime state, and long-term memory.

The broader market now treats memory as more than a database query. Relationship memory, runtime state, and persistent memory across sessions are all becoming explicit product surfaces.

  • Microsoft GraphRAG (2024 to 2025). Microsoft Research describes GraphRAG as a structured, hierarchical approach to retrieval that builds knowledge graphs, community hierarchies, and summaries for complex question answering over private data. Microsoft's own follow-on work, including DRIFT Search and LazyGraphRAG, makes a second point just as clearly: graph memory is useful, but it is also expensive enough that cost and quality need active design.
  • Cloudflare Agents (docs updated May 2026). Cloudflare now describes agents as stateful execution environments on Durable Objects. Each agent has durable storage, SQL state, WebSocket connections, and scheduling. Cloudflare's memory docs explicitly separate conversation history from context memory and stress that real agents should not reconstruct state from scratch on every request.
  • Google Cloud Memory Bank (May 2026 docs). Google's Gemini Enterprise Agent Platform now includes Memory Bank for long-term memories across sessions. The product docs emphasize memory extraction, consolidation, asynchronous generation, identity-scoped isolation, and managed persistent storage. That is a memory system, not only a wider context window.
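The graph-neighborhood retrieval unit behind relationship memory can be sketched as a bounded breadth-first walk. This is a simplification, not GraphRAG: GraphRAG builds its graph with a model and adds community hierarchies and summaries, while the hypothetical `neighborhood` function below only assumes an existing list of (source, relation, target) triples and returns the entities and edges within a hop limit.

```python
from collections import deque


def neighborhood(edges, seeds, hops=2):
    """Return (entities, edges) within `hops` steps of the seed entities."""
    adjacency = {}
    for edge in edges:
        src, _, dst = edge
        adjacency.setdefault(src, []).append(edge)
        adjacency.setdefault(dst, []).append(edge)  # walk edges in both directions
    seen = set(seeds)
    kept = set()
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # hop limit reached: do not expand further
        for edge in adjacency.get(node, []):
            kept.add(edge)
            src, _, dst = edge
            other = dst if node == src else src
            if other not in seen:
                seen.add(other)
                frontier.append((other, depth + 1))
    return seen, kept


edges = [
    ("checkout-api", "depends_on", "payments-svc"),
    ("payments-svc", "depends_on", "db-cluster-7"),
    ("search-api", "depends_on", "db-cluster-7"),
    ("db-cluster-7", "hosted_in", "zone-b"),
    ("marketing-site", "depends_on", "cdn"),
]
entities, relations = neighborhood(edges, ["checkout-api"], hops=3)
print(sorted(entities))
```

At three hops the walk reaches `search-api` through the shared `db-cluster-7` dependency, a shared cause that lives in the edges rather than in any single record, while the unrelated `cdn` branch stays out of context.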

These three examples matter because they describe different kinds of memory. Microsoft focuses on relational structure and global sensemaking across large corpora. Cloudflare focuses on agent runtime state and durable execution. Google focuses on long-term memories that persist across sessions and evolve over time. Together, they reinforce the same conclusion: memory is a multi-part architecture problem.
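The runtime-state and long-term-memory ideas can be sketched together with nothing but Python's built-in `sqlite3` module. This is not the Cloudflare Agents API or Google's Memory Bank API: `MemoryStore`, the table layout, and the latest-write-wins rule per topic are assumptions, and this consolidation is far simpler than the model-driven extraction and consolidation those products describe. It assumes SQLite 3.24 or later for `ON CONFLICT ... DO UPDATE`, and it shows two properties named above: memories outlive a single request, and they are isolated by identity.

```python
import sqlite3


class MemoryStore:
    """Identity-scoped long-term memory persisted in SQLite (pass a file path to survive restarts)."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            " scope TEXT NOT NULL, topic TEXT NOT NULL,"
            " fact TEXT NOT NULL, updated INTEGER NOT NULL,"
            " PRIMARY KEY (scope, topic))"
        )

    def remember(self, scope: str, topic: str, fact: str, ts: int) -> None:
        # Consolidation: one row per (scope, topic). A newer fact replaces an older one,
        # and a stale write (older timestamp) is ignored instead of piling up.
        self.db.execute(
            "INSERT INTO memories (scope, topic, fact, updated) VALUES (?, ?, ?, ?) "
            "ON CONFLICT(scope, topic) DO UPDATE SET fact = excluded.fact, "
            "updated = excluded.updated WHERE excluded.updated > memories.updated",
            (scope, topic, fact, ts),
        )
        self.db.commit()

    def recall(self, scope: str) -> dict[str, str]:
        # Identity isolation: only rows for this scope are ever returned.
        cur = self.db.execute(
            "SELECT topic, fact FROM memories WHERE scope = ? ORDER BY topic", (scope,)
        )
        return dict(cur.fetchall())


memory = MemoryStore()
memory.remember("user:ana", "preferred_channel", "email", ts=1)
memory.remember("user:ana", "preferred_channel", "slack", ts=5)  # newer fact wins
memory.remember("user:ana", "preferred_channel", "phone", ts=3)  # stale write ignored
memory.remember("user:ana", "timezone", "UTC+1", ts=2)
memory.remember("user:bo", "timezone", "UTC-5", ts=2)
print(memory.recall("user:ana"))
```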

What The Market Is Really Saying

The industry is converging on a simple idea. Retrieval quality matters, but production agents also need durable state, long-term memory, relationship memory, permissions, provenance, and the right structure for each knowledge source.

06 Context Windows

Why bigger context windows are not the answer.

More context helps. It does not tell you what belongs in context, which source is authoritative, what the permission boundary is, or how to keep noise from degrading performance.

The most common shortcut in agent design is to assume that larger model context windows will eventually dissolve the retrieval problem. That is attractive because it seems to remove engineering decisions. In practice it only delays them.

A large context window does not decide which source is authoritative. It does not preserve document hierarchy. It does not tell the system whether a retrieved claim came from a governed table, a stale summary, or a previous model inference. It does not enforce permissions. And as more raw context is packed into the prompt, the model still has to spend attention budget sorting signal from noise.

Microsoft Research's later BenchmarkQED work is a useful check on the intuition that very long context makes retrieval design unnecessary. In that benchmark write-up, Microsoft reports strong LazyGraphRAG wins against competing methods, including a vector-based RAG configuration with a 1 million token context window. The implication is not that long context is useless. It is that long context by itself does not solve memory design.

The goal for production agents is not maximum context. It is appropriate context.
Practical Test

If a system cannot explain why a specific record, clause, summary, or memory block belongs in context, then the extra context is not a solution. It is just more text.
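That test can be enforced mechanically. In the minimal Python sketch below, every candidate item must carry a source, an authority flag, an access list, and a stated reason before it is admitted, and authoritative items get first claim on a fixed token budget. `ContextItem`, `assemble_context`, the group names, and the token counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    text: str
    source: str              # provenance
    authoritative: bool      # governed source of truth vs. summary or prior inference
    allowed: frozenset       # groups permitted to see this item
    reason: str              # why it belongs in context for this task ("" = no reason)
    tokens: int


def assemble_context(items, user_groups, budget):
    """Admit items that pass permission, reason, and budget checks; explain every rejection."""
    admitted, rejected, used = [], [], 0
    # Stable sort: authoritative items first, original order otherwise preserved.
    for item in sorted(items, key=lambda i: not i.authoritative):
        if not item.allowed & user_groups:
            rejected.append((item.source, "no permission"))
        elif not item.reason:
            rejected.append((item.source, "no stated reason"))
        elif used + item.tokens > budget:
            rejected.append((item.source, "over budget"))
        else:
            admitted.append(item)
            used += item.tokens
    return admitted, rejected


items = [
    ContextItem("Refund window is 14 days", "summary:2024-q3", False,
                frozenset({"support"}), "refund question", 40),
    ContextItem("refund_window_days = 30", "policy-db:refunds", True,
                frozenset({"support"}), "governing refund policy", 30),
    ContextItem("Customer tax ID on file", "crm:pii", True,
                frozenset({"compliance"}), "identity check", 10),
    ContextItem("Old chat transcript", "chat:991", False,
                frozenset({"support"}), "", 500),
]
admitted, rejected = assemble_context(items, {"support"}, budget=60)
print([i.source for i in admitted], rejected)
```

In this example the stale 14-day summary never reaches the model: the governed 30-day policy is admitted first, and the summary does not fit the remaining budget.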

07 Build Sequence

A practical build sequence for production agents.

The useful decision order is simple. Do not pick a database first. Start by defining what the agent must receive, in what form, to do its job reliably.

Step 1. Define the agent's contract with data.

Write down the work the agent must do and the evidence it must have to do it safely. Do not say "relevant context." Say exactly what fields, records, sections, tables, policies, and approvals are needed. This usually reveals that the agent depends on multiple systems and that different sources need different handling.

Step 2. Write down the bundle.

Translate the task into a bundle the system must deliver. A support bundle might include account metadata, entitlement status, recent tickets, outage flags, and refund policy excerpts. A legal bundle might include clause hierarchy, definitions, exception schedules, and approval history. A finance bundle might include governed metric definitions, source tables, variance thresholds, and report cutoffs.
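Writing the bundle down can be as literal as a typed spec that the retrieval layer is checked against. The Python sketch below encodes the support bundle from the paragraph above; `BundleField`, `SUPPORT_BUNDLE`, `missing_fields`, and the shape labels are hypothetical. An empty list counts as delivered (no recent tickets is a real answer), while an absent or `None` field does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BundleField:
    name: str
    shape: str            # "prose", "doc", "table", "graph", or "state"
    required: bool = True


SUPPORT_BUNDLE = (
    BundleField("account_metadata", "table"),
    BundleField("entitlement_status", "table"),
    BundleField("recent_tickets", "table"),
    BundleField("outage_flags", "state"),
    BundleField("refund_policy_excerpts", "doc"),
)


def missing_fields(spec, delivered: dict) -> list[str]:
    """Name every required field the retrieval layer failed to deliver."""
    return [f.name for f in spec if f.required and delivered.get(f.name) is None]


delivered = {
    "account_metadata": {"id": "ACME-42", "plan": "enterprise"},
    "entitlement_status": "active",
    "recent_tickets": [],  # empty, but delivered
    "refund_policy_excerpts": ["Refunds within 30 days."],
}
print(missing_fields(SUPPORT_BUNDLE, delivered))
```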

Step 3. Choose the smallest set of primitives that delivers that bundle.

Use vector retrieval for fuzzy prose. Use document trees when hierarchy matters. Use semantic layers and tabular reasoning for governed business data. Use graph retrieval when relationship reasoning matters. Use durable memory or compiled briefs when the same workflow repeats. Most real agents need a mix. The discipline is not adding every layer. It is adding only the layers the job requires.
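Step 3 can be read as a derivation rather than a shopping list: map each field's shape to a primitive and take the union, so nothing is added that no field needs. In this minimal Python sketch the shape labels, the primitive names, and `choose_primitives` are hypothetical, and treating a repeating workflow as the trigger for a compiled brief is an assumption drawn from the paragraph above.

```python
PRIMITIVE_FOR_SHAPE = {
    "prose": "vector retrieval",
    "doc": "document tree",
    "table": "semantic layer / tabular reasoning",
    "graph": "graph retrieval",
    "state": "durable memory",
}


def choose_primitives(bundle: dict[str, str], repeats: bool = False) -> set[str]:
    """Derive the smallest primitive set from a {field: shape} bundle."""
    unknown = sorted({s for s in bundle.values() if s not in PRIMITIVE_FOR_SHAPE})
    if unknown:
        raise ValueError(f"no primitive defined for shapes {unknown}")
    chosen = {PRIMITIVE_FOR_SHAPE[s] for s in bundle.values()}
    if repeats:
        chosen.add("compiled brief")  # same workflow, same evidence: build once, reuse
    return chosen


faq_assistant = {"answer_passages": "prose"}
support_agent = {
    "account_metadata": "table",
    "entitlement_status": "table",
    "recent_tickets": "table",
    "outage_flags": "state",
    "refund_policy_excerpts": "doc",
}
print(choose_primitives(faq_assistant))
print(sorted(choose_primitives(support_agent, repeats=True)))
```

The FAQ assistant ends up with a single primitive and the support agent with four, and neither gets a graph because no field in either bundle is relational.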

What To Avoid

Teams overbuild when they add a graph, a vector index, a semantic layer, and a memory store to a workflow that only needed one or two of those parts. Simple assistants often do not need a complex memory stack.

08 Diagnostics

What to measure in agent logs.

The clearest view of the memory problem comes from actual traces, not architecture diagrams. Work logs show whether the system is completing work or repeatedly rebuilding the same context.

  • Signal 01, Churn. How many retrieval calls happen before useful work begins?
  • Signal 02, Repeat. How often does the agent reopen the same source or reissue the same query?
  • Signal 03, Budget. How much token budget is spent rebuilding context rather than completing the task?
  • Signal 04, Memory. How often does one run rediscover what a previous run already learned?

A fifth question is usually the most revealing: how often does the agent ask for information the system already has? If that happens often, the memory layer is not actually serving the workflow. It is only storing data somewhere nearby.
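The four signals, and a rough proxy for the fifth question, can be computed directly from traces. The minimal Python sketch below assumes a hypothetical trace format, one dict per event with a `type` of `"retrieve"` or `"act"`, a `query`, and a `tokens` count, and it approximates "rebuilding context" as tokens spent on retrieval events. The field names and the functions `trace_signals` and `rediscovery_rate` are illustrative, not taken from any tracing product.

```python
from collections import Counter


def trace_signals(events: list[dict]) -> dict[str, float]:
    """Churn, repeat, and budget signals for a single agent run."""
    first_act = next((i for i, e in enumerate(events) if e["type"] == "act"), len(events))
    retrievals = [e for e in events if e["type"] == "retrieve"]
    repeats = sum(n - 1 for n in Counter(e["query"] for e in retrievals).values())
    total_tokens = sum(e["tokens"] for e in events)
    return {
        # Churn: retrieval calls before the first useful action.
        "churn": sum(1 for e in events[:first_act] if e["type"] == "retrieve"),
        # Repeat: share of retrievals that reissue an earlier query in the same run.
        "repeat_rate": repeats / len(retrievals) if retrievals else 0.0,
        # Budget: share of tokens spent on retrieval rather than on the task.
        "rebuild_token_share": (
            sum(e["tokens"] for e in retrievals) / total_tokens if total_tokens else 0.0
        ),
    }


def rediscovery_rate(runs: list[list[dict]]) -> float:
    """Memory: share of retrievals that repeat a query some earlier run already issued."""
    seen, hits, total = set(), 0, 0
    for events in runs:
        queries = [e["query"] for e in events if e["type"] == "retrieve"]
        hits += sum(1 for q in queries if q in seen)
        total += len(queries)
        seen.update(queries)
    return hits / total if total else 0.0


run1 = [
    {"type": "retrieve", "query": "account ACME-42", "tokens": 800},
    {"type": "retrieve", "query": "refund policy", "tokens": 1200},
    {"type": "retrieve", "query": "account ACME-42", "tokens": 800},
    {"type": "act", "query": "", "tokens": 400},
]
run2 = [
    {"type": "retrieve", "query": "refund policy", "tokens": 1200},
    {"type": "act", "query": "", "tokens": 300},
]
print(trace_signals(run1), rediscovery_rate([run1, run2]))
```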

What Good Looks Like

A good memory architecture reduces retrieval churn, shortens the path to useful work, lowers repeated token spend, and makes the agent's evidence package more consistent from run to run.

09 Conclusion

Start with the work, not the vendor.

The broader conclusion is straightforward. The memory era of AI infrastructure has arrived because production agents exposed the limits of systems built for chatbots. The core issue is not whether vector search was useful. It was. The issue is that production agents need richer memory systems than chatbots needed.

Pinecone now talks about compiled knowledge and single-call grounded answers. PageIndex argues that some important documents should never be chunked. SAP is investing in open catalogs, semantic layers, and tabular foundation models for the structured data that runs businesses. Microsoft keeps expanding GraphRAG because some work is fundamentally relational. Cloudflare treats durable state as native agent infrastructure. Google now exposes long-term memory as an explicit platform capability. All of these are different responses to the same underlying problem.

Teams that win this cycle will not be the ones chasing the trendiest retrieval tool. They will be the ones that design memory around the actual work their agents must do.

The right sequence remains the same. Start with the work. Define the bundle. Respect the shape of the knowledge. Then choose the fewest primitives that deliver that bundle reliably. If a team starts with the vendor, the architecture usually ends up serving the tool. If it starts with the work, the architecture has a chance to serve the agent.

What Comes Next

If your agents keep rebuilding context, the memory layer is not finished.

This is the architecture conversation worth having before the next tool or vendor decision.

The cheapest fix is usually not a model change. It is a better definition of the bundle, a better choice of retrieval unit, and a stricter view of which knowledge source is authoritative for a given workflow.

Most teams do not need every memory primitive. They need a sharper contract between the task and the data. That is the review that prevents overbuilding on one side and repeated context churn on the other.

This brief is meant to give you a working framework for that review. If the current design still starts with a vendor slide, there is work left to do.

Can you describe the exact bundle your most important agent must receive before it can do useful work?

If the answer is vague, the infrastructure choice is still premature.

The useful question is not which database is best. It is what shape of knowledge your agent needs.

chanderdhall.com · info@chanderdhall.com

10 Sources

Sources & references.

This brief is grounded in public product pages, documentation, and research posts available as of May 13, 2026. Product claims are identified as vendor claims where applicable.

  1. Pinecone Nexus product page, May 2026. Public description of Nexus as a knowledge engine for agents, including compile-on-change positioning, typed answers, field-level citations, and vendor-reported performance claims.
  2. Pinecone Nexus launch posts, May 2026. Public framing of context engineering, compiled artifacts, and the shift from chunk retrieval to knowledge compilation.
  3. Pinecone KnowQL public design material, May 2026. Public query-surface description including scope, filters, grounded output, and output-shape constraints.
  4. PageIndex developer documentation, March 2026, plus public GitHub repository. Public description of vectorless, reasoning-based retrieval over tree-structured document indexes without chunking.
  5. SAP News Center, May 4, 2026. SAP announcements for Dremio and Prior Labs, including Apache Iceberg-native lakehouse positioning, semantic layer and open catalog framing, and SAP's tabular-foundation-model strategy.
  6. Microsoft Research GraphRAG project and blog posts, 2024 to 2025. Public material covering GraphRAG, DRIFT Search, LazyGraphRAG, and BenchmarkQED for graph-based retrieval over complex private data.
  7. Cloudflare Agents and Durable Objects documentation, May 2026. Public docs describing stateful agents, persistent SQL-backed state, Session memory, and Durable Objects as the runtime substrate.
  8. Google Cloud Gemini Enterprise Agent Platform Memory Bank documentation, May 2026, plus Google Cloud ADK memory blog. Public docs describing long-term memories, extraction, consolidation, identity scope, and persistence across sessions.
  9. Chander Dhall Methodworks analysis, used as the framing thesis for this brief. Public sources above were used to validate product names, platform direction, and current public positioning.