Classic vector search solved a narrower chatbot problem. Production agents need richer memory systems that deliver the right shape of knowledge with provenance, permissions, and enough structure to complete multi-step work.
The vendor list keeps changing. The underlying shapes do not: prose, documents, tables, and graphs.
Prose: wiki pages, notes, and fuzzy narrative text where semantic retrieval still works well.
Documents: contracts, filings, and policies where hierarchy and section role carry meaning.
Tables: ERP, CRM, metrics, and governed business data where the tabular form matters.
Graphs: relationships and dependencies where paths and neighbors matter more than isolated records.
Classic RAG was good at narrow questions. Production agents need customer records, policies, tables, workflow state, and access rules together before they can do useful work.
Semantic chunks still work for FAQ-style questions where the answer sits in one or two nearby passages.
Multi-step work needs assembled records, policies, metrics, and workflow context in one evidence package.
The right first question is what form of knowledge the task requires, not which retrieval engine is fashionable.
Nexus is positioned as a knowledge engine that compiles knowledge once and serves structured answers later. KnowQL adds a query vocabulary for scope, grounding, filters, and output shape.
Fetching, assembling, and structuring happen when knowledge changes, not on every agent call.
Typed responses with citations are closer to operating context than ranked chunks.
Pinecone reports large gains in speed and token cost, but the bigger takeaway is the direction of the architecture.
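The compile-once idea can be illustrated with a minimal sketch. This is not the Nexus or KnowQL API; `CompiledStore`, `CompiledAnswer`, and `on_knowledge_changed` are hypothetical names invented for illustration. The sketch only assumes that assembly runs when a source changes and that reads return a typed answer with citations.

```python
from dataclasses import dataclass, field


@dataclass
class CompiledAnswer:
    """A typed response: structured fields plus the sources that back them."""
    fields: dict[str, str]
    citations: list[str]


@dataclass
class CompiledStore:
    """Hypothetical store that assembles on write and only looks up on read."""
    _compiled: dict[str, CompiledAnswer] = field(default_factory=dict)
    compile_count: int = 0

    def on_knowledge_changed(self, topic: str, sources: dict[str, dict[str, str]]) -> None:
        # `sources` maps a source id to the fields extracted from that source.
        # Fetching, assembling, and structuring happen here, once per change.
        merged: dict[str, str] = {}
        for source_id in sorted(sources):
            merged.update(sources[source_id])
        self._compiled[topic] = CompiledAnswer(fields=merged, citations=sorted(sources))
        self.compile_count += 1

    def serve(self, topic: str) -> CompiledAnswer:
        # No fetching or assembly on the agent's call path, only a lookup.
        return self._compiled[topic]


store = CompiledStore()
store.on_knowledge_changed(
    "refund_policy",
    {"policy.md": {"window_days": "30"}, "crm": {"tier": "gold"}},
)
for _ in range(3):
    answer = store.serve("refund_policy")
```

Three agent calls trigger one compile in this sketch. A real system would also need invalidation, permissions, and provenance richer than a list of source ids.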
PageIndex's public argument is simple: some professional documents lose meaning when flattened into chunks. The retrieval unit must match the work.
| Work to be done | Wrong default | Better unit |
|---|---|---|
| FAQ and short answer | Overengineer with structure the task does not need | Chunk or paragraph |
| Contract or filing analysis | Flatten sections into interchangeable text | Section tree with hierarchy |
| Financial or operational analysis | Convert tables into prose and hope the model reconstructs them | Native table or governed metric view |
| Dependency or root-cause reasoning | Retrieve disconnected text and ask the model to guess the path | Graph neighborhood |
| Repeated workflow | Rebuild the same context every run | Compiled brief or reusable bundle |
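One way to make the table above operational is to encode it as a lookup that a router consults before choosing a retrieval path. The sketch below is illustrative only; the `Work` enum and `retrieval_unit_for` function are hypothetical names, and the values are copied from the "Better unit" column.

```python
from enum import Enum


class Work(Enum):
    FAQ = "faq_or_short_answer"
    CONTRACT = "contract_or_filing_analysis"
    FINANCIAL = "financial_or_operational_analysis"
    DEPENDENCY = "dependency_or_root_cause_reasoning"
    REPEATED = "repeated_workflow"


# Values mirror the "Better unit" column of the table.
RETRIEVAL_UNIT = {
    Work.FAQ: "chunk or paragraph",
    Work.CONTRACT: "section tree with hierarchy",
    Work.FINANCIAL: "native table or governed metric view",
    Work.DEPENDENCY: "graph neighborhood",
    Work.REPEATED: "compiled brief or reusable bundle",
}


def retrieval_unit_for(work: Work) -> str:
    """Return the retrieval unit that matches the work to be done."""
    return RETRIEVAL_UNIT[work]


unit = retrieval_unit_for(Work.CONTRACT)
```

Classifying an incoming task into one of these categories is the hard part and is left out here; the lookup only records the decision once the task type is known.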
SAP's acquisitions of Dremio and Prior Labs point to lakehouse architecture, semantic layers, lineage, access control, and tabular models as first-class memory primitives for agents.
SAP positions Dremio as the open foundation for SAP and non-SAP data with federated analytical reach.
SAP says the shared catalog will carry meaning, relationships, access rights, and lineage as part of the memory layer.
SAP committed more than EUR 1 billion over four years to scale Prior Labs around structured data and tabular foundation models.
The market is also making runtime state, long-term memory, and relationship memory explicit. That expands the design space beyond search.
GraphRAG builds knowledge graphs, community hierarchies, and summaries for complex reasoning over private data. Later work like DRIFT and LazyGraphRAG focuses on cost and query quality.
Cloudflare treats agents as stateful Durable Objects with SQL state, scheduling, and memory APIs so the agent does not reconstruct itself on every request.
Google's platform now exposes memory extraction, consolidation, asynchronous generation, and identity-scoped persistence across sessions.
More context helps. It still does not decide what belongs in context, which source is authoritative, or how to prevent noise from overwhelming the model.
A large window does not tell the agent which source is the source of truth.
It does not enforce access policy or identity-scoped memory boundaries.
It does not preserve section structure, table logic, or governed metric meaning by itself.
More raw context can still produce slower, noisier, and less reliable task performance.
The highest-leverage move is to define the agent's contract with data. Once the bundle is explicit, the memory stack usually becomes much clearer and much smaller.
Write down the work the agent must do and the exact evidence it must receive to do that work reliably.
Name the required records, clauses, tables, metrics, workflow states, and policy fragments instead of asking for generic relevant context.
Use the fewest memory primitives that can deliver that bundle: vector retrieval, document trees, semantic layers, graphs, or compiled briefs.
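The three steps above can be sketched as a small data contract. Everything here is an assumption made for illustration: the `EvidenceItem` and `BundleContract` types, the item names, and the refund-approval task are hypothetical and do not come from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceItem:
    name: str       # the named piece of evidence, e.g. "customer_record"
    kind: str       # record, clause, table, metric, workflow_state, or policy
    primitive: str  # memory primitive expected to deliver it


@dataclass(frozen=True)
class BundleContract:
    task: str
    items: tuple[EvidenceItem, ...]

    def primitives(self) -> set[str]:
        """The distinct memory primitives this bundle actually needs."""
        return {item.primitive for item in self.items}

    def missing(self, delivered: set[str]) -> list[str]:
        """Names of required evidence absent from what was delivered."""
        return [item.name for item in self.items if item.name not in delivered]


# Hypothetical contract for a refund-approval agent.
contract = BundleContract(
    task="approve_refund",
    items=(
        EvidenceItem("customer_record", "record", "semantic_layer"),
        EvidenceItem("refund_clause", "clause", "document_tree"),
        EvidenceItem("order_totals", "table", "semantic_layer"),
    ),
)
```

In this example three evidence items resolve to two primitives, which is the kind of narrowing the bundle-first approach is meant to expose. Calling `contract.missing(...)` before a run gives a cheap check that the agent received what the contract names.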
If a team wants to know whether its memory architecture is working, it should inspect traces before it draws more boxes.
| Log question | What it reveals |
|---|---|
| How many retrieval calls happen before useful work begins? | Whether the agent is solving the task or only warming up its context. |
| How often does the agent reopen the same sources? | Whether knowledge is being reused or rediscovered every run. |
| How much token budget is spent rebuilding raw context? | Whether the memory layer is reducing cost or merely moving it around. |
| How often does the agent ask for information the system already has? | Whether memory is actually operational or just stored nearby. |
| How often does one run rediscover what a previous run learned? | Whether the system has durable memory or expensive amnesia. |
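Several of these log questions reduce to simple counts over a trace. The sketch below assumes a hypothetical trace format in which each event has a `kind` of either "retrieval" or "work", an optional `source`, and a token count; real tracing systems will use different fields. It covers the first three questions only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TraceEvent:
    kind: str         # "retrieval" or "work" (assumed event types)
    source: str = ""  # source id, set for retrieval events
    tokens: int = 0   # tokens consumed by this event


def retrievals_before_work(events: list[TraceEvent]) -> int:
    """Count retrieval calls that happen before the first work event."""
    count = 0
    for event in events:
        if event.kind == "work":
            break
        if event.kind == "retrieval":
            count += 1
    return count


def reopened_sources(events: list[TraceEvent]) -> dict[str, int]:
    """Sources retrieved more than once, with how many times each was opened."""
    counts = Counter(e.source for e in events if e.kind == "retrieval")
    return {source: n for source, n in counts.items() if n > 1}


def context_token_share(events: list[TraceEvent]) -> float:
    """Fraction of the token budget spent on retrieval events."""
    total = sum(e.tokens for e in events)
    retrieval = sum(e.tokens for e in events if e.kind == "retrieval")
    return retrieval / total if total else 0.0


trace = [
    TraceEvent("retrieval", "crm", 300),
    TraceEvent("retrieval", "policy", 300),
    TraceEvent("retrieval", "crm", 300),
    TraceEvent("work", tokens=100),
]
```

For this made-up trace, three retrievals precede any work, the `crm` source is opened twice, and most of the token budget goes to rebuilding context. The cross-run questions would need the same counts grouped by run id, which this sketch omits.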
These are the public sources that anchor the brief. Product-performance numbers are treated as vendor claims unless independently benchmarked.
Full references appear in the review version of the report.
Teams that win this cycle will not be the ones chasing the trendiest retrieval tool. They will be the ones that define the bundle clearly, respect the shape of the knowledge, and use the fewest primitives needed to deliver it.
© 2026 Chander Dhall Methodworks, LLC. All rights reserved.