Consistency layer · RAG & shared memory

Apache-2.0 · single host · sequential lost-update guarantee · TLA+-checked

Shared agent memory reintroduced the lost update. RAG didn't cause it. It widened the surface.

Retrieval and agent memory are shared mutable state. Two agents read the same record at version 1. One writes version 2. The other, still holding the version 1 it read minutes ago, writes its edit back and erases version 2. Last write wins, the update is silently gone, nothing errors, and every downstream answer builds on a version that no longer exists. agent-coherence turns that silent clobber into a typed refusal. It is a consistency layer that sits under your store, not another vector DB.

A consistent store doesn't save you.

The first instinct is to reach for a stronger store: a transactional vector DB, an ACID memory backend. It doesn't help, because the stale copy isn't in the store. It's in the agent's view.

Two agents read the same record.

A retrieval result, a memory.json entry, a plan.md, a store key. Agent A and agent B — or an agent and the nightly ingestion pipeline that writes the same memory — both read it at version 1 and start working from that snapshot.

A writes. B is now holding the past.

A finishes, writes version 2, and releases. The store is perfectly consistent. It has v2. But B is still operating on the v1 it read minutes ago. There was no real concurrency. B's world simply moved on without it.

B writes its v1-derived edit. v2 is gone.

B computes its change from v1 and writes it back, overwriting v2. The store did nothing wrong. No exception fired. A's work is simply erased, and the next retrieval returns B's version as if A's had never existed. This is the lost update.
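The three steps above can be reproduced in a few lines of plain Python. This is a minimal sketch, not agent-coherence code: the dict-backed store and the read and write helpers are hypothetical stand-ins for any last-write-wins backend, and the numbers (100, +10, +5) are the ones used in the diagram on this page.

```python
# Plain last-write-wins store: nothing is checked on write.
store = {"total": {"version": 1, "value": 100}}


def read(key):
    # Return a private copy, the way an agent holds a snapshot in its context.
    return dict(store[key])


def write(key, value):
    # Blind write: bumps the version but never asks which version the writer read.
    store[key] = {"version": store[key]["version"] + 1, "value": value}


snapshot_a = read("total")  # A reads v1 (100)
snapshot_b = read("total")  # B reads v1 (100)

write("total", snapshot_a["value"] + 10)  # A commits 110
write("total", snapshot_b["value"] + 5)   # B commits 105, derived from its stale v1

final = store["total"]
print(final)  # {'version': 3, 'value': 105}
```

The store accepted both writes and raised nothing. The value is 105 where 115 was expected, and nothing in the final record shows that A's update ever existed.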

This is the same problem multiprocessor CPUs faced in the 1980s and solved with cache coherence: every core can hold a shared line, but the instant one core writes, the others' copies are invalidated before they can read or write back a stale value. The backing memory was never the problem. Keeping the readers honest was.

The fix is on the readers, not the store.

So the missing layer isn't a better database. It's something that tracks which agent holds which record at which version, and invalidates a reader's view the instant a peer writes. The stale writer is stopped and made to re-read before it can act.

That's what agent-coherence is: a small MESI-derived protocol with single-writer / multiple-reader per record, monotonic versions, and a cheap (~12-token) invalidation signal instead of rebroadcasting the whole record every turn. In the lost-update case, concretely:

Without coherence: B holds a stale v1, writes, and silently erases v2. You find out, if you ever do, when a downstream answer is wrong and the audit trail has no error in it.
With coherence: when A commits v2, B drops to INVALID. B's next write is denied fail-closed. B must call reacquire(), which re-mints its identity and forces a fresh read, before it can write again. A bare retry doesn't clear the denial. That's deliberate, so a naive loop can't overwrite its way through. Both updates survive.
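To make the deny-then-reacquire sequence concrete, here is a toy single-process model. It is a sketch under stated assumptions, not the library's implementation: ToyCoordinator and StaleViewError are hypothetical names, the model tracks only a valid/invalid flag per agent, and it does not model identity re-minting.

```python
class StaleViewError(Exception):
    """Hypothetical typed refusal for a write from an invalidated view."""


class ToyCoordinator:
    """Toy model: one record, a monotonic version, and a view flag per agent."""

    def __init__(self, value):
        self.value = value
        self.version = 1
        self.valid = {}  # agent name -> True while its view is current

    def read(self, agent):
        # A tracked read registers the agent's view as current.
        self.valid[agent] = True
        return self.value

    def write(self, agent, value):
        # Fail closed: an agent without a valid view cannot write.
        if not self.valid.get(agent, False):
            raise StaleViewError(f"{agent} holds a stale view; reacquire first")
        self.value = value
        self.version += 1
        # Invalidate every peer's view; only the writer stays current.
        for peer in self.valid:
            self.valid[peer] = (peer == agent)

    def reacquire(self, agent):
        # Forces a fresh read, which re-registers the view.
        return self.read(agent)


coord = ToyCoordinator(100)
a_seen = coord.read("A")
b_seen = coord.read("B")

coord.write("A", a_seen + 10)  # A commits 110; B's view is invalidated

try:
    coord.write("B", b_seen + 5)  # stale write
    denied = False
except StaleViewError:
    denied = True

try:
    coord.write("B", b_seen + 5)  # a bare retry changes nothing
    retry_denied = False
except StaleViewError:
    retry_denied = True

b_seen = coord.reacquire("B")  # fresh read returns 110
coord.write("B", b_seen + 5)   # commits 115, so both updates survive
```

The only path from a denied write back to an accepted one runs through a fresh read, which is why the retry loop cannot overwrite its way through.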

Why "denied," not "warned."

The refusal is load-bearing. The deny is single-writer by invalidation, not a mutex you hold. The safety properties behind it (SingleWriter, MonotonicVersion, NoLostUpdate, NoStaleApply) are machine-checked in TLA+, with make tla-check running all six CI specs on every push, each carrying a documented mutant that must fail. The guarantee is structural, not a prompt asking the model to be careful.

One import, on the store you already use.

If you're on a LangGraph store, the layer your RAG and memory reads and writes flow through, CCSStore is a drop-in. store.get(), store.put(), and store.search() keep working unchanged. It adds read-side coherence: a peer's commit invalidates your cached view, so your next read is a fresh miss instead of a stale hit. It does not deny a stale write-back — put is not version-CAS. For write-side lost-update prevention, route those writes through CoherentVolume or write_cas below. One coordinator, one process: two processes each constructing a CCSStore share nothing.

# Before
from langgraph.store.memory import InMemoryStore
store = InMemoryStore()

# After — one import change, no node code changes
from ccs.adapters import CCSStore
store = CCSStore(strategy="lazy")

If your shared memory is plain files across processes or sessions, say a memory.json, a LEARNINGS.md, or a scratchpad every run reads and writes, then CoherentVolume wraps them with the same protocol, no framework required.

from ccs.adapters.coherent_volume import CoherentVolume

vol = CoherentVolume(workspace_root, managed=("memory/**",))
record = vol.read("memory/learnings.md")     # tracked read — your view is registered
vol.write("memory/learnings.md", revised)    # stale view? denied fail-closed → vol.reacquire() and re-derive

Same protocol underneath either surface, and the same behavior regardless of which model provider (Anthropic, OpenAI, Google, Mistral, open-source) the agents talk to. Already on CrewAI, AutoGen, the OpenAI Agents SDK (that adapter is experimental), or a custom loop? The library exposes the same coordinator through each of those seams.

See it run: deterministic, no keys.

Three runnable demos reproduce the failure and then prevent it. All are offline and deterministic, with no API keys and no spend.

A lost update in a shared knowledge base.

Two agents curate one RAG corpus. One overwrites the other's newer entry from a stale read — then CoherentVolume denies the stale overwrite and both updates survive.
python -m examples.shared_knowledge_base.demo

Conversation memory goes stale across turns.

Two agents share one conversation. One caches it, the other revises it, and the first acts on a stale copy. CoherenceAdapterCore invalidates the stale cache so the reader re-fetches before it acts.
python -m examples.conversations_stale_read.main

The shared-file lost update, reproduced and prevented.

A faithful offline reproduction of a documented data-loss event (one shared file, sequential runs, earlier content destroyed on overwrite) paired with the coherence layer that catches the stale write before it lands.
python -m examples.coherent_volume.main

Animated diagram of the shared-record lost update and its prevention. Both agents read a shared total of 100. Agent A adds 10 and writes 110. Without coherence, Agent B writes 105 from its stale read of 100 and silently clobbers A — final 105, expected 115. With CoherentVolume, B's stale write is denied fail-closed; B reacquires the current 110 and writes 115. Both updates survive.

Animated diagram of the examples/coherent_volume demo. Run it yourself — offline, deterministic, no keys: python -m examples.coherent_volume.main. Source: agent-coherence/examples/coherent_volume.

The honest framing the demos ship with.

The conversation demo measured the providers first: a 100-trial OpenAI + 20-trial Mistral probe observed zero stale reads from the Conversations APIs, since both commit a write before returning the ACK. The APIs are not the bug. The client cache is. Agents cache state locally to avoid re-paying for the whole history, and that local copy goes stale the moment a peer writes, regardless of how consistent the server is. Coherence is about the readers. We don't claim those APIs serve stale reads, because in our trials they didn't.
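The client-cache point can be shown without any provider at all. In this minimal sketch the server is a plain dict that is always consistent, and ToyCachedReader is a hypothetical client-side cache (not CoherenceAdapterCore) whose invalidate method stands in for the coherence signal.

```python
class ToyCachedReader:
    """Hypothetical client-side cache that drops an entry on an invalidation signal."""

    def __init__(self, server):
        self.server = server  # shared dict standing in for a consistent API
        self.cache = {}
        self.fetches = 0

    def get(self, key):
        if key not in self.cache:  # miss: pay for one fetch
            self.cache[key] = self.server[key]
            self.fetches += 1
        return self.cache[key]

    def invalidate(self, key):
        # The signal names the key only; it does not carry the new content.
        self.cache.pop(key, None)


server = {"conversation": "draft 1"}
reader = ToyCachedReader(server)

first = reader.get("conversation")  # fetch
again = reader.get("conversation")  # cache hit, no fetch

server["conversation"] = "draft 2"  # a peer revises; the server is consistent
stale = reader.get("conversation")  # still "draft 1": the cache is the stale part

reader.invalidate("conversation")   # coherence signal reaches the reader
fresh = reader.get("conversation")  # miss, then re-fetch: "draft 2"
```

The server never returned an old value. The stale read came entirely from the reader's local copy, and one small signal was enough to turn the next read into a fresh fetch.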

pip install "agent-coherence[langgraph]" gets the adapters (Python 3.11+). The runnable demos live in the repo: github.com/Cohexa-ai/agent-coherence/tree/main/examples.

Where it sits: under your memory stack, not in front of it.

agent-coherence is a consistency layer (does your view match the current version?), not a retrieval layer (which chunk is most relevant?). It partners underneath your memory and RAG vendors. It never tries to become one.

Layer	Owns	Examples
Retrieval / memory	Embedding, ranking, storage, recall	Mem0, Letta, LlamaIndex, your vector DB, a LangGraph store
Coherence (this)	Whose cached view is current: read-side invalidation of stale readers (CCSStore), write-side deny before a stale write lands (CoherentVolume, write_cas)	CCSStore, CoherentVolume, CoherenceAdapterCore
Orchestration	Who runs, in what order, with what tools	LangGraph, CrewAI, AutoGen, custom loops

It doesn't embed, rank, or store vectors. It adds the one thing the retrieval layer doesn't give you: a guarantee that the version your agent is about to write over is the version it actually read. On the write path that guarantee is enforced by CoherentVolume and write_cas; on a LangGraph store, CCSStore keeps the readers fresh.

What this doesn't claim.

Understating what the library does costs less credibility than overstating it. The honest scope is narrower than the headline, and worth stating plainly.

The clean guarantee is sequential and single-host. The headline case (one agent reads, a peer writes, the first writes back stale) is denied fail-closed under a single coordinator. Concurrent same-key writers on that host are also covered (optimistic commit-CAS: exactly one wins, the loser gets a typed conflict with bounded retry). Cross-host fencing is on the roadmap, demand-gated.
CCSStore coherence lives inside one Python process. A separate process — a nightly consolidation job, a second service — gets nothing from it. For cross-process on one host, put the shared state behind CoherentVolume or the stale-write-guard-fs MCP server on a shared root.
The deny is invalidation, not a mutex. There's no lock you hold across a critical section. A stale write is refused. reacquire() re-mints identity and forces a fresh read, and a bare retry does not clear the denial. That's the whole mechanism: single-writer by invalidation.
Foreign edits to a managed file are caught at the next access, not by a background watcher. If a write goes through the coordinator (a peer agent, another session, your store via CCSStore or CoherentVolume) the stale view is caught. As of v0.10.0, a managed file hand-edited out of band — by a user, a script, or a tool not on the coordinator — is caught too, the moment an agent re-reads it (denied in strict mode, opt-in on_stale_read="raise") or writes over it (denied by default, on_stale_write="raise"); both raise StaleView, cleared by reacquire(). Proactively watching a source no agent reads or writes is still the source-watcher case: on the roadmap, not shipped today.
It isn't your memory or RAG store. No embeddings, no ranking, no vectors. It's the consistency layer underneath whatever you already use to retrieve and remember.
You may not need it. Read-only RAG, a single writer, per-user namespaces, or an append-only / event-sourced memory store has no update to lose: a reader always sees a whole version. If that's your write model, isolation already solves this and you don't need a coherence layer.
Multi-record reads and publishes are covered, with narrow claims. Snapshot sessions (v0.11.0) pin a consistent cut so an agent reading several memory records never acts on a torn mix — that's read-skew prevention, not write-skew. atomic_publish (v0.12.0) commits a set of files all-or-nothing at the coordinator — publish atomicity, not rollback of effects that already escaped. Both single-host.
No cost number on this page, on purpose. Correctness is the wedge here. The temporal change-rate cost benchmark — how many re-fetches coherence-gating avoids as a source drifts between turns — is a separate, pre-registered measurement, kept in the repo rather than spliced into a number on this page.

Frequently asked questions.

Common questions about coherence for RAG corpora and shared agent memory.

Does agent-coherence replace my vector database or memory store?

No. It is a consistency layer that sits underneath your retrieval and memory store. It does not embed, rank, or store vectors. It tracks which agent holds which record at which version and invalidates a reader's cached view when a peer writes, so a stale view routed through it cannot silently overwrite a newer one. Your existing store stays the durability and recall layer. On a LangGraph store the routing is the CCSStore drop-in (read-side). Alongside Mem0, Letta, or LlamaIndex it guards the shared files and views your agents read and write around them via CoherentVolume — it does not wrap those vendors' internals.

My vector store is already consistent. Why do I need this?

A consistent store does not save you, because the staleness is not in the store — it is in the agent's cached copy. Two agents read a record at version 1; one writes version 2; the other, still holding the version-1 it read minutes ago, writes its edit computed from version 1 and clobbers version 2. The store did nothing wrong. Coherence keeps the readers honest: the moment one writes, the others' copies are invalidated before they can write a stale value back.

Does it work with LangGraph memory?

Yes, for the read side. CCSStore is a drop-in for langgraph.store — store.get(), store.put(), and store.search() keep working unchanged. A peer's commit invalidates your cached view, so your next get is a fresh miss instead of a stale hit. put is not version-CAS, so CCSStore does not deny a stale write-back — for write-side lost-update prevention, route those writes through CoherentVolume or write_cas. The swap is one import change at the place you construct the store; node code using the standard get/put signatures runs unchanged.

Does it detect when an external file or index changes out of band?

For writes that go through the coordinator — a peer agent, another session, or your retrieval layer wired through CCSStore or CoherentVolume — yes. As of v0.10.0, a managed file edited out of band (a human hand-editing the raw file, a script, or a tool not on the coordinator) is also caught at the next access: an agent re-reading it is denied in strict mode (opt-in via CoherentVolume(on_stale_read="raise")), and an agent writing over it is denied by default (on_stale_write="raise"). Both surface as StaleView and recover via reacquire(). That is detection at the read/write boundary, not a background watcher; proactively watching a source no agent touches is still on the roadmap, demand-gated.
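Detection at the read/write boundary, as opposed to a background watcher, can be sketched with a content digest that is compared at each access. This is an illustration only: ToyGuardedFile and StaleFileError are hypothetical names, and digest comparison is an assumed technique here, not a description of how CoherentVolume detects foreign edits.

```python
import hashlib
import tempfile
from pathlib import Path


class StaleFileError(Exception):
    """Hypothetical refusal raised when the file changed since it was last read."""


class ToyGuardedFile:
    """Checks a content digest at each access; there is no background watcher."""

    def __init__(self, path):
        self.path = Path(path)
        self.seen_digest = None

    def _digest(self):
        return hashlib.sha256(self.path.read_bytes()).hexdigest()

    def read(self):
        # Record what this agent saw, so a later write can be checked against it.
        self.seen_digest = self._digest()
        return self.path.read_text()

    def write(self, text):
        # Compare on-disk content with the last tracked read before overwriting.
        if self.seen_digest != self._digest():
            raise StaleFileError("file changed since last read; re-read first")
        self.path.write_text(text)
        self.seen_digest = self._digest()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "learnings.md"
    path.write_text("v1\n")

    guarded = ToyGuardedFile(path)
    seen = guarded.read()

    path.write_text("v1\nhand edit\n")  # out-of-band edit that bypasses the guard

    try:
        guarded.write(seen + "agent edit\n")
        blocked = False
    except StaleFileError:
        blocked = True

    current = guarded.read()  # the re-read picks up the hand edit
    guarded.write(current + "agent edit\n")
    final_text = path.read_text()
```

Nothing notices the hand edit until the agent touches the file again, which matches the boundary-detection scope described above. The check and the write are not atomic in this sketch, so it says nothing about concurrent writers.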

Does it handle concurrent writers?

On a single host, yes: concurrent same-key writers are serialized by optimistic commit-CAS — exactly one wins and the loser gets a typed conflict with bounded retry, never a silent drop. That path is write_cas / CoherentVolume — CCSStore's put does not do version-CAS. Cross-host fencing is on the roadmap, demand-gated. The clean, headline guarantee is the sequential stale-read-then-write lost update.
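A generic optimistic commit-CAS with bounded retry looks roughly like the sketch below. It illustrates the technique only: ToyCasRecord, VersionConflict, and add_with_retry are hypothetical names, and the signature of the library's write_cas is not shown on this page, so none is assumed here.

```python
import threading


class VersionConflict(Exception):
    """Hypothetical typed conflict handed to the loser of a commit race."""


class ToyCasRecord:
    def __init__(self, value):
        self.value = value
        self.version = 1
        self._lock = threading.Lock()  # guards only the compare-and-commit step

    def read(self):
        with self._lock:
            return self.value, self.version

    def commit(self, expected_version, value):
        with self._lock:
            if self.version != expected_version:
                raise VersionConflict(
                    f"expected v{expected_version}, found v{self.version}"
                )
            self.value = value
            self.version += 1


def add_with_retry(record, delta, max_attempts=5):
    # Bounded retry: re-read and re-derive on conflict, then report failure.
    for _ in range(max_attempts):
        value, version = record.read()
        try:
            record.commit(version, value + delta)
            return True
        except VersionConflict:
            continue
    return False


# Deterministic race: both writers read v1, then both try to commit.
record = ToyCasRecord(100)
_, version_a = record.read()
_, version_b = record.read()
record.commit(version_a, 110)  # first commit wins
try:
    record.commit(version_b, 105)  # second commit carries a stale version
    loser_conflicted = False
except VersionConflict:
    loser_conflicted = True
recovered = add_with_retry(record, 5)  # the loser re-reads 110 and commits 115

# The same thing with real threads on a fresh record.
threaded = ToyCasRecord(100)
workers = [
    threading.Thread(target=add_with_retry, args=(threaded, delta))
    for delta in (10, 5)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

The internal lock covers only the compare-and-commit step. Callers never hold it across their own read-modify-write, so a loser is told about the conflict instead of being silently overwritten or blocked.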

Running multi-agent RAG or shared agent memory?

15-minute call. Walk me through your store and memory write-back loop, and I'll tell you where the lost-update surface is and whether the shipped single-host guarantee covers it. If you're in the cross-host case, that's an active co-design — we'll scope building it against your substrate together, not pretend it ships today.
