agent-coherence

State consistency · Write-side coherence for collaborating agents

v0.7.1 · Apache-2.0 · PyPI

When agents share state, one of them is reading a stale copy.

That stale read is how shared memory pollution starts — one agent's hallucination becomes a "fact" the next one reasons from, and a cascading error propagates downstream. agent-coherence makes that moment of divergence visible and serves the current version on the next read instead of rebroadcasting the full artifact every turn. Same library, same protocol, across LangGraph, CrewAI, AutoGen, and any custom orchestrator. Same behavior regardless of model provider (Anthropic, OpenAI, Google, Mistral, open-source).

"This asynchronicity adds challenges in result coordination, state consistency, and error propagation across the subagents."

— Anthropic Engineering, How we built our multi-agent research system (June 2025), on what blocks async multi-agent execution at scale.

Anthropic named the problem. agent-coherence is the protocol that addresses it.

$ pip install "agent-coherence[langgraph]"
69% · token reduction on read-heavy workloads
~12 · tokens per invalidation, vs full-artifact rebroadcast
$18K/year · saved on one 1,000-runs/day code-review workload

Two coherence problems. agent-coherence solves the write side.

If your agents only read from sources you don't control, you need a freshness pipeline. If your agents write to each other's state, you need a coherence protocol. They're different problems — and the wrong tool for one is silent failure in the other.

Read-side freshness

Agents downstream of an external source.

The world writes (commits, Slack, docs, tickets); agents read. You need an index pipeline that keeps the corpus current as sources change — incremental embeddings, knowledge graphs, retrieval.

Examples: RAG over a codebase, search over a meeting archive, knowledge-graph extraction from a doc store.

Write-side coherence

Agents are themselves the source of truth.

The agents write — they collaborate on shared plans, edit specs, mutate memory, hand off scratchpads. You need a coherence protocol that detects stale reads and enforces single-writer ordering when one agent commits.

Failure modes caught: stale reads · shared memory pollution · cascading errors · context handoff drift · concurrent-write conflicts.

Examples: multi-agent planners, parallel sub-agents editing a spec, coding agents collaborating on a refactor, research crews mutating shared notes.

Both layers are needed in a real production system. agent-coherence focuses on the write side.

Measured on real LangGraph graphs.

Reproducible in CI with GenericFakeChatModel — no live LLM API calls. Run them yourself: make benchmark.

Workload                   Agents   Reads : Writes   Hit rate   Savings
Planning (read-heavy)      4        12:1             75%        69%
Code review (moderate)     3        8:3              60%        47%
High-churn (write-heavy)   4        8:4              50%        29%
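
Under the hood, CI scripts the model outputs rather than calling a provider. A minimal sketch of that pattern with LangChain's GenericFakeChatModel (the scripted replies are illustrative, not taken from the benchmark suite):

from langchain_core.language_models.fake_chat_models import GenericFakeChatModel
from langchain_core.messages import AIMessage

# Deterministic stand-in for a live LLM: no network, no API keys.
fake_llm = GenericFakeChatModel(messages=iter([
    AIMessage(content="PLAN: split the module, update imports"),
    AIMessage(content="REVIEW: consistent with the shared plan"),
]))

print(fake_llm.invoke("Draft the refactor plan").content)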

How it works.

MESI cache coherence — the protocol every modern CPU uses to share memory — adapted for LLM agents sharing artifacts.

1. Local cache per agent
Each shared artifact is cached locally per agent. Reads serve from the local cache when valid — no re-broadcast.

2. Lightweight invalidation
Writes commit to a coordinator, which sends ~12-token invalidation signals instead of rebroadcasting the full artifact.

3. Bounded staleness
Single-writer-multiple-reader per artifact with bounded staleness. Peers re-fetch on next read, guaranteed.
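
A toy sketch of that flow (hypothetical names, not the library's API): each agent keeps a local copy plus a valid bit, a write bumps the coordinator's version and flips the peers' bits, and the next read re-fetches.

# Toy model of the three steps above (illustration only, not the real API).
class ToyCoordinator:
    def __init__(self, value=None):
        self.version, self.value = 0, value
        self.caches = {}  # agent -> {"value": ..., "valid": ...}

    def read(self, agent):
        entry = self.caches.get(agent)
        if entry and entry["valid"]:                              # step 1: serve from the local cache
            return entry["value"]
        self.caches[agent] = {"value": self.value, "valid": True} # step 3: re-fetch on next read
        return self.value

    def write(self, agent, value):
        self.version += 1                                         # monotonic versioning
        self.value = value
        self.caches[agent] = {"value": value, "valid": True}
        for peer, entry in self.caches.items():
            if peer != agent:
                entry["valid"] = False                            # step 2: tiny invalidation, no rebroadcast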

Five synchronization strategies ship out of the box: lazy (default), eager, lease (TTL-based), access_count, and broadcast — pick the one matching your workload's read/write ratio and staleness tolerance.

Works with the stack you already have.

Same library, same protocol, same behavior — regardless of orchestrator or model provider.

LangGraph · Drop-in CCSStore — one import change, no node code changes
CrewAI · CrewAIAdapter(strategy_name="lazy")
AutoGen · AutoGenAdapter(strategy_name="lazy")
Custom orchestrators · Framework-agnostic CoherenceAdapterCore
# LangGraph drop-in — one import change, no node code changes
from langgraph.store.memory import InMemoryStore  # before
from ccs.adapters import CCSStore                  # after

store = CCSStore(strategy="lazy")
graph = builder.compile(store=store)
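
The CrewAI and AutoGen adapters follow the same shape. A hedged sketch only: the constructor calls come from the list above, but the import path is an assumption (mirroring CCSStore), and the framework-specific wiring is left to the adapter docs.

# Hedged sketch: import path assumed to mirror CCSStore above; verify in the docs.
from ccs.adapters import CrewAIAdapter, AutoGenAdapter  # assumed module path

crew_coherence = CrewAIAdapter(strategy_name="lazy")      # constructor call from the list above
autogen_coherence = AutoGenAdapter(strategy_name="lazy")  # constructor call from the list above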

"Subagent output to a filesystem to minimize the 'game of telephone' [...] implement artifact systems where specialized agents can create outputs that persist independently."

— Anthropic Engineering, multi-agent research system (Appendix, June 2025). CCSStore is exactly that pattern — plus coherence semantics so subagents know when their cached view is stale.

Provider-neutral: same behavior with Anthropic, OpenAI, Google, Mistral, or open-source models. The protocol operates on artifacts, not model responses.

Building coding sub-agents?

See the recorded planner-executor demo →

Real tsc on a real TypeScript refactor · three variants (with-coherence, no-invalidation, context-cache) · op-log + tsc result in the GIFs.

Built for production-grade trust.

Where this fits in the agentic stack.

Anthropic's engineering team, after shipping their multi-agent Research system to production, named state consistency as one of three challenges blocking async multi-agent execution at scale. agent-coherence is the protocol that addresses it.

Architecturally, this is the layer QuantumBlack/McKinsey describes as agentic shared services — the protocol-first, composable substrate between agent runtimes and enterprise data. agent-coherence is the state-consistency primitive that lives there.

Agentic systems & runtimes

Where the agents actually execute.

MS AI Foundry · Google Vertex · AWS Bedrock · LangGraph · CrewAI · AutoGen · Ark · Kagent · custom orchestrators.

Interfaces & agentic orchestration

How agents talk to each other and the world.

A2A · MCP · planner / supervisor patterns · tool routing.

Agentic shared services · agent-coherence is here

State consistency · coherence protocol · single-writer ordering.

Co-resident with: agentic evaluation · observability · memory management · feedback loops · security & protocol standards. Composable and protocol-first — drop into existing orchestration without rewriting agent code.

In-house systems & external data

Sources the agents read from.

Knowledge graphs, RAG corpora, vector stores, read-side freshness pipelines, ticketing systems, code repos.

Layer naming follows "Creating a future-proof enterprise agentic platform architecture" (QuantumBlack/McKinsey). agent-coherence is composable by design: it slots alongside your existing evaluations, observability, and memory layers — same library across LangGraph, CrewAI, AutoGen, and custom runtimes, vendor-neutral across Anthropic, OpenAI, Google, Mistral, and open-source models. Multi-vendor workflows, minimum lock-in.

The audience signal is consistent: 32% of agent teams cite quality — "hallucinations and consistency of outputs" — as the #1 production blocker. LangChain, State of Agent Engineering 2026.

Frequently asked questions.

Common questions about stale-read detection and multi-agent coherence across LangGraph, CrewAI, AutoGen, and custom orchestrators.

What is a stale read — and how does it cause shared memory pollution?

When one agent reads an artifact — a plan, a document, a result — that another agent has already updated, the reader gets a stale copy. If the reader then writes back, it overwrites the current version with logic that was based on stale state. MLflow's multi-agent observability team calls this shared memory pollution: one agent's hallucination becomes a "fact" subsequent agents reason from, producing cascading errors that compound across reasoning steps. Trace-only tools can see the calls but not the staleness; agent-coherence detects the exact moment of divergence.
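
A toy illustration of that failure in plain Python (no library, names invented for the example): agent B's write-back silently discards agent A's newer version because B reasoned from a stale copy.

# Toy illustration of a stale read turning into shared memory pollution.
shared = {"plan": (1, "collect requirements")}

# Agent B reads the plan and keeps working from this copy.
b_version, b_plan = shared["plan"]                      # (1, "collect requirements")

# Agent A commits an updated plan in the meantime.
shared["plan"] = (2, "collect requirements; add auth flow")

# Agent B writes back conclusions derived from its stale copy,
# silently overwriting A's version; the polluted "fact" now
# propagates to every downstream agent.
shared["plan"] = (3, b_plan + "; ship without auth review")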

How is this different from LangSmith or Braintrust?

LangSmith and Braintrust show you what your agents did. agent-coherence shows you when one of them was wrong because it read stale state from another. The difference is structural — we track per-agent ownership of shared artifacts (MESI states), so the tool can flag a read that returned an outdated copy. Trace-only tools cannot detect this because they lack the state model.

Do I have to switch frameworks or model providers?

No. Drop-in adapters ship for LangGraph (CCSStore), CrewAI, AutoGen, and any custom orchestrator via CoherenceAdapterCore. The protocol operates on artifacts, not model responses, so it works the same with Anthropic, OpenAI, Google, Mistral, AWS Bedrock, Azure OpenAI, and open-source models.

What happens when two agents try to write to the same artifact?

The protocol enforces single-writer exclusivity. Only one agent holds write permission at a time; concurrent writes are prevented at the protocol level, not "resolved" by an append-only reducer that produces duplicates or unexpected list nesting. For workloads where concurrent writes are semantically composable, CRDTs are the right tool — see the Why Coherence Matters doc for the layered model.
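
As a generic sketch of what single-writer exclusivity means in practice (one write lock per artifact; an illustration of the concept, not agent-coherence's API):

import threading

# One write lock per artifact: a second writer fails fast instead of
# being merged by an append-only reducer. Illustration only.
_write_locks: dict[str, threading.Lock] = {}

def try_commit(artifact_id: str, apply_write) -> bool:
    lock = _write_locks.setdefault(artifact_id, threading.Lock())
    if not lock.acquire(blocking=False):   # another agent currently holds write permission
        return False                       # caller re-reads and retries with fresh state
    try:
        apply_write()                      # exactly one writer mutates the artifact
        return True
    finally:
        lock.release()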

How much does it actually save, and on what kind of workload?

69% token reduction on read-heavy workloads (12:1 read/write ratio), 47% on moderate (8:3), 29% on write-heavy (8:4). The lever is invalidation signals (~12 tokens) replacing full-artifact rebroadcasts. Run the benchmarks yourself: pip install "agent-coherence[langgraph,benchmark]" then make benchmark. CI uses GenericFakeChatModel — no live API calls required.

Is it production-ready?

Apache-2.0, on PyPI, 165 tests, TLA+/TLC model-checked safety properties (single-writer, monotonic versioning, no torn reads), PyPI Trusted Publishers with PEP 740 attestations and CycloneDX SBOM published with every release. Opt-in crash-recovery sweep reclaims stale grants when agents OOM-kill or livelock. ccs-diagnose runs as a zero-network static analyzer on existing graphs before adoption.

Have a multi-agent system burning tokens on shared state?

15-minute call. We'll look at your graph and tell you whether agent-coherence will move the number.

Book a call →