Product · Coherent workspace

Apache-2.0 · single host · fail-closed denies · protocol TLA+-checked

Everything that touches your workspace trusts its own copy. One of those copies is stale.

A plans/ directory, a memory.json, a deploy config — shared by agents, parallel sessions, formatters, scripts, and you. Each one reads, holds a view, and writes back, and nothing checks whether the file moved in between. CoherentVolume is the data plane for that workspace: your bytes stay on the real filesystem; a local coordinator tracks who holds which version, denies a stale write fail-closed, and catches the edits that went around it.

The data-plane appliance: four calls, no framework.

Architecturally, CoherentVolume is an out-of-process coordinator client, not an in-process wrapper. It spawns (or attaches to) a local coordinator, and routes reads and writes through it. The coordinator holds only MESI state, a content hash, and a version per managed file — your live files stay on disk. Point a sibling volume in another process at the same workspace and it attaches to the same coordinator, so a single-host fleet shares one coherent view.

from ccs.adapters.coherent_volume import CoherentVolume

vol = CoherentVolume(workspace_root, managed=("plans/**", "memory/**"))
data = vol.read("plans/plan.md")              # bytes — registers a SHARED view
vol.write("plans/plan.md", revise(data))      # stale view? denied fail-closed
data = vol.reacquire("plans/plan.md")         # recover: re-mint identity + mandatory fresh read
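
To make the deny-and-recover cycle concrete, here is a minimal in-memory sketch of the idea. It is not the library's implementation: ToyCoordinator, its per-agent bookkeeping, and this StaleView class are hypothetical stand-ins, and the toy does not model MESI states or identity re-minting. It only shows a version check that refuses a write made from an old view until the writer reads again.

```python
class StaleView(Exception):
    """Toy stand-in: raised when a write is based on an outdated view."""


class ToyCoordinator:
    """In-memory model: one version per path plus what each agent last saw."""

    def __init__(self):
        self._content = {}  # path -> bytes (the real volume keeps bytes on disk)
        self._version = {}  # path -> int
        self._seen = {}     # (agent, path) -> version that agent last read

    def read(self, agent, path):
        # Reading records which version this agent's view is based on.
        self._seen[(agent, path)] = self._version.get(path, 0)
        return self._content.get(path, b"")

    def write(self, agent, path, data):
        current = self._version.get(path, 0)
        # An agent that never read, or read an older version, is denied.
        if self._seen.get((agent, path)) != current:
            raise StaleView(f"{agent} holds a stale view of {path}")
        self._content[path] = data
        self._version[path] = current + 1
        self._seen[(agent, path)] = current + 1

    def reacquire(self, agent, path):
        # In this toy, recovery is simply a mandatory fresh read.
        return self.read(agent, path)


def demo():
    coord = ToyCoordinator()
    path = "plans/plan.md"
    coord.read("a", path)
    coord.read("b", path)
    coord.write("a", path, b"v1 from a")
    try:
        coord.write("b", path, b"b's overwrite")  # b still holds the old view
        denied = False
    except StaleView:
        denied = True
    fresh = coord.reacquire("b", path)            # fresh read, then re-derive
    coord.write("b", path, fresh + b" + b's change")
    return denied, coord.read("b", path)
```

Note that a bare retry of the denied write would fail again here too, because nothing about the writer's recorded view changes until it reads.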

The explicit read / write / reacquire / write_cas API is the supported primitive. write_cas(path, make_content) is the optimistic counterpart for concurrent same-key contention: exactly one writer wins, and the loser gets a typed conflict to re-derive from — never a silent drop. For code you'd rather not rewrite, an opt-in, demo-grade open() shim routes managed-path opens through the volume, so existing open() and pathlib calls get coherence unchanged:

from ccs.adapters.coherent_volume import coherent_workspace

with coherent_workspace(workspace_root, managed=("plans/**",)):
    with open("plans/plan.md") as f:
        text = f.read()                        # registers a SHARED view
    with open("plans/plan.md", "w") as f:
        f.write(edit)                          # stale view? raises out of close() at block exit

Run the failure and the fix yourself — offline, deterministic, no keys: python -m examples.coherent_volume.main. Source: agent-coherence/examples/coherent_volume.
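
The write_cas contract described above (exactly one writer wins, the loser gets a typed conflict) can be sketched as a compare-and-swap on a version number. This is an illustration under assumptions, not the library's code: ToyStore, the Conflict exception, and the return value are hypothetical, and the toy is single-threaded, so the "concurrent" peer is simulated by committing from inside the slow writer's callback.

```python
class Conflict(Exception):
    """Toy stand-in for the typed conflict a losing writer receives."""


class ToyStore:
    def __init__(self):
        self._content = {}  # path -> bytes
        self._version = {}  # path -> int

    def read(self, path):
        return self._content.get(path, b"")

    def write_cas(self, path, make_content):
        base_version = self._version.get(path, 0)
        base = self._content.get(path, b"")
        new = make_content(base)  # deriving may take a while; peers can commit
        # Compare: commit only if nobody moved the file since we read the base.
        if self._version.get(path, 0) != base_version:
            raise Conflict(f"{path} moved past version {base_version}")
        # Swap: publish the new content under the next version.
        self._content[path] = new
        self._version[path] = base_version + 1
        return base_version + 1


def demo():
    store = ToyStore()
    path = "memory.json"

    def slow_writer(base):
        # While this writer is still deriving, a peer commits first.
        store.write_cas(path, lambda b: b + b"[peer]")
        return base + b"[slow]"

    try:
        store.write_cas(path, slow_writer)
        outcome = "won"
    except Conflict:
        outcome = "conflict"
    # The loser re-derives from the current content and tries again.
    store.write_cas(path, lambda b: b + b"[slow]")
    return outcome, store.read(path)
```

Passing a function rather than finished bytes is what makes the retry safe: the loser recomputes from the winner's content instead of replaying a buffer built on the old one.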

Foreign-edit guards: the writes that bypass the coordinator.

Coordination covers writers that opt in. But real workspaces also get edited from outside: a human fixes a file in an editor, a formatter rewrites it, a script regenerates it. Without a guard, the next agent write silently buries that edit — and the next agent read silently builds on bytes the coordinator never saw. The volume guards both boundaries with a content-hash check:

Write boundary — on by default.

Before writing, the volume checks whether the managed file's on-disk bytes changed out-of-band since it last read or wrote them. If they did, the write raises StaleView instead of clobbering the foreign edit — recover with reacquire() (fresh read → re-derive → re-write). Opt out with on_stale_write="allow" to restore last-writer-wins.

Read boundary — opt-in.

With on_stale_read="raise", re-reading a managed file whose bytes changed out-of-band raises StaleView instead of returning bytes your other state wasn't computed from. A volume never denies its own just-written bytes — the benign commit→disk-write lag window is recognized and suppressed.

vol = CoherentVolume(workspace_root, managed=("plans/**",), on_stale_read="raise")
# a formatter rewrites plans/plan.md out-of-band …
vol.write("plans/plan.md", revised)     # StaleView — the foreign edit survives
fresh = vol.reacquire("plans/plan.md")  # recover: fresh read, re-derive, re-write

The honest boundary.

These are content-hash checks at the volume's read/write boundary — best-effort point-in-time detection, not filesystem interception. A write that never goes through the volume is caught at the next volume read or write of that file, not blocked as it happens. Proactively watching a source no agent touches is the source-watcher case: on the roadmap, demand-gated, not shipped today.
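
A content-hash check at the write boundary can be sketched with nothing but the standard library. The class below is a hypothetical illustration, not CoherentVolume itself: HashGuardedFile and the choice of SHA-256 are assumptions. It also shows why the check is point-in-time: the hash comparison and the write are two separate steps.

```python
import hashlib
import tempfile
from pathlib import Path


class StaleView(Exception):
    """Toy stand-in: on-disk bytes changed since this guard last saw them."""


class HashGuardedFile:
    def __init__(self, path):
        self.path = Path(path)
        self._last_hash = None  # hash of the bytes we last read or wrote

    @staticmethod
    def _digest(data):
        return hashlib.sha256(data).hexdigest()

    def read(self):
        data = self.path.read_bytes()
        self._last_hash = self._digest(data)
        return data

    def write(self, data):
        # Point-in-time check: compare the disk bytes to what we last saw.
        if self._digest(self.path.read_bytes()) != self._last_hash:
            raise StaleView(f"{self.path.name} was edited out-of-band")
        self.path.write_bytes(data)
        self._last_hash = self._digest(data)  # never deny our own bytes


def demo():
    with tempfile.TemporaryDirectory() as root:
        target = Path(root) / "plan.md"
        target.write_bytes(b"step 1\n")
        guarded = HashGuardedFile(target)
        guarded.read()
        # A formatter rewrites the file without going through the guard.
        target.write_bytes(b"step 1 (reformatted)\n")
        try:
            guarded.write(b"step 1\nstep 2\n")
            blocked = False
        except StaleView:
            blocked = True
        fresh = guarded.read()              # recover: fresh read
        guarded.write(fresh + b"step 2\n")  # re-derive and re-write
        return blocked, target.read_bytes()
```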

Effect-ordering gate: fire the deploy on the config it decided from.

Agents don't only overwrite files — they fire effects: a build, a deploy, a PR, a shell command, computed from inputs they read earlier. If the input moved in between, the effect fires on stale state. Two agents derive changes from the same shared state, both branches merge clean, both CI runs go green — and one intent silently vanishes. The isolation-everything answer (a worktree or sandbox per agent) catches textual conflicts at merge; it doesn't close this one.

gate() narrows the window: it captures the input's version at decision time, re-reads at the effect boundary, and fires only if the input is unchanged at that re-read — otherwise it raises StaleView and holds the effect before it runs.

from ccs.adapters import CoherentVolume, gate

vol = CoherentVolume(workspace_root, managed=("deploy/**",))

# fires run_deploy(plan) only if deploy/config.txt is unchanged since
# decide() read it; else raises StaleView before the deploy runs —
# reacquire() and re-decide.
gate(vol, "deploy/config.txt", decide=plan_deploy, effect=run_deploy)

It's plain Python, so the same call drops into a LangGraph node, a CrewAI task, a CI step, or a raw script unchanged. Run it: python -m examples.effect_gate.main (offline, deterministic, no keys) — or add --baseline to watch the stale fire it catches.

What the gate does and doesn't promise.

The gate orders effects; it never rolls one back. It fires pre-effect, so for an effect that escapes the coordinator — the deploy itself, the opened PR — there's a residual re-read→fire window it narrows but can't close. It's cooperative: the agent opts in. For a pure write effect, vol.write_cas_at(path, expected_version, content) is the atomic, no-window path — use it directly. Gating several mutually-consistent inputs at once is a snapshot-session operation, not this single-input wrapper.
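
The capture, re-read, fire sequence can be sketched in a few lines. This is a toy under stated assumptions, not the shipped gate(): VersionedInput and toy_gate are hypothetical names, and what the real decide callback receives is not specified here. The peer's edit is simulated by mutating the input during the decision.

```python
class StaleView(Exception):
    """Toy stand-in: the input moved between decision and effect."""


class VersionedInput:
    def __init__(self, value):
        self.value = value
        self.version = 1

    def update(self, value):
        self.value = value
        self.version += 1


def toy_gate(source, decide, effect):
    decided_version = source.version  # capture at decision time
    plan = decide(source.value)
    # Re-read at the effect boundary; hold the effect if the input moved.
    if source.version != decided_version:
        raise StaleView("input changed since the decision was made")
    # Anything that changes after this line is the residual window.
    return effect(plan)


def demo():
    fired = []
    config = VersionedInput("replicas=2")

    # Unchanged input: the effect fires.
    toy_gate(config, decide=lambda v: f"deploy with {v}", effect=fired.append)

    def slow_decide(value):
        config.update("replicas=5")  # a peer edits the config mid-decision
        return f"deploy with {value}"

    try:
        toy_gate(config, decide=slow_decide, effect=fired.append)
        held = False
    except StaleView:
        held = True
    return fired, held
```

The comment above the final return in toy_gate marks the part no pre-effect check can cover: once the re-read has passed, the effect runs on whatever it was given.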

Snapshot sessions: read five files as one consistent moment.

The gate protects one input. But an agent that reads several artifacts one by one — a plan, a config, a memory file — can see a torn combination: plan.md from before a peer's commit and config.json from after it. Every individual read was current; the set never coexisted. That's read-skew, and it survives any per-file freshness check.

A snapshot session closes that window: it pins a consistent cut of the artifacts you name, captured at a single point, and serves every session read from that cut while peers keep writing. The cut is an inspectable {artifact: version} map, not an opaque handle.

# Against a running coordinator (the same one CoherentVolume spawns), over HTTP:
POST /session/begin      {session_id, read_set: ["plans/plan.md", "config/app.json"]}
                         → {session_token, cut: {path: version}, …}
POST /session/read       {session_id, session_token, path}
                         → the artifact at its PINNED version — never a newer one
POST /session/commit     {session_id, session_token, path, content}
                         → wins only if no peer moved the artifact since the cut
POST /session/heartbeat  {session_id, session_token}   # keep the session's lease alive

Or in-process: CoordinatorService.begin_session(read_set=…, owner=…) → session_read(…) / session_commit(…).

Fail-closed by construction: reading an artifact that was not in the pinned read-set is refused with a typed rejection and is never silently served from live state. A session whose heartbeat lapses, or that is lost to a coordinator restart, is invalidated: later reads get a typed "session invalidated" rejection telling the agent to re-establish, never a quiet fall-through to whatever is current.

Stable reads without serializing the fleet: the agent can compute for ten seconds, read the plan again, and get the same version it started with, without holding a write lock for the duration. Commits validate against the pinned base through the same optimistic CAS as write_cas. Model-checked: NoReadSkewWithinCut and PinAlwaysRetained (Snapshot.tla).
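
A pinned cut can be modeled as a plain {artifact: version} map over a store that retains old bodies. The sketch below is an assumption-laden toy, not the coordinator's session code: ToyVersionStore, ToySession, and SessionRejected are hypothetical, and leases, tokens, and heartbeats are left out.

```python
class SessionRejected(Exception):
    """Toy stand-in for the typed rejections a session can return."""


class ToyVersionStore:
    def __init__(self):
        self._history = {}  # path -> list of bodies; index i holds version i + 1

    def write(self, path, body):
        self._history.setdefault(path, []).append(body)
        return len(self._history[path])

    def version(self, path):
        return len(self._history.get(path, []))

    def body_at(self, path, version):
        # Version 0 means the artifact did not exist when the cut was taken.
        return b"" if version == 0 else self._history[path][version - 1]


class ToySession:
    def __init__(self, store, read_set):
        self._store = store
        # The cut: every named artifact pinned at one point in time.
        self.cut = {path: store.version(path) for path in read_set}

    def read(self, path):
        if path not in self.cut:
            raise SessionRejected(f"{path} is not in the pinned read-set")
        return self._store.body_at(path, self.cut[path])

    def commit(self, path, body):
        if path not in self.cut:
            raise SessionRejected(f"{path} is not in the pinned read-set")
        if self._store.version(path) != self.cut[path]:
            raise SessionRejected(f"{path} moved since the cut")
        return self._store.write(path, body)


def demo():
    store = ToyVersionStore()
    store.write("plans/plan.md", b"plan v1")
    store.write("config/app.json", b"config v1")
    session = ToySession(store, ["plans/plan.md", "config/app.json"])
    # A peer keeps writing after the cut was taken.
    store.write("plans/plan.md", b"plan v2")
    store.write("config/app.json", b"config v2")
    reads = (session.read("plans/plan.md"), session.read("config/app.json"))
    try:
        session.read("memory.json")
        outside = "served"
    except SessionRejected:
        outside = "rejected"
    try:
        session.commit("plans/plan.md", b"plan v1 + edit")
        commit = "won"
    except SessionRejected:
        commit = "rejected"
    return session.cut, reads, outside, commit
```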

Read-skew, not write-skew.

Sessions prevent torn reads across artifacts. They do not add write-skew prevention: two sessions that read one cut and write different artifacts can still interleave, because commits validate per-artifact against the pinned base. Atomic multi-artifact publish is on the roadmap, demand-gated. And the longer a session runs, the more likely its commit is to abort, because peers have more time to move an artifact it pinned. The honest pitch is "consistent reads plus safe commit-validation," not "reason for two hours and reliably commit." One serving detail: when the coordinator retains version bodies it serves the pinned bytes directly; otherwise it returns the pinned version and content hash as a typed signal, and the caller fetches the bytes from its own data plane.
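
A small worked example shows what write-skew looks like when validation is per-artifact. Everything here is hypothetical: the toy classes, the file names, and the "at least one reviewer stays on call" rule exist only for illustration. Both sessions read the same consistent cut, each writes a different artifact, and both commits pass validation.

```python
class CommitRejected(Exception):
    """Toy stand-in: the artifact being written moved after the cut."""


class ToyStore:
    def __init__(self, initial):
        self.bodies = dict(initial)
        self.versions = {path: 1 for path in initial}


class ToySession:
    def __init__(self, store):
        self.store = store
        self.cut = dict(store.versions)   # pinned {artifact: version}
        self.pinned = dict(store.bodies)  # bodies as of the cut

    def read(self, path):
        return self.pinned[path]

    def commit(self, path, body):
        # Validation looks only at the artifact being written.
        if self.store.versions[path] != self.cut[path]:
            raise CommitRejected(path)
        self.store.bodies[path] = body
        self.store.versions[path] += 1


def demo():
    # Hypothetical rule the agents try to keep: one reviewer stays on call.
    store = ToyStore({"alice.txt": "on", "bob.txt": "on"})
    s1, s2 = ToySession(store), ToySession(store)
    # Each session sees the other reviewer on call, so each lets one leave.
    if s1.read("bob.txt") == "on":
        s1.commit("alice.txt", "off")
    if s2.read("alice.txt") == "on":
        s2.commit("bob.txt", "off")  # bob.txt is unchanged, so this passes
    return store.bodies
```

Neither read was torn and neither commit overwrote a peer, yet the combined result breaks the rule. Closing that gap would take validation across the whole read-set or an atomic multi-artifact commit.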

What's actually checked, invariant by invariant.

Each row is a safety invariant model-checked with TLA+/TLC. make tla-check runs the specs in CI on every push, and every spec carries a documented mutant that must fail — the invariants are load-bearing, not decorative.

The silent failure	What happens instead	Invariant
Stale-read overwrite — an agent acts on an old snapshot and writes over a newer version	The write is denied fail-closed; the writer must `reacquire()` and read the current version	`SingleWriter`, `MonotonicVersion`
Concurrent lost update — two writers hit the same key and both "succeed"	Exactly one wins; the loser gets a typed conflict + bounded retry, never a silent drop	`NoLostUpdate`
Reclaim-zombie write — a stalled writer is reclaimed by crash recovery, wakes later, and lands its stale commit	The commit is rejected with a typed `stale_read_generation` conflict (the read-generation fence)	`NoStaleApply`
Torn multi-artifact read — each read was individually current, but the combination never coexisted	Session reads serve from a pinned consistent cut; a lapsed session fails closed with a typed rejection	`NoReadSkewWithinCut`, `PinAlwaysRetained`
Dead owner blocks the fleet — a crashed agent holds EXCLUSIVE forever	The heartbeat/TTL sweep reclaims the grant (on by default)	sweep invariants I3–I6

The foreign-edit guards are the one exception: they're content-hash checks over disk bytes, enforced by tests rather than TLA+ — the model-checked invariants cover the protocol state machine, not the filesystem. Specs, the invariant ↔ implementation map, and the mutant recipes: formal/tla/.
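
The reclaim-zombie row is easiest to picture as a generation counter. The sketch below shows how such a fence commonly works; it is an assumption about the general technique, not a reading of the coordinator's code, and FencedArtifact and StaleReadGeneration are hypothetical names.

```python
class StaleReadGeneration(Exception):
    """Toy stand-in for the typed stale_read_generation conflict."""


class FencedArtifact:
    def __init__(self, content=b""):
        self.content = content
        self.generation = 0  # bumped every time a grant is reclaimed

    def acquire(self):
        # The writer remembers the generation it read under.
        return self.generation

    def reclaim(self):
        # Crash recovery takes the grant back from a stalled writer.
        self.generation += 1

    def commit(self, read_generation, content):
        # The fence: a commit from an older generation never lands.
        if read_generation != self.generation:
            raise StaleReadGeneration(
                f"read under generation {read_generation}, "
                f"current is {self.generation}"
            )
        self.content = content


def demo():
    artifact = FencedArtifact(b"original")
    zombie_gen = artifact.acquire()  # writer A reads, then stalls
    artifact.reclaim()               # the sweep reclaims A's grant
    live_gen = artifact.acquire()    # writer B takes over
    artifact.commit(live_gen, b"from B")
    try:
        artifact.commit(zombie_gen, b"from A (stale)")  # A wakes up late
        rejected = False
    except StaleReadGeneration:
        rejected = True
    return rejected, artifact.content
```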

What this doesn't claim.

Claiming less than the library delivers costs less credibility than claiming more. The honest scope is narrower than the headline, and worth stating plainly.

Single host, one coordinator. The guarantees hold for writers that go through the coordinator, under a single coordinator on one host. Concurrent same-key writers on that host are covered; cross-host coordination is on the roadmap, demand-gated.
Cooperative, not interception. The volume guards reads and writes routed through it. Edits made around it are caught at the boundary — at the next volume access of that file — by the foreign-edit guards, not blocked as they happen.
It does not catch a writer that ignores what it read. An agent that re-reads fresh bytes and then writes a buffer computed from older ones defeats any optimistic-concurrency layer. The contract is: write from what read() / reacquire() returned.
The open() shim is convenience, not the contract. It covers open() / pathlib text and binary read/write — not raw os.open, subprocess redirection, mmap, or append/update modes; those delegate to the original open() unchanged. The explicit API is the supported surface.
Effects are ordered, never rolled back. gate() fires pre-effect and never undoes one. Sessions prevent read-skew, not write-skew. Atomic multi-artifact publish is roadmap, demand-gated.

Frequently asked questions.

Common questions about the coherent workspace — CoherentVolume, foreign-edit guards, the effect gate, and snapshot sessions.

Do my files move into a database?

No. Your content stays on the real filesystem. CoherentVolume is an out-of-process coordinator client: it routes reads and writes through a local coordinator that holds only MESI state, a content hash, and a version per managed file. Point a sibling volume in another process at the same workspace and it attaches to the same coordinator, so a single-host fleet shares one coherent view.

What happens when a write is denied?

The write raises a typed StaleView instead of landing — the newer version on disk survives. Recovery is explicit: reacquire() re-mints the caller's identity and forces a fresh read, after which the caller re-derives its change and writes again. A bare retry does not clear the denial; that is deliberate, so a naive loop cannot overwrite its way through.

What if a human or a formatter edits a managed file directly?

That is the foreign-edit case, and it is guarded at both boundaries with a content-hash check. On write (on by default): if the on-disk bytes changed out-of-band since the volume last saw them, the write raises StaleView instead of clobbering the foreign edit. On read (opt-in via on_stale_read="raise"): re-reading a file whose bytes changed out-of-band raises instead of returning bytes your other state wasn't computed from. This is point-in-time detection at the volume's boundary, not filesystem interception — a write that never goes through the volume is caught at the next volume access of that file, not blocked as it happens.

Can it stop a deploy from running on a stale config?

That is the effect-ordering gate. gate() captures the input's version when your code decides, re-reads at the effect boundary, and fires the effect only if the input is unchanged — otherwise it raises StaleView before the effect runs. It orders effects; it never rolls one back, and for an effect that escapes the coordinator there is a residual re-read-to-fire window it narrows but cannot close. For a pure write effect, write_cas_at is the atomic, no-window path.

Can an agent read several files as one consistent snapshot?

Yes — multi-artifact snapshot sessions (v0.11.0). A session pins a consistent cut of the artifacts you name, captured at a single point, and serves every session read from that cut while peers keep writing. Reading an artifact outside the pinned set is refused with a typed rejection, never silently served from live state, and a lapsed or lost session fails closed. This prevents read-skew (torn reads across artifacts); it does not add write-skew prevention — commits validate per-artifact against the pinned base.

Does it work across machines?

The guarantees hold for a single-host fleet under one coordinator. Concurrent same-key writers on that host are covered by optimistic commit-CAS. Coordinating writers across multiple hosts is on the roadmap, demand-gated — if you need it, open an issue on the repo.

Agents, sessions, and scripts sharing one workspace?

15-minute call. Walk me through what reads and writes your shared files — agents, CI steps, humans — and I'll tell you which of these guards applies, and where you'd hit the single-host boundary that's still on the roadmap.
