Product · Coherent workspace
Apache-2.0 · single host · fail-closed denies · protocol TLA+-checked
A plans/ directory, a memory.json, a deploy config: all shared by agents, parallel sessions, formatters, scripts, and you. Each one reads, holds a view, and writes back, and nothing checks whether the file moved in between. CoherentVolume is the data plane for that workspace: your bytes stay on the real filesystem; a local coordinator tracks who holds which version, denies a stale write fail-closed, and catches the edits that went around it.
Architecturally, CoherentVolume is an out-of-process coordinator client, not an in-process wrapper. It spawns (or attaches to) a local coordinator, and routes reads and writes through it. The coordinator holds only MESI state, a content hash, and a version per managed file — your live files stay on disk. Point a sibling volume in another process at the same workspace and it attaches to the same coordinator, so a single-host fleet shares one coherent view.

    from ccs.adapters.coherent_volume import CoherentVolume

    vol = CoherentVolume(workspace_root, managed=("plans/**", "memory/**"))
    data = vol.read("plans/plan.md")           # bytes; registers a SHARED view
    vol.write("plans/plan.md", revise(data))   # stale view? denied fail-closed
    data = vol.reacquire("plans/plan.md")      # recover: re-mint identity + mandatory fresh read

The explicit read / write / reacquire / write_cas API is the supported primitive. write_cas(path, make_content) is the optimistic counterpart for concurrent same-key contention: exactly one writer wins, and the loser gets a typed conflict to re-derive from — never a silent drop. For code you'd rather not rewrite, an opt-in, demo-grade open() shim routes managed-path opens through the volume, so existing open() and pathlib calls get coherence unchanged:

    from ccs.adapters.coherent_volume import coherent_workspace

    with coherent_workspace(workspace_root, managed=("plans/**",)):
        text = open("plans/plan.md").read()      # registers a SHARED view
        open("plans/plan.md", "w").write(edit)   # stale view? raises out of close()

Run the failure and the fix yourself — offline, deterministic, no keys: python -m examples.coherent_volume.main. Source: agent-coherence/examples/coherent_volume.
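To make the write_cas contract concrete, here is a minimal in-memory sketch of optimistic compare-and-swap. It is a toy model under stated assumptions, not the library's implementation: `ToyStore`, `Conflict`, `commit_if`, and `write_cas_toy` are hypothetical names, and the real coordinator runs out of process rather than behind a thread lock.

```python
import threading


class Conflict(Exception):
    """Hypothetical typed conflict: tells the loser which version beat it."""

    def __init__(self, expected, actual):
        super().__init__(f"expected v{expected}, found v{actual}")
        self.expected = expected
        self.actual = actual


class ToyStore:
    """In-memory stand-in for one managed file: content plus a version."""

    def __init__(self, content=b""):
        self._lock = threading.Lock()
        self.content = content
        self.version = 0

    def read(self):
        with self._lock:
            return self.content, self.version

    def commit_if(self, expected_version, content):
        # Compare and swap under one lock: only a writer whose base version
        # is still current may advance the version.
        with self._lock:
            if self.version != expected_version:
                raise Conflict(expected_version, self.version)
            self.content = content
            self.version += 1
            return self.version


def write_cas_toy(store, make_content, max_retries=3):
    """Re-derive from fresh content after each conflict, up to a bound."""
    last = None
    for _ in range(max_retries):
        base, version = store.read()
        try:
            return store.commit_if(version, make_content(base))
        except Conflict as conflict:
            last = conflict  # a peer won; loop to re-read and re-derive
    if last is None:
        raise ValueError("max_retries must be at least 1")
    raise last


store = ToyStore(b"plan v0")
_, base_version = store.read()
store.commit_if(base_version, b"writer A")       # A wins the race
try:
    store.commit_if(base_version, b"writer B")   # B used the same base
except Conflict as conflict:
    loser = conflict                             # typed, never a silent drop
final_version = write_cas_toy(store, lambda old: old + b" + B's change")
```

The loser's second attempt goes through `make_content` again, so its change is re-derived from writer A's bytes instead of overwriting them.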
Coordination covers writers that opt in. But real workspaces also get edited from outside: a human fixes a file in an editor, a formatter rewrites it, a script regenerates it. Without a guard, the next agent write silently buries that edit — and the next agent read silently builds on bytes the coordinator never saw. The volume guards both boundaries with a content-hash check:

    vol = CoherentVolume(workspace_root, managed=("plans/**",), on_stale_read="raise")
    # a formatter rewrites plans/plan.md out-of-band …
    vol.write("plans/plan.md", revised)        # StaleView: the foreign edit survives
    fresh = vol.reacquire("plans/plan.md")     # recover: fresh read, re-derive, re-write

These are content-hash checks at the volume's read/write boundary — best-effort point-in-time detection, not filesystem interception. A write that never goes through the volume is caught at the next volume read or write of that file, not blocked as it happens. Proactively watching a source no agent touches is the source-watcher case: on the roadmap, demand-gated, not shipped today.
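The write-side guard can be illustrated with a few lines of standard-library Python. This is a sketch of the general content-hash technique only: `HashGuard` and `StaleViewToy` are hypothetical names, SHA-256 is an assumed hash choice, and nothing here reflects how the library stores its hashes.

```python
import hashlib
import tempfile
from pathlib import Path


class StaleViewToy(Exception):
    """Hypothetical stand-in for the typed StaleView denial."""


class HashGuard:
    """Remembers the hash of the bytes last seen for each path."""

    def __init__(self):
        self._seen = {}

    @staticmethod
    def _digest(data):
        return hashlib.sha256(data).hexdigest()

    def read(self, path):
        data = path.read_bytes()
        self._seen[path] = self._digest(data)
        return data

    def write(self, path, data):
        # Point-in-time check: compare the disk bytes to what we last saw.
        known = self._seen.get(path)
        if known is not None and path.exists():
            if self._digest(path.read_bytes()) != known:
                raise StaleViewToy(f"{path.name} changed out-of-band")
        path.write_bytes(data)
        self._seen[path] = self._digest(data)


with tempfile.TemporaryDirectory() as root:
    plan = Path(root) / "plan.md"
    plan.write_bytes(b"step 1\n")

    guard = HashGuard()
    guard.read(plan)
    plan.write_bytes(b"step 1, reformatted\n")  # foreign edit, bypasses the guard

    try:
        guard.write(plan, b"agent rewrite\n")
        denied = False
    except StaleViewToy:
        denied = True
    survived = plan.read_bytes()

    fresh = guard.read(plan)                    # recover: fresh read
    guard.write(plan, fresh + b"step 2\n")      # re-derived write now lands
    final = plan.read_bytes()
```

The check and the write are two separate steps, so a foreign edit landing between them would still slip through. That gap is what "best-effort point-in-time detection" means in practice.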
Agents don't only overwrite files — they fire effects: a build, a deploy, a PR, a shell command, computed from inputs they read earlier. If the input moved in between, the effect fires on stale state. Two agents derive changes from the same shared state, both branches merge clean, both CI runs go green — and one intent silently vanishes. The isolation-everything answer (a worktree or sandbox per agent) catches textual conflicts at merge; it doesn't close this one.
gate() narrows the window: it captures the input's version at decision time, re-reads at the effect boundary, and fires only if the input is unchanged at that re-read — otherwise it raises StaleView and holds the effect before it runs.

    from ccs.adapters import CoherentVolume, gate

    vol = CoherentVolume(workspace_root, managed=("deploy/**",))
    # fires run_deploy(plan) only if deploy/config.txt is unchanged since
    # decide() read it; else raises StaleView before the deploy runs;
    # reacquire() and re-decide.
    gate(vol, "deploy/config.txt", decide=plan_deploy, effect=run_deploy)

It's plain Python, so the same call drops into a LangGraph node, a CrewAI task, a CI step, or a raw script unchanged. Run it: python -m examples.effect_gate.main (offline, deterministic, no keys) — or add --baseline to watch the stale fire it catches.
The gate orders effects; it never rolls one back. It fires pre-effect, so for an effect that escapes the coordinator — the deploy itself, the opened PR — there's a residual re-read→fire window it narrows but can't close. It's cooperative: the agent opts in. For a pure write effect, vol.write_cas_at(path, expected_version, content) is the atomic, no-window path — use it directly. Gating several mutually-consistent inputs at once is a snapshot-session operation, not this single-input wrapper.
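The capture, re-read, fire sequence can be sketched as a small pure-Python function. This is an illustration of the pattern, not the library's gate(): `gate_toy`, `VersionedInput`, `StaleViewToy`, and the `before_effect` hook are all hypothetical, and the hook exists only to simulate a peer changing the input mid-decision.

```python
class StaleViewToy(Exception):
    """Hypothetical stand-in for the typed StaleView denial."""


class VersionedInput:
    """One input with a version counter that bumps on every change."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def update(self, value):
        self.value = value
        self.version += 1


def gate_toy(source, decide, effect, before_effect=None):
    # 1. Capture the version the decision is based on.
    seen_version = source.version
    plan = decide(source.value)
    if before_effect is not None:
        before_effect()  # test hook: simulates a peer acting mid-decision
    # 2. Re-read at the effect boundary; hold the effect if the input moved.
    if source.version != seen_version:
        raise StaleViewToy(f"input moved: v{seen_version} -> v{source.version}")
    # 3. Fire. Anything that changes after this line is outside the gate.
    return effect(plan)


fired = []
config = VersionedInput("replicas=2")

# Input unchanged between decision and effect: the effect fires.
gate_toy(config, decide=lambda v: f"deploy {v}", effect=fired.append)

# A peer changes the input after the decision: the effect is held.
try:
    gate_toy(
        config,
        decide=lambda v: f"deploy {v}",
        effect=fired.append,
        before_effect=lambda: config.update("replicas=5"),
    )
    held = False
except StaleViewToy:
    held = True
```

Step 3 is where the residual window lives: the version comparison has already passed by the time the effect runs, so a change that lands after it is not seen.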
The gate protects one input. But an agent that reads several artifacts one by one — a plan, a config, a memory file — can see a torn combination: plan.md from before a peer's commit and config.json from after it. Every individual read was current; the set never coexisted. That's read-skew, and it survives any per-file freshness check.
A snapshot session closes that window: it pins a consistent cut of the artifacts you name, captured at a single point, and serves every session read from that cut while peers keep writing. The cut is an inspectable {artifact: version} map, not an opaque handle.

    # Against a running coordinator (the same one CoherentVolume spawns), over HTTP:
    POST /session/begin      {session_id, read_set: ["plans/plan.md", "config/app.json"]}
      → {session_token, cut: {path: version}, …}
    POST /session/read       {session_id, session_token, path}
      → the artifact at its PINNED version, never a newer one
    POST /session/commit     {session_id, session_token, path, content}
      → wins only if no peer moved the artifact since the cut
    POST /session/heartbeat  {session_id, session_token}   # keep the session's lease alive

Or in-process: CoordinatorService.begin_session(read_set=…, owner=…) → session_read(…) / session_commit(…).
write_cas. Model-checked: NoReadSkewWithinCut and PinAlwaysRetained (Snapshot.tla).
Sessions prevent torn reads across artifacts. They do not add write-skew prevention: two sessions that read one cut and write different artifacts can still interleave, because commits validate per-artifact against the pinned base. Atomic multi-artifact publish is on the roadmap, demand-gated. And a long session maximizes abort probability at commit: the honest pitch is "consistent reads plus safe commit-validation," not "reason for two hours and reliably commit." One serving detail: when the coordinator retains version bodies it serves the pinned bytes directly; otherwise it returns the pinned version and content hash as a typed signal, and the caller fetches the bytes from its own data plane.
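A pinned cut is easy to picture with an in-memory model. The sketch below assumes a coordinator that retains every version body. `ToyCoordinator`, `ToySession`, and `SessionRejected` are hypothetical names, and leases, tokens, and heartbeats are left out entirely.

```python
class SessionRejected(Exception):
    """Hypothetical typed rejection: path outside the cut, or stale commit."""


class ToyCoordinator:
    """Keeps every version body per artifact so pinned reads stay servable."""

    def __init__(self):
        self._history = {}  # path -> list of bodies; index + 1 is the version

    def write(self, path, content):
        self._history.setdefault(path, []).append(content)
        return len(self._history[path])

    def latest_version(self, path):
        return len(self._history[path])

    def body(self, path, version):
        return self._history[path][version - 1]

    def begin_session(self, read_set):
        # The cut is an inspectable {artifact: version} map taken in one step.
        return ToySession(self, {p: self.latest_version(p) for p in read_set})


class ToySession:
    def __init__(self, coordinator, cut):
        self._coordinator = coordinator
        self.cut = cut

    def read(self, path):
        if path not in self.cut:
            raise SessionRejected(f"{path} is outside the pinned set")
        return self._coordinator.body(path, self.cut[path])

    def commit(self, path, content):
        if path not in self.cut:
            raise SessionRejected(f"{path} is outside the pinned set")
        # Per-artifact validation against the pinned base version.
        if self._coordinator.latest_version(path) != self.cut[path]:
            raise SessionRejected(f"{path} moved since the cut")
        self.cut[path] = self._coordinator.write(path, content)


coord = ToyCoordinator()
coord.write("plans/plan.md", "plan A")
coord.write("config/app.json", '{"mode": "A"}')

session = coord.begin_session(["plans/plan.md", "config/app.json"])
cut_at_begin = dict(session.cut)

# A peer publishes a matching plan/config pair while the session is open.
coord.write("plans/plan.md", "plan B")
coord.write("config/app.json", '{"mode": "B"}')

# Session reads still come from the cut: a consistent A/A pair.
pinned = (session.read("plans/plan.md"), session.read("config/app.json"))

try:
    session.commit("plans/plan.md", "plan A, revised")
    committed = True
except SessionRejected:
    committed = False  # the peer moved plan.md since the cut
```

Because `commit` validates one path at a time, two toy sessions writing different artifacts from the same cut would both succeed. That is the write-skew gap the paragraph above describes.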
Each row is a safety invariant model-checked with TLA+/TLC. make tla-check runs the specs in CI on every push, and every spec carries a documented mutant that must fail — the invariants are load-bearing, not decorative.
| The silent failure | What happens instead | Invariant |
|---|---|---|
| Stale-read overwrite: an agent acts on an old snapshot and writes over a newer version | The write is denied fail-closed; the writer must reacquire() and read the current version | SingleWriter, MonotonicVersion |
| Concurrent lost update: two writers hit the same key and both "succeed" | Exactly one wins; the loser gets a typed conflict + bounded retry, never a silent drop | NoLostUpdate |
| Reclaim-zombie write: a stalled writer is reclaimed by crash recovery, wakes later, and lands its stale commit | The commit is rejected with a typed stale_read_generation conflict (the read-generation fence) | NoStaleApply |
| Torn multi-artifact read: each read was individually current, but the combination never coexisted | Session reads serve from a pinned consistent cut; a lapsed session fails closed with a typed rejection | NoReadSkewWithinCut, PinAlwaysRetained |
| Dead owner blocks the fleet: a crashed agent holds EXCLUSIVE forever | The heartbeat/TTL sweep reclaims the grant (on by default) | sweep invariants I3–I6 |
The foreign-edit guards are the one exception: they're content-hash checks over disk bytes, enforced by tests rather than TLA+ — the model-checked invariants cover the protocol state machine, not the filesystem. Specs, the invariant ↔ implementation map, and the mutant recipes: formal/tla/.
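The last two table rows interact: a sweep that reclaims a dead owner's grant is only safe if that owner's late commit is also fenced out. The sketch below shows one generic way to combine a lease TTL with a generation token. It is an assumption-laden illustration, not the coordinator's protocol: `ToyLeaseTable`, `StaleGeneration`, and the explicit `now` arguments (used to keep the example deterministic) are all hypothetical.

```python
class StaleGeneration(Exception):
    """Hypothetical stand-in for a typed stale_read_generation conflict."""


class ToyLeaseTable:
    """One exclusive grant guarded by a lease TTL and a generation counter."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.owner = None
        self.expires_at = 0.0
        self.generation = 0
        self.content = ""

    def acquire(self, owner, now):
        if self.owner is not None:
            raise RuntimeError(f"held by {self.owner}")
        self.owner = owner
        self.expires_at = now + self.ttl
        return self.generation  # fence token the writer must present later

    def heartbeat(self, owner, now):
        if self.owner == owner:
            self.expires_at = now + self.ttl

    def sweep(self, now):
        # Reclaim a grant whose owner stopped heartbeating, and bump the
        # generation so that owner's in-flight commit can no longer apply.
        if self.owner is not None and now >= self.expires_at:
            self.owner = None
            self.generation += 1
            return True
        return False

    def commit(self, owner, generation, content):
        if owner != self.owner or generation != self.generation:
            raise StaleGeneration(
                f"{owner} holds generation {generation}, "
                f"current is {self.generation}"
            )
        self.content = content
        self.owner = None
        self.generation += 1


table = ToyLeaseTable(ttl=30)
token_a = table.acquire("agent-a", now=0)

# agent-a stalls and sends no heartbeat; the sweep reclaims its grant.
reclaimed = table.sweep(now=31)

token_b = table.acquire("agent-b", now=31)
table.commit("agent-b", token_b, "from b")

# agent-a wakes up and tries to land its old commit.
try:
    table.commit("agent-a", token_a, "zombie write")
    rejected = False
except StaleGeneration:
    rejected = True
```

Without the generation bump in `sweep`, the zombie commit would be indistinguishable from a live one once the grant was free again.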
Claiming less than the library does is the cheaper credibility trade. The honest scope is narrower than the headline, and worth stating plainly.
read() / reacquire() returned.
The open() shim is convenience, not the contract. It covers open() / pathlib text and binary read/write, but not raw os.open, subprocess redirection, mmap, or append/update modes; those delegate to the original open() unchanged. The explicit API is the supported surface.
gate() fires pre-effect and never undoes one. Sessions prevent read-skew, not write-skew. Atomic multi-artifact publish is roadmap, demand-gated.
Common questions about the coherent workspace: CoherentVolume, foreign-edit guards, the effect gate, and snapshot sessions.
No. Your content stays on the real filesystem. CoherentVolume is an out-of-process coordinator client: it routes reads and writes through a local coordinator that holds only MESI state, a content hash, and a version per managed file. Point a sibling volume in another process at the same workspace and it attaches to the same coordinator, so a single-host fleet shares one coherent view.
The write raises a typed StaleView instead of landing — the newer version on disk survives. Recovery is explicit: reacquire() re-mints the caller's identity and forces a fresh read, after which the caller re-derives its change and writes again. A bare retry does not clear the denial; that is deliberate, so a naive loop cannot overwrite its way through.
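Why a bare retry cannot succeed is easiest to see in a toy model where each caller's view is tracked separately from the file. The names below (`ToyFile`, `ToyVolume`, `StaleViewToy`) are hypothetical, and the identity re-minting that the real reacquire() performs is reduced here to discarding the old view.

```python
class StaleViewToy(Exception):
    """Hypothetical stand-in for the typed StaleView denial."""


class ToyFile:
    """Shared state: the bytes on disk plus the current version."""

    def __init__(self, content):
        self.content = content
        self.version = 1


class ToyVolume:
    """One caller's handle; remembers the version that caller last read."""

    def __init__(self, shared):
        self._file = shared
        self._view = None

    def read(self):
        self._view = self._file.version
        return self._file.content

    def write(self, content):
        # Denied unless this caller's view matches the current version.
        # A denial leaves the view untouched, so retrying changes nothing.
        if self._view != self._file.version:
            raise StaleViewToy(
                f"view v{self._view}, current v{self._file.version}"
            )
        self._file.content = content
        self._file.version += 1
        self._view = self._file.version

    def reacquire(self):
        self._view = None   # drop the stale view
        return self.read()  # mandatory fresh read


shared = ToyFile("draft")
a, b = ToyVolume(shared), ToyVolume(shared)
a.read()
b.read()
a.write("draft + A")        # a's view is current: lands

denials = 0
for _ in range(3):          # a naive retry loop
    try:
        b.write("draft + B")
    except StaleViewToy:
        denials += 1

fresh = b.reacquire()       # explicit recovery
b.write(fresh + " + B")     # re-derived from A's version
```

All three retries are denied because nothing in the loop refreshes b's view. Only the fresh read does, and it forces b to see A's bytes before writing.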
That is the foreign-edit case, and it is guarded at both boundaries with a content-hash check. On write (on by default): if the on-disk bytes changed out-of-band since the volume last saw them, the write raises StaleView instead of clobbering the foreign edit. On read (opt-in via on_stale_read="raise"): re-reading a file whose bytes changed out-of-band raises instead of returning bytes your other state wasn't computed from. This is point-in-time detection at the volume's boundary, not filesystem interception — a write that never goes through the volume is caught at the next volume access of that file, not blocked as it happens.
That is the effect-ordering gate. gate() captures the input's version when your code decides, re-reads at the effect boundary, and fires the effect only if the input is unchanged — otherwise it raises StaleView before the effect runs. It orders effects; it never rolls one back, and for an effect that escapes the coordinator there is a residual re-read-to-fire window it narrows but cannot close. For a pure write effect, write_cas_at is the atomic, no-window path.
Yes — multi-artifact snapshot sessions (v0.11.0). A session pins a consistent cut of the artifacts you name, captured at a single point, and serves every session read from that cut while peers keep writing. Reading an artifact outside the pinned set is refused with a typed rejection, never silently served from live state, and a lapsed or lost session fails closed. This prevents read-skew (torn reads across artifacts); it does not add write-skew prevention — commits validate per-artifact against the pinned base.
The guarantees hold for a single-host fleet under one coordinator. Concurrent same-key writers on that host are covered by optimistic commit-CAS. Coordinating writers across multiple hosts is on the roadmap, demand-gated — if you need it, open an issue on the repo.
15-minute call. Walk me through what reads and writes your shared files — agents, CI steps, humans — and I'll tell you which of these guards applies, and where you'd hit the single-host boundary that's still on the roadmap.