Two independent architectures converge on one move: replace agent trust with structure.
Stop asking agents to trust each other’s claims; make the shared work an artifact the system maintains and verifies.
The problem these architectures solve
Recall the failure anatomy. CooperBench attributes its cooperative breakdowns to three capability gaps: agents fail to model what their partner is doing, fail to honor their own commitments, and fail to exchange the information a partner can act on (Khatua et al., 2026, CooperBench: Why Coding Agents Cannot be Your Teammates Yet, arXiv:2601.13295v2, preprint). The common thread is partial observability: each agent acts while holding an uncertain model of the other’s state, and a merge can be conflict-free yet still embed incompatible assumptions. The architectures below attack that root cause directly. Rather than trying to improve the agents’ ability to trust and verify each other’s claims, which CooperBench shows is unreliable, they remove the need for it.
Answer one: orchestrated delegation with verifiable integration
The first response keeps a central coordinator and makes integration an executable, test-gated step. CAID — Centralized Asynchronous Isolated Delegation — has a manager build a dependency graph of the work, assign each unit to an engineer agent only when its dependencies are integrated, and give each engineer an isolated git worktree to work in (Geng & Neubig, 2026, Effective Strategies for Asynchronous Software Engineering Agents, arXiv:2603.21489v1, preprint). Engineers report completion by committing; the manager integrates by merging, which surfaces conflicts explicitly and hands resolution back to the responsible engineer. Inter-agent communication is structured — a JSON protocol plus the commits themselves — rather than free-form dialogue, a deliberate choice to avoid the misalignment that open conversation produced in CooperBench.
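The manager’s scheduling rule (dispatch a unit only once everything it depends on has been merged, and hand a conflicting merge back to its owner) can be sketched in a few lines. This is a minimal illustration under our own assumptions, not CAID’s implementation: `WorkGraph`, `dispatch`, `integrate` and `completion_message` are hypothetical names, "assigning" a unit is reduced to marking it in flight, the per-engineer worktree (which in git would come from something like `git worktree add`) is not modeled, and the JSON message shape is invented for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class WorkGraph:
    """Hypothetical manager-side view of the task dependency graph."""

    deps: Dict[str, Set[str]]  # unit -> units it depends on
    integrated: Set[str] = field(default_factory=set)
    in_flight: Set[str] = field(default_factory=set)

    def ready_units(self) -> List[str]:
        """Units not yet assigned whose dependencies are all integrated."""
        return sorted(
            unit
            for unit, needs in self.deps.items()
            if unit not in self.integrated
            and unit not in self.in_flight
            and needs <= self.integrated
        )

    def dispatch(self) -> List[str]:
        """Assign every ready unit to an engineer (modeled as 'in flight')."""
        ready = self.ready_units()
        self.in_flight.update(ready)
        return ready

    def integrate(self, unit: str, merge_ok: bool) -> bool:
        """Record a merge attempt for a dispatched unit.

        On success the unit becomes integrated and may unblock dependents.
        On conflict it stays in flight, i.e. it goes back to the same engineer.
        """
        if unit not in self.in_flight:
            raise ValueError(f"{unit!r} was never dispatched")
        if merge_ok:
            self.in_flight.remove(unit)
            self.integrated.add(unit)
        return merge_ok


def completion_message(unit: str, commit_sha: str) -> str:
    """Hypothetical structured completion report (not CAID's actual schema)."""
    return json.dumps({"type": "completed", "unit": unit, "commit": commit_sha})
```

Leaving a conflicting unit in flight, rather than re-queuing it for anyone, mirrors the paper’s description of returning resolution to the responsible engineer; a unit only unblocks its dependents after its merge succeeds.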
The result is that coordination guarantees come from the substrate, not from the agents’ goodwill. CAID improves accuracy over a matched single-agent baseline across three models and two long-horizon benchmarks, and an ablation makes the mechanism legible: hard worktree isolation outperforms a softer instruction-level isolation, and on open-ended tasks the soft version actually falls below single-agent performance (Geng & Neubig, 2026). The structure is load-bearing — remove it and the multi-agent advantage inverts. Two honest caveats travel with this. The headline gains are largest on a weaker base model and compress on stronger ones, and the design carries a roughly two-to-four-times token cost. Reliability here is bought, not free.
Answer two: shared convergent state instead of message passing
The second response removes the coordinator from the critical path and lets agents coordinate by observing a shared state that is guaranteed to converge. CodeCRDT has agents edit a common document represented as a conflict-free replicated data type, so concurrent edits merge deterministically with no lock contention and no merge-time negotiation (Pugachev, 2025, CodeCRDT: Observation-Driven Coordination for Multi-Agent LLM Code Generation, arXiv:2510.18893v1, preprint). Agents watch the shared state, claim open work through a provably safe at-most-one-winner protocol, and avoid collisions without routing intentions through a coordinator. Across 600 trials the consistency result is unambiguous: 100% convergence, zero merge failures, no manual conflict resolution.
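To make "merges deterministically" and "at most one winner" concrete, here is a toy state-based CRDT for work claims. It is our own simplification, not CodeCRDT’s protocol: each replica keeps a grow-only set of `(timestamp, agent)` claims per task, merge is per-task set union, and the winner is the smallest claim. Because union is commutative, associative and idempotent, any replicas that have seen the same claims compute the same winner, whatever order they merged in. `ClaimReplica` and the claim encoding are hypothetical, and the sketch omits the rule a real protocol needs for when an agent may safely start work, since a claimant can still lose to an earlier claim it has not yet received.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# Hypothetical claim encoding: (logical timestamp, agent id).
Claim = Tuple[int, str]


@dataclass
class ClaimReplica:
    """One agent's replica of a grow-only map from task to claims."""

    claims: Dict[str, Set[Claim]] = field(default_factory=dict)

    def claim(self, task: str, ts: int, agent: str) -> None:
        """Record a local claim; it reaches other replicas via merge()."""
        self.claims.setdefault(task, set()).add((ts, agent))

    def merge(self, other: "ClaimReplica") -> None:
        """Join another replica's state into this one (per-task set union)."""
        for task, cs in other.claims.items():
            self.claims.setdefault(task, set()).update(cs)

    def winner(self, task: str) -> Optional[str]:
        """Deterministic tie-break: earliest timestamp, then smallest agent id."""
        cs = self.claims.get(task)
        return min(cs)[1] if cs else None
```

Taking the minimum claim is only one choice; any total order that every replica applies identically gives the same agreement, which is the property that lets agents avoid routing intentions through a coordinator.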
The convergence result is the cleanest demonstration in the corpus that spatial coordination, agreeing on who touches what, becomes a solved engineering problem once you build the right substrate. But the same paper is precise about what convergence does not buy. Semantic conflicts (duplicate declarations, type mismatches, edits that are individually valid and jointly incoherent) persist at an estimated rate of 5 to 10% (Pugachev, 2025). The substrate guarantees the edits combine; it cannot guarantee they agree. This is the spatial-versus-semantic distinction from Post 2 made concrete: structure closes the spatial gap completely and leaves the semantic gap open, which is precisely why a verification layer is a mandatory component of a shared-state agent fabric, not an optimization.
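One cheap check for a single class of semantic conflict, duplicate declarations, can run on the converged document. This sketch assumes the shared artifact is Python source; it uses only the standard `ast` module and is not drawn from the CodeCRDT paper. Type mismatches and jointly incoherent logic would need a type checker or tests on top of it.

```python
import ast
from collections import Counter
from typing import List

_DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def duplicate_toplevel_defs(source: str) -> List[str]:
    """Names defined more than once at module top level of merged source.

    Raises SyntaxError if the merged text does not parse at all.
    """
    tree = ast.parse(source)
    counts = Counter(node.name for node in tree.body if isinstance(node, _DEF_NODES))
    return sorted(name for name, n in counts.items() if n > 1)


# Two agents each added a `load` helper at different places in the file.
# The edits never touched the same lines, so the merge is spatially clean.
merged = """\
def load(path):
    with open(path) as f:
        return f.read()

def save(path, text):
    with open(path, "w") as f:
        f.write(text)

def load(path, encoding="utf-8"):
    with open(path, encoding=encoding) as f:
        return f.read()
"""

print(duplicate_toplevel_defs(merged))  # ['load']
```

The merged text parses without error and Python silently keeps the later `load`, so nothing at the substrate level flags the problem; only a check that reads the combined result catches it.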
The same move, seen from the agents’ successes
These are not foreign impositions on how agents want to work — they are the deliberate version of what agents already do on their rare good runs. CooperBench’s own analysis of its successful cooperative trajectories finds three recurring behaviors: role division, resource division, and negotiation. What distinguishes every one of them is the conversion of vague intention into a specific, verifiable commitment — explicit insertion points, line-level boundaries that make collision impossible by construction, mutually exclusive task splits that leave nothing to interpret (Khatua et al., 2026). The architectures generalize that pattern and make it the default rather than the exception. CAID’s dependency graph and worktrees are role and resource division enforced by the substrate; CodeCRDT’s claim protocol is resource division with a safety proof. The capability already exists in the models intermittently; the architecture’s job is to make it reliable.
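A line-level boundary only makes collision impossible if it is checkable before anyone edits. Assuming commitments are expressed as half-open line ranges per file (our assumption; CooperBench describes the behavior, not a format), a plan can be rejected up front when two ranges overlap. `Commitment` and `overlapping` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple


class Commitment(NamedTuple):
    """Hypothetical explicit commitment: an agent owns lines [start, end) of a file."""

    agent: str
    path: str
    start: int  # inclusive
    end: int    # exclusive


def overlapping(commitments: List[Commitment]) -> List[Tuple[str, str]]:
    """Return pairs of agents whose committed ranges collide in the same file."""
    by_path: Dict[str, List[Commitment]] = {}
    for c in commitments:
        if c.start >= c.end:
            raise ValueError(f"empty or inverted range for {c.agent}")
        by_path.setdefault(c.path, []).append(c)

    clashes: List[Tuple[str, str]] = []
    for cs in by_path.values():
        cs.sort(key=lambda c: c.start)
        for i, a in enumerate(cs):
            for b in cs[i + 1:]:
                if b.start >= a.end:
                    break  # sorted by start: no later range can overlap `a`
                clashes.append((a.agent, b.agent))
    return clashes
```

Sorting by start lets the inner loop stop at the first range that begins after the current one ends, so a clean plan is verified without comparing every pair.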
The production view already builds this way
The leading practitioner account treats shared state as something to construct rather than negotiate. Anthropic’s Research system has subagents write their outputs to a shared filesystem and pass lightweight references back to the lead agent, rather than routing full results through conversation — explicitly to “minimize the ‘game of telephone’” (Hadfield et al., 2025, How we built our multi-agent research system, Anthropic Engineering). That is the CodeCRDT move at the level of work products: the shared artifact is authoritative, and agents reference it instead of reconstructing each other’s state from a transcript. The same account is candid that real-time agent-to-agent coordination remains weak and that its lead agents still execute subagents synchronously because asynchronous coordination introduces exactly the state-consistency and integration challenges these architectures are built to manage (Hadfield et al., 2025). The peer-reviewed corpus and the production system are reading from the same page: where work is interdependent, make the shared state a maintained artifact and make integration an explicit, verifiable step.
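The write-then-reference pattern is easy to sketch. Anthropic’s post does not publish code, so this is our own minimal version: the file naming scheme, the SHA-256 check and the reference fields are all assumptions. A subagent writes its full output to a shared directory and returns a small reference; the lead agent dereferences it and verifies the hash, so the artifact, not a retelling of it, stays authoritative.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union

Ref = Dict[str, Union[str, int]]  # hypothetical reference format


def publish(store: Path, task_id: str, content: str) -> Ref:
    """Subagent side: write the full result to the shared store, return a small reference."""
    data = content.encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    path = store / f"{task_id}-{digest[:12]}.txt"
    path.write_bytes(data)
    return {"task": task_id, "path": str(path), "sha256": digest, "bytes": len(data)}


def resolve(ref: Ref) -> str:
    """Lead-agent side: read the artifact by reference and check it is unchanged."""
    data = Path(str(ref["path"])).read_bytes()
    if hashlib.sha256(data).hexdigest() != ref["sha256"]:
        raise ValueError(f"artifact for {ref['task']} no longer matches its reference")
    return data.decode("utf-8")
```

Only the small reference would travel through the conversation; the lead agent never has to reconstruct the subagent’s output from a transcript.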
Choosing between the two
The two architectures fit different workloads, and the distinction is practical. Orchestrated delegation suits work with clear dependency structure and a natural integration gate, which covers most software tasks: a manager can sequence units and a test suite can validate the merge. Shared convergent state suits work that decomposes into genuinely independent edits on a common artifact, where the cost of a coordinator on the critical path outweighs its benefit. Both share the defining commitment of coordination by construction: the shared work is an artifact the system maintains and verifies, not a belief each agent holds about the others. And both inherit the same boundary: neither closes the semantic gap. Structure guarantees that integration happens and that it happens safely; whether the integrated result is coherent still requires verification, which is the subject of the next post.
The coordination gap has a buildable answer with a known limit. Replacing trust with structure — isolated workspaces and test-gated merges, or convergent shared state with a safe claim protocol — closes the spatial coordination problem and turns integration from a hopeful negotiation into an engineered step.
It does not close the semantic gap. Any honest deployment treats verification as a required layer, not an afterthought. The reliability is real, but it is bought: structure costs tokens and coordination overhead.
