With that established: this readout stands alongside Assurance by Architecture (Series 23) and Why Legal RAG Fails (Series 24) as the worked, in-production example of a claim both series make — that for the computable slice of law, defensibility comes from removing the model from the loop, not from a better model.
Most legal AI debates are about open-textured reasoning — judgment, precedent, persuasion. This paper is about the part of law that is, underneath the prose, arithmetic: pricing formulas, eligibility thresholds, rate brackets, procedural steps gated by dates. The authors call these computational legal clauses, and they are everywhere the money is — insurance, taxation, financial regulation, commercial billing. They look like language. They behave like code.
That distinction is the whole paper. A frontier model asked to interpret such a clause at runtime fails in two ways that are fatal in production. It fails silently on the math — hallucinating a pricing coefficient, dropping a nested condition — because chain-of-thought is not a calculator and is not always faithful to its own steps. And it fails on economics: re-reading the full contract for every one of a thousand monthly invoices scales cost linearly, which makes autonomous adjudication commercially pointless. This is the hallucination problem [2] in its most unforgiving form — a wrong number, stated with confidence, on a transaction that bills a customer.
The shift: from interpreter to compiler
The fix is a change of architecture, not of model. Instead of using the LLM as a runtime interpreter that reasons over contract text on every transaction, the system uses it once as a compiler: it translates the contract into DACL — Deterministic Autonomous Contract Language — a typed graph intermediate representation. After that one-time translation, every adjudication reduces to deterministic graph execution. The authors name the principle Amortized Intelligence: pay for the heavy interpretation once, then run the result indefinitely at the cost of executing a graph.
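The economics of that shift can be sketched in a few lines. This is a minimal cost model, not the authors' accounting: the three prices and the function names are hypothetical, chosen only to show that the interpreter pattern pays the model's full reading cost on every event while the compiler pattern pays it once.

```python
from decimal import Decimal, ROUND_CEILING

# Hypothetical prices, for illustration only (not figures from the paper).
COST_PER_READ = Decimal("0.50")    # frontier model re-reads the contract per event
COMPILE_COST = Decimal("20.00")    # one-time translation of the contract to a graph
COST_PER_EXEC = Decimal("0.001")   # graph execution plus a lightweight agent call


def interpreter_cost(events: int) -> Decimal:
    """Runtime-interpreter pattern: cost grows linearly with event count."""
    return COST_PER_READ * events


def compiler_cost(events: int) -> Decimal:
    """Compile-once pattern: fixed up-front cost plus a small per-event cost."""
    return COMPILE_COST + COST_PER_EXEC * events


def break_even_events() -> int:
    """Smallest event count at which compile-once is no more expensive."""
    ratio = COMPILE_COST / (COST_PER_READ - COST_PER_EXEC)
    return int(ratio.to_integral_value(rounding=ROUND_CEILING))


if __name__ == "__main__":
    for n in (10, break_even_events(), 1000):
        print(n, interpreter_cost(n), compiler_cost(n))
```

Under these assumed prices the up-front cost is recovered after a few dozen events. The real break-even point depends entirely on actual model and execution pricing.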
This is the same idea Series 23 called probability isolation — confine the model to language, route the consequential logic through a deterministic substrate — but pushed to its conclusion and shipped. The model is not made more reliable. It is moved out of the path where reliability matters.
What DACL is
DACL models contract logic as a directed acyclic graph with a strongly-typed schema, which buys referential transparency: identical inputs produce identical outputs, by construction. It is deliberately not Prolog — not a general theorem prover — but a domain-specific representation built around four recursive primitives that cover the recurring shapes of commercial logic: Procedure (sequential workflows with early termination), Logical Clause (priority-ordered first-order conditions), Range Clause (continuous values mapped to non-overlapping brackets, which kills off-by-one bracket errors at load time), and Pricing Formula (sandboxed arithmetic with configurable decimal precision). Amendments are handled by clause-level recompilation with validity-date annotations, so the engine applies the version in force on the event date rather than regenerating the whole contract — a quietly important detail for anyone who has fought retroactive rating errors.
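To make three of those ideas concrete, here is a minimal sketch of a range clause validated at load time, a pricing formula with configurable decimal precision, and validity-dated clause versions. It is not DACL's actual schema: every class and function name below (Bracket, RangeClause, price, ClauseVersion, version_in_force) is hypothetical, and the sandboxing of arithmetic is not modeled.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional, Sequence


@dataclass(frozen=True)
class Bracket:
    lower: Decimal            # inclusive lower bound
    upper: Optional[Decimal]  # exclusive upper bound; None means unbounded
    rate: Decimal


class RangeClause:
    """Brackets are validated once, at construction ("load time")."""

    def __init__(self, brackets: Sequence[Bracket]) -> None:
        if not brackets:
            raise ValueError("a range clause needs at least one bracket")
        ordered = sorted(brackets, key=lambda b: b.lower)
        for b in ordered:
            if b.upper is not None and b.upper <= b.lower:
                raise ValueError(f"empty bracket starting at {b.lower}")
        for prev, nxt in zip(ordered, ordered[1:]):
            # Each bracket must end exactly where the next begins: no gap, no overlap.
            if prev.upper != nxt.lower:
                raise ValueError(f"gap or overlap at {nxt.lower}")
        self._brackets = tuple(ordered)

    def lookup(self, value: Decimal) -> Bracket:
        for b in self._brackets:
            if value >= b.lower and (b.upper is None or value < b.upper):
                return b
        raise ValueError(f"{value} falls outside every bracket")


def price(units: Decimal, clause: RangeClause, places: int = 2) -> Decimal:
    """Pricing formula: units times the bracket rate, rounded to `places` decimals."""
    quantum = Decimal(1).scaleb(-places)  # places=2 gives Decimal("0.01")
    amount = units * clause.lookup(units).rate
    return amount.quantize(quantum, rounding=ROUND_HALF_UP)


@dataclass(frozen=True)
class ClauseVersion:
    valid_from: date
    clause: RangeClause


def version_in_force(versions: Sequence[ClauseVersion], event_date: date) -> RangeClause:
    """Return the latest version whose validity date is on or before the event date."""
    eligible = [v for v in versions if v.valid_from <= event_date]
    if not eligible:
        raise ValueError(f"no clause version in force on {event_date}")
    return max(eligible, key=lambda v: v.valid_from).clause
```

Because the bracket check runs in the constructor, a contract whose tiers overlap or leave a gap is rejected before any invoice is priced. Using Decimal rather than binary floats keeps the rounding behavior explicit and reproducible.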
The agent does less, on purpose
At runtime an autonomous agent built on a lightweight model (gpt-5-mini) does only two things: it maps the natural-language query to the relevant clause identifiers, and it synthesizes a final answer. It does not do the arithmetic or the logic. Those are delegated, through a single tool, to the deterministic engine, which returns both a computed value and a structural audit trace. Given the same inputs, the engine’s output is fixed: by design, the probability of that output given those inputs is one. The model handles the flexible edges; the verified core handles everything that can bill a customer or deny a claim. This is the architectural inversion the determinism literature has been arguing for, auditable reasoning over a deterministic substrate [3] with formal guarantees on the logic [4], instantiated as a product rather than a benchmark.
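The division of labor can be illustrated with a toy version of the single tool the agent calls. This is a sketch under stated assumptions, not the authors' engine: Node, Result, evaluate_clause and the INVOICE_TOTAL graph are all hypothetical names, and the model-side steps (mapping a query to clause identifiers, writing the final answer) are deliberately left out. What it shows is a graph evaluator that returns a value together with a trace of every intermediate node.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict, List, Mapping, Set, Tuple


@dataclass(frozen=True)
class Node:
    deps: Tuple[str, ...]       # names of inputs or other nodes this node reads
    fn: Callable[..., Decimal]  # pure function of the dependency values


@dataclass(frozen=True)
class Result:
    value: Decimal
    trace: Tuple[Tuple[str, Decimal], ...]  # (node name, value) in evaluation order


def evaluate_clause(graph: Mapping[str, Node], target: str,
                    inputs: Mapping[str, Decimal]) -> Result:
    """Evaluate `target` depth-first, recording each computed node in the trace."""
    values: Dict[str, Decimal] = dict(inputs)
    trace: List[Tuple[str, Decimal]] = []
    in_progress: Set[str] = set()

    def visit(name: str) -> Decimal:
        if name in values:
            return values[name]
        if name in in_progress:
            raise ValueError(f"cycle detected at {name!r}")
        if name not in graph:
            raise KeyError(f"unknown node or missing input: {name!r}")
        in_progress.add(name)
        node = graph[name]
        values[name] = node.fn(*(visit(dep) for dep in node.deps))
        in_progress.discard(name)
        trace.append((name, values[name]))
        return values[name]

    return Result(visit(target), tuple(trace))


# Hypothetical clause: 10% volume discount at 100 units or more.
INVOICE_TOTAL: Dict[str, Node] = {
    "subtotal": Node(("units", "unit_price"), lambda u, p: u * p),
    "discount": Node(("units", "subtotal"),
                     lambda u, s: s * Decimal("0.10") if u >= 100 else Decimal("0")),
    "total": Node(("subtotal", "discount"), lambda s, d: s - d),
}

if __name__ == "__main__":
    result = evaluate_clause(
        INVOICE_TOTAL, "total",
        {"units": Decimal("120"), "unit_price": Decimal("2.50")},
    )
    print(result.value)
    for name, value in result.trace:
        print(f"  {name} = {value}")
```

Since every node is a pure function and the evaluation order is fixed by the graph, calling the tool twice with the same inputs yields the same value and the same trace. In this sketch the audit trail is simply the record of the computation, so nothing has to be reconstructed afterward.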
The results, and the production fact
Against frontier baselines (GPT-5.2, Claude 4.5 Sonnet, Gemini 3 Pro, extended reasoning enabled), the DACL agent reports 99.5% accuracy across tasks and over 95% cost reduction on high-volume workflows, while mitigating the “reasoning cliff” where probabilistic models degrade sharply on compounding calculations. Every decision carries a deterministic, visually auditable trace — the governance evidence produced as a byproduct rather than reconstructed after the fact.
The detail that separates this from a benchmark paper: the system has run in live production for over twelve months, executing autonomous billing and accounts-payable workflows across 150-plus commercial agreements and roughly a thousand monthly billing events. In a literature thick with offline evaluations, a year of autonomous adjudication in a real enterprise is the rare and load-bearing claim.
What it does not claim
The readout’s credibility depends on saying what this is not. It is scoped to computational legal clauses — the part of law reducible to typed logic and arithmetic. It does not adjudicate open-textured reasoning, ambiguous precedent, or contested interpretation, and it does not pretend to. The strong numbers are single-vendor and self-reported, on four proprietary agreements and the authors’ own benchmark, in an industry-track paper not independently replicated. And the architecture relocates rather than eliminates model risk: when the LLM compiles a contract once, that one translation becomes the single point of failure, mitigated by a verified multi-stage pipeline but never reduced to zero. The win is real and bounded — and the boundary is the point.
For computational legal clauses, the reliable system is the one that uses the model once and then gets it out of the way. Determinism and auditability come by construction — from compiling the law into a typed graph and executing it — not from a larger model reasoning more carefully at runtime.
DACL is the existence proof behind the architecture thesis: probability isolation, shipped and billing for a year. Its limits are exactly as important as its results — it proves the case for the computable slice of law, and only that slice.
This readout sits with Assurance by Architecture (Series 23), which frames the full assurance stack, and Why Legal RAG Fails (Series 24), which details the grounding layer. DACL is the worked example both point to.
