Compile the Law, Then Run It

Attribution. The DACL system, the Amortized Intelligence architecture, and every finding, figure, and production claim discussed here are the work of Stanisław Sójka and Witold Kowalczyk of Delos AI, published in Accurate Legal Reasoning at Scale: Neuro-Symbolic Offloading and Structural Auditability for Robust Legal Adjudication [1]. This is an independent Luminity readout — our analysis and commentary on their paper. We claim no part of the research, the system, or its results; only the reading of them. All quantitative claims below are the authors’ own, as reported in [1], and have not been independently verified.

With that established: this readout stands alongside Assurance by Architecture (Series 23) and Why Legal RAG Fails (Series 24) as the worked, in-production example of a claim both series make — that for the computable slice of law, defensibility comes from removing the model from the loop, not from a better model.

Most legal AI debates are about open-textured reasoning — judgment, precedent, persuasion. This paper is about the part of law that is, underneath the prose, arithmetic: pricing formulas, eligibility thresholds, rate brackets, procedural steps gated by dates. The authors call these computational legal clauses, and they are everywhere the money is — insurance, taxation, financial regulation, commercial billing. They look like language. They behave like code.

That distinction is the whole paper. A frontier model asked to interpret such a clause at runtime fails in two ways that are fatal in production. It fails silently on the math — hallucinating a pricing coefficient, dropping a nested condition — because chain-of-thought is not a calculator and is not always faithful to its own steps. And it fails on economics: re-reading the full contract for every one of a thousand monthly invoices scales cost linearly, which makes autonomous adjudication commercially pointless. This is the hallucination problem [2] in its most unforgiving form — a wrong number, stated with confidence, on a transaction that bills a customer.

The shift: from interpreter to compiler

The fix is a change of architecture, not of model. Instead of using the LLM as a runtime interpreter that reasons over contract text on every transaction, the system uses it once as a compiler: it translates the contract into DACL — Deterministic Autonomous Contract Language — a typed graph intermediate representation. After that one-time translation, every adjudication reduces to deterministic graph execution. The authors name the principle Amortized Intelligence: pay for the heavy interpretation once, then run the result indefinitely at the cost of executing a graph.
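A minimal sketch of the amortization argument, under stated assumptions: the dollar figures are hypothetical placeholders rather than numbers from [1], and the function names are ours. It compares a model call on every event against a one-time compile followed by cheap graph executions, and finds the event count at which compiling becomes strictly cheaper.

```python
from decimal import Decimal
from typing import Optional


def interpreter_cost(per_call: Decimal, events: int) -> Decimal:
    """Runtime-interpreter pattern: the model re-reads the contract on every event."""
    return per_call * events


def compiler_cost(compile_once: Decimal, per_exec: Decimal, events: int) -> Decimal:
    """Compile-once pattern: one model-heavy translation, then cheap graph executions."""
    return compile_once + per_exec * events


def break_even_events(
    per_call: Decimal, compile_once: Decimal, per_exec: Decimal
) -> Optional[int]:
    """Smallest event count at which compiling is strictly cheaper, or None if never."""
    saving_per_event = per_call - per_exec
    if saving_per_event <= 0:
        return None
    # compile_once + per_exec * n < per_call * n  <=>  n > compile_once / saving_per_event
    return int(compile_once / saving_per_event) + 1


# Hypothetical prices, chosen only to make the shape of the curve visible.
PER_CALL = Decimal("0.50")       # assumed cost of one frontier-model adjudication
COMPILE_ONCE = Decimal("20.00")  # assumed one-time cost of compiling the contract
PER_EXEC = Decimal("0.01")       # assumed cost of one deterministic graph execution

n = break_even_events(PER_CALL, COMPILE_ONCE, PER_EXEC)
print(n, interpreter_cost(PER_CALL, n), compiler_cost(COMPILE_ONCE, PER_EXEC, n))
```

With these placeholder inputs the compile step pays for itself at the 41st event. Past that point the gap widens with every additional invoice, which is the whole of the amortization claim.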

This is the same idea Series 23 called probability isolation — confine the model to language, route the consequential logic through a deterministic substrate — but pushed to its conclusion and shipped. The model is not made more reliable. It is moved out of the path where reliability matters.

What DACL is

DACL models contract logic as a directed acyclic graph with a strongly-typed schema, which buys referential transparency: identical inputs produce identical outputs, by construction. It is deliberately not Prolog — not a general theorem prover — but a domain-specific representation built around four recursive primitives that cover the recurring shapes of commercial logic: Procedure (sequential workflows with early termination), Logical Clause (priority-ordered first-order conditions), Range Clause (continuous values mapped to non-overlapping brackets, which kills off-by-one bracket errors at load time), and Pricing Formula (sandboxed arithmetic with configurable decimal precision). Amendments are handled by clause-level recompilation with validity-date annotations, so the engine applies the version in force on the event date rather than regenerating the whole contract — a quietly important detail for anyone who has fought retroactive rating errors.
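To make the Range Clause and Pricing Formula ideas concrete, here is an illustrative Python sketch. It is not DACL's actual schema, which [1] defines and this readout does not reproduce. The class names, the half-open bracket convention, the rejection of gaps as well as overlaps, and the flat-rate pricing rule are all assumptions. The point it shows is that bracket errors can be caught when a clause is loaded, before any transaction runs.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


@dataclass(frozen=True)
class Bracket:
    lower: Decimal            # inclusive lower bound (assumed convention)
    upper: Optional[Decimal]  # exclusive upper bound; None means unbounded
    rate: Decimal


class RangeClause:
    """Hypothetical range clause: brackets are validated once, at load time."""

    def __init__(self, brackets: list[Bracket]) -> None:
        ordered = sorted(brackets, key=lambda b: b.lower)
        if not ordered:
            raise ValueError("a range clause needs at least one bracket")
        for b in ordered:
            if b.upper is not None and b.upper <= b.lower:
                raise ValueError(f"empty bracket starting at {b.lower}")
        for current, nxt in zip(ordered, ordered[1:]):
            if current.upper is None or current.upper > nxt.lower:
                raise ValueError(f"brackets overlap at {nxt.lower}")
            if current.upper < nxt.lower:
                # Stricter than "non-overlapping": this sketch also rejects gaps.
                raise ValueError(f"gap before bracket starting at {nxt.lower}")
        self._brackets = tuple(ordered)

    def rate_for(self, value: Decimal) -> Decimal:
        for b in self._brackets:
            if value >= b.lower and (b.upper is None or value < b.upper):
                return b.rate
        raise ValueError(f"value {value} falls outside every bracket")


def price(quantity: Decimal, clause: RangeClause,
          precision: Decimal = Decimal("0.01")) -> Decimal:
    """Flat-rate pricing: the whole quantity is billed at its bracket's rate."""
    amount = quantity * clause.rate_for(quantity)
    return amount.quantize(precision, rounding=ROUND_HALF_UP)


# Hypothetical volume tiers, for illustration only.
TIERS = RangeClause([
    Bracket(Decimal("0"), Decimal("100"), Decimal("1.00")),
    Bracket(Decimal("100"), Decimal("500"), Decimal("0.80")),
    Bracket(Decimal("500"), None, Decimal("0.65")),
])

print(price(Decimal("100"), TIERS))
```

A quantity of exactly 100 lands in the second tier because the bounds are half-open, so the boundary case is decided by the schema rather than by whoever reads the clause that day.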

The agent does less, on purpose

At runtime an autonomous agent built on a lightweight model (gpt-5-mini) does only two things: it maps the natural-language query to the relevant clause identifiers, and it synthesizes a final answer. It does not do the arithmetic or the logic. Those are delegated, through a single tool, to the deterministic engine, which returns both a computed value and a structural audit trace. Because the engine is deterministic, the same inputs always yield the same output: the probability of the output given the inputs is, by design, one. The model handles the flexible edges; the verified core handles everything that can bill a customer or deny a claim. This is the architectural inversion the determinism literature has been arguing for, auditable reasoning over a deterministic substrate [3] with formal guarantees on the logic [4], instantiated as a product rather than a benchmark.
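The division of labor can be sketched as a single tool that executes a small dependency graph and returns a value together with a trace. This is an illustrative sketch under our own assumptions, not the authors' engine: the `Node` type, the `execute` function, the trace format, and the example billing graph are all hypothetical. In this picture the agent's only job would be to choose `target` and supply `inputs`.

```python
from dataclasses import dataclass
from decimal import Decimal
from graphlib import TopologicalSorter
from typing import Callable


@dataclass(frozen=True)
class Node:
    deps: tuple[str, ...]       # names of the nodes or inputs this node reads
    fn: Callable[..., Decimal]  # pure function of the dependency values


def execute(graph: dict[str, Node], inputs: dict[str, Decimal], target: str):
    """Evaluate the graph in dependency order; return (value, audit trace)."""
    order = TopologicalSorter(
        {name: node.deps for name, node in graph.items()}
    ).static_order()  # raises graphlib.CycleError if the graph is not acyclic
    values = dict(inputs)
    trace = []
    for name in order:
        if name in values:  # a raw input, nothing to compute
            continue
        node = graph[name]
        args = [values[dep] for dep in node.deps]
        values[name] = node.fn(*args)
        trace.append({
            "node": name,
            "inputs": dict(zip(node.deps, args)),
            "output": values[name],
        })
    return values[target], trace


# Hypothetical billing graph: subtotal, a 10% discount at or above 1000, total.
BILLING_GRAPH = {
    "subtotal": Node(("quantity", "unit_rate"), lambda q, r: q * r),
    "discount": Node(
        ("subtotal",),
        lambda s: s * Decimal("0.10") if s >= Decimal("1000") else Decimal("0"),
    ),
    "total": Node(
        ("subtotal", "discount"),
        lambda s, d: (s - d).quantize(Decimal("0.01")),
    ),
}

total, audit = execute(
    BILLING_GRAPH,
    {"quantity": Decimal("120"), "unit_rate": Decimal("9.50")},
    target="total",
)
print(total)
for step in audit:
    print(step)
```

Every intermediate value appears in the trace with the inputs that produced it, so the audit record falls out of execution instead of being reconstructed from a model's narration. Running the same call twice gives the same value and the same trace.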

The results, and the production fact

Against frontier baselines (GPT-5.2, Claude 4.5 Sonnet, Gemini 3 Pro, extended reasoning enabled), the DACL agent reports 99.5% accuracy across tasks and over 95% cost reduction on high-volume workflows, while mitigating the “reasoning cliff” where probabilistic models degrade sharply on compounding calculations. Every decision carries a deterministic, visually auditable trace — the governance evidence produced as a byproduct rather than reconstructed after the fact.
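One common intuition for why compounding calculations are hard on probabilistic reasoners can be put in a few lines. This is our illustration, not the authors' analysis: it assumes each step succeeds independently with the same probability, which real models need not satisfy, and the per-step figures are hypothetical.

```python
def chain_accuracy(per_step_accuracy: float, steps: int) -> float:
    """End-to-end accuracy if every step must succeed and steps are independent."""
    return per_step_accuracy ** steps


# Hypothetical per-step accuracies; 1.0 stands in for a deterministic executor.
for per_step in (0.99, 1.0):
    row = [round(chain_accuracy(per_step, n), 3) for n in (1, 10, 50)]
    print(per_step, row)
```

Under these assumptions, a reasoner that is right 99% of the time per step is right only about 60% of the time over fifty chained steps, while a deterministic executor stays at one regardless of depth.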

The detail that separates this from a benchmark paper: the system has run in live production for over twelve months, executing autonomous billing and accounts-payable workflows across 150-plus commercial agreements and roughly a thousand monthly billing events. In a literature thick with offline evaluations, a year of autonomous adjudication in a real enterprise is the rare and load-bearing claim.

What it does not claim

The readout’s credibility depends on saying what this is not. It is scoped to computational legal clauses — the part of law reducible to typed logic and arithmetic. It does not adjudicate open-textured reasoning, ambiguous precedent, or contested interpretation, and it does not pretend to. The strong numbers are single-vendor and self-reported, on four proprietary agreements and the authors’ own benchmark, in an industry-track paper not independently replicated. And the architecture relocates rather than eliminates model risk: when the LLM compiles a contract once, that one translation becomes the single point of failure, mitigated by a verified multi-stage pipeline but never reduced to zero. The win is real and bounded — and the boundary is the point.

The Hard Claim

For computational legal clauses, the reliable system is the one that uses the model once and then gets it out of the way. Determinism and auditability come by construction — from compiling the law into a typed graph and executing it — not from a larger model reasoning more carefully at runtime.

DACL is the existence proof behind the architecture thesis: probability isolation, shipped and billing for a year. Its limits are exactly as important as its results — it proves the case for the computable slice of law, and only that slice.

This readout sits with Assurance by Architecture (Series 23), which frames the full assurance stack, and Why Legal RAG Fails (Series 24), which details the grounding layer. DACL is the worked example both point to.

Connected Reading

Series 23 · Assurance by Architecture: Where Legal AI Earns Its Output

Series 24 · Why Legal RAG Fails: Legal RAG Fails at Retrieval, Not Generation

4 sources: the primary paper — Delos AI’s DACL system, a neuro-symbolic legal-adjudication architecture reported in live production for 12+ months (single-vendor, self-reported, ACL 2026 Industry Track) — situated against three corpus works: the foundational US legal-hallucination study it answers, and two adjacent deterministic/verification lines (auditable reasoning over temporal legal knowledge graphs; SMT-backed formal verification of legal reasoning). 2024–2026; includes preprints.

01 · Computational Legal Clauses: The part of law that is arithmetic under the prose; where silent math errors bill real customers.
02 · Interpreter → Compiler: Use the LLM once to translate to DACL, then execute deterministically (“Amortized Intelligence”).
03 · DACL: Typed DAG, referential transparency, four primitives (Procedure, Logical Clause, Range Clause, Pricing Formula), date-versioned amendments.
04 · The Agent Does Less: A lightweight model routes and synthesizes; the deterministic engine owns all billable logic, with an audit trace.
05 · The Production Fact: 99.5% accuracy and a >95% high-volume cost cut vs. frontier large reasoning models (LRMs); live 12+ months, 150+ agreements, ~1,000 monthly events.
06 · Bounded by Design: Computable slice only; single-vendor and self-reported; the one-time compile becomes the residual risk.
