Nobody announced the moment the LLM entered enterprise decision-making.
There was no architecture review. No governance committee. No change control process. One day a procurement manager summarized a vendor proposal with an LLM and pasted the output into an approval memo. A financial analyst used one to draft a board presentation. A lawyer used one to review a contract before flagging it for senior counsel.
The LLM was in the write path. It was shaping the artifacts that went into the system of record (SOR). It was influencing the decisions that produced the records the enterprise called authoritative.
And nobody counted it as a system.
This is the unacknowledged infrastructure story of the past three years. Not the AI tools that were formally evaluated, procured, and governed. The AI that entered enterprise decision-making through the habits of individual contributors, invisible to the architecture and ungoverned by design.
The harness layer arrived and created the illusion that this was being managed. It wasn’t. It was being coordinated. Those are different things. And the distinction between them is where the provider compression argument begins.
What the harness actually did
The harness layer — LangChain, LlamaIndex, the orchestration frameworks that built an industry between 2022 and 2024 — solved a real problem. LLMs were stateless. Each call started from scratch. There was no memory of prior interactions, no persistent context, no way to chain a sequence of decisions into a coherent operational workflow.
The harness built scaffolding around the stateless model and created the appearance of continuity. It managed conversation history. It retrieved context from vector stores. It chained tool calls. It logged outputs. It built the connective tissue that made a stateless inference engine behave, from the outside, like a stateful system.
This was genuine engineering work. It enabled real enterprise use cases that would not have been possible otherwise. And it was, from the first day, a temporary architecture.
The scaffolding vendors were always writing v0.1 of a specification they would never get to finish. Every orchestration pattern invented, every memory management approach pioneered, every evaluation framework designed — this was a distributed R&D effort conducted at enterprise scale, funded by the companies trying to deploy LLMs in production. The provider read the specification, understood what v1.0 required, and built it.
The scaffolding vendors were not building moats. They were writing the roadmap for someone else.
The statefulness illusion
There is a precise way to say what the harness did and what it didn’t do.
The harness created the illusion of statefulness. The LLM remained stateless. Every inference call was a fresh computation with no inherent memory of what came before. The harness injected the prior context — retrieved from its own persistence layer, formatted into the prompt, included in the context window — and the model produced an output that appeared continuous with prior interactions.
This worked. Enterprises built production systems on it. Some of those systems handled consequential decisions.
But the statefulness was in the scaffolding, not the model. And the scaffolding was maintained by the enterprise or the middleware vendor — not the provider. Which meant the intelligence was not accumulating in the substrate. It was being reconstructed on every call from whatever the harness had managed to persist.
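A minimal sketch of that reconstruction, assuming a toy stand-in for the model. The names `stateless_model` and `Harness` are hypothetical and do not come from any real framework. The point is only where the state lives: the model function sees nothing but the prompt, and everything that looks like memory is replayed by the harness.

```python
from dataclasses import dataclass, field


def stateless_model(prompt: str) -> str:
    """Stand-in for an LLM call: the output depends only on the prompt it is given."""
    lines = prompt.splitlines()
    return f"reply to {lines[-1]!r} given {len(lines) - 1} prior lines"


@dataclass
class Harness:
    """Hypothetical harness: owns the persistence layer and rebuilds context per call."""
    history: list[str] = field(default_factory=list)

    def build_prompt(self, user_message: str) -> str:
        # Reconstruct "memory" by replaying everything persisted so far.
        return "\n".join(self.history + [f"user: {user_message}"])

    def call(self, user_message: str) -> str:
        prompt = self.build_prompt(user_message)
        reply = stateless_model(prompt)
        # State lives here, in the scaffolding, never in the model.
        self.history += [f"user: {user_message}", f"assistant: {reply}"]
        return reply


harness = Harness()
first = harness.call("summarize the vendor proposal")
second = harness.call("now draft the approval memo")
# The same message sent without the harness arrives with no context at all.
bare = stateless_model("user: now draft the approval memo")
```

The second call looks continuous with the first only because the harness replayed the history. Send the same message without the harness and the model starts from zero.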
The refinement layer changes this at the architectural level. Memory is not reconstructed. It is maintained. Dreaming does not replay prior context. It extracts patterns from prior sessions and restructures the memory so that the next session begins from a more refined operational model than the last one did.
This is the difference between an actor reading from a script and an actor who has internalized the role. The harness read from a script it assembled at runtime. The provider’s refinement layer produces an actor who remembers.
The harness coordinated LLM behavior. It did not manage it. Coordination produces consistent outputs from a stateless model by assembling the right context on every call. Management produces consistent outputs from a stateful substrate that carries the context forward and refines it over time. The distinction is not semantic. It is the architectural difference between reconstruction and compounding.
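To make reconstruction versus compounding concrete, here is a toy consolidation pass. It is an assumption-heavy illustration, not a description of how any provider implements dreaming: `dream`, `briefing`, and the support-count scheme are all invented for this sketch. What it shows is memory that stores patterns extracted from sessions rather than the transcripts themselves.

```python
from collections import Counter


def dream(memory: dict[str, int], session_notes: list[str]) -> dict[str, int]:
    """Hypothetical consolidation pass: fold one session's observations into memory.

    Memory maps an observation to the number of sessions that supported it,
    so it holds patterns rather than replayable transcripts.
    """
    refined = Counter(memory)
    refined.update(set(session_notes))  # count each pattern once per session
    return dict(refined)


def briefing(memory: dict[str, int], min_support: int = 2) -> list[str]:
    """What the next session starts from: only patterns seen repeatedly."""
    return sorted(note for note, n in memory.items() if n >= min_support)


memory: dict[str, int] = {}
sessions = [
    ["vendor A ships late", "vendor B passed audit"],
    ["vendor A ships late", "vendor A ships late"],
    ["vendor A ships late", "vendor B passed audit"],
]
for notes in sessions:
    memory = dream(memory, notes)

next_session_start = briefing(memory)
```

Nothing is replayed here. Each session leaves the memory in a different shape than it found it, and the next session begins from that refined state instead of from a reassembled prompt.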
The compression was always coming
In The Great Compression, we wrote about the structural dynamic driving middleware absorption. Providers extend downward into the orchestration and tooling layer. The harness vendors who built on top of the provider’s API find themselves building features the provider ships as primitives.
This was visible as a pattern before it was confirmed as a strategy. Every time a provider shipped a native tool-calling capability, a harness framework’s integration layer became redundant. Every time a provider shipped a context management primitive, a vector store vendor’s core use case narrowed. The compression was not competitive aggression. It was architectural gravity. The provider owns the model. The model is where the value compounds. Everything built on top of the model to compensate for its limitations gets absorbed when those limitations are addressed natively.
Managed agents with memory and dreaming is not a product announcement. It is the structural completion of what the Great Compression described. The scaffolding layer’s core function — creating the appearance of stateful, continuous, governed agent operation — is now a provider primitive. Leaky abstractions were one of many reasons the scaffolding specifications were set aside. Performance degradation was the final verdict. The scaffolding vendors have two strategic options: specialize into the integration work that sits between provider primitives and enterprise systems, or compete directly with the provider on a layer the provider owns and will always be better positioned to develop.
Neither option produces the outcomes that 2022 and 2023 venture investments were underwritten against.
The write path is now explicit
Return to the procurement manager from this post’s opening. She summarized a vendor proposal with an LLM and pasted the output into an approval memo. The LLM was in the write path — shaping the document that produced the SOR record — but it was stateless, unacknowledged, and outside any governance framework.
Now the architecture makes it explicit.
The agent receives the instruction: evaluate this vendor. It retrieves context from the agentic substrate — prior procurement history, compliance flags, financial exposure, relationship context. It reasons across that context, produces an assessment, and routes it through a defined outcome rubric that grades the recommendation against the enterprise’s stated criteria. The grader evaluates the output in an isolated context window, independent of the agent’s reasoning chain. The result is recorded with full provenance — what instruction was received, what context was retrieved, what reasoning was applied, what outcome was produced, what the evaluation determined.
The LLM is still in the write path. It is now stateful, acknowledged, governed, and traceable.
This is not a new capability. It is the formalization of what was already happening — with the scaffolding, governance, and accountability that informal LLM use in enterprise workflows never had.
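The cycle described above (retrieve, reason, grade in isolation, record) can be sketched in a few lines. Everything here is hypothetical: the function names, the rubric checks, and the vendor data are invented for illustration and do not correspond to any provider's API. The detail to notice is that `grade` never receives the agent's reasoning, only the outcome and the evidence.

```python
from dataclasses import dataclass
from typing import Callable

Check = Callable[[str, dict[str, str]], bool]


@dataclass(frozen=True)
class ProvenanceRecord:
    instruction: str
    context: dict[str, str]
    reasoning: str
    outcome: str
    evaluation: dict[str, bool]


def retrieve_context(vendor: str) -> dict[str, str]:
    # Hypothetical substrate lookup; the values are made up for illustration.
    return {"vendor": vendor, "compliance_flags": "none", "late_deliveries": "3"}


def agent_assess(instruction: str, context: dict[str, str]) -> tuple[str, str]:
    # Stand-in for model reasoning: returns (reasoning, outcome).
    late = int(context["late_deliveries"])
    reasoning = f"{late} late deliveries on record; flags: {context['compliance_flags']}"
    outcome = "reject" if late > 2 else "approve"
    return reasoning, outcome


def grade(outcome: str, context: dict[str, str], rubric: dict[str, Check]) -> dict[str, bool]:
    # Isolated evaluation: the grader sees the outcome and the evidence,
    # never the agent's reasoning chain.
    return {name: check(outcome, context) for name, check in rubric.items()}


RUBRIC: dict[str, Check] = {
    "valid_decision": lambda outcome, ctx: outcome in {"approve", "reject"},
    "respects_late_delivery_limit": lambda outcome, ctx: (
        outcome == "reject" or int(ctx["late_deliveries"]) <= 2
    ),
}


def run(instruction: str, vendor: str) -> ProvenanceRecord:
    context = retrieve_context(vendor)
    reasoning, outcome = agent_assess(instruction, context)
    evaluation = grade(outcome, context, RUBRIC)
    return ProvenanceRecord(instruction, context, reasoning, outcome, evaluation)


record = run("evaluate this vendor", "Acme")
```

The record carries all five elements of provenance in one immutable object, which is what separates this from the pasted summary in the approval memo.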
What the provider now owns
Follow the ownership through the write path.
The instruction — captured by the agentic substrate, which is provider-managed. The context retrieval — organized by the refinement layer, which the provider owns. The reasoning — the model itself, which the provider built and continuously improves on enterprise operational data. The evaluation — the outcomes rubric and grader, which ship as provider primitives. The observability — the trace of every step in every agent session, captured in the provider’s infrastructure. The memory and refinement — dreaming, the mechanism that makes the next session smarter than this one.
The write path runs through the provider. From instruction receipt to outcome evaluation, every load-bearing component of the agentic decision cycle is a provider primitive.
The SOR receives the artifact at the end. It records what the agent decided. It captures the echo of a cognitive process that ran entirely on provider infrastructure.
This is the Great Compression’s final form. Not the compression of middleware tools. The compression of the enterprise cognitive layer itself — the infrastructure through which enterprise decisions are made, governed, and made accountable — into a provider-owned substrate.
The harness vendors built the specification. The provider built the system. That is how architectural compression works. And it is always, in retrospect, the only way it could have gone.
In July 2024, three a16z partners wrote that AI would so fundamentally reimagine the core system of record that no incumbent would be safe. They named companies — Clay, 11x, Day.ai, People.ai — as the AI-native replacements that would win by owning the new data infrastructure and reimagining the workflow from the ground up.
The analysis was correct about the direction. The AI-native application layer was the right threat to the incumbent SOR.
What it did not see was that the same compression dynamic that would displace the incumbent SOR would also compress the AI-native replacement — not from below, but from above. The provider substrate layer arrived between the AI-native application and the data it was trying to own. The companies a16z funded to displace Salesforce now sit between the SOR they were supposed to kill and the provider refinement layer that owns the intelligence they were supposed to accumulate.
A fund needs 7 to 10 years to return capital. The Great Compression is moving on an 18-to-24-month cadence. Those timelines are structurally incompatible. The firm is not slow or wrong. It is operating on a clock that the pace of compression has made obsolete. The visionaries who called the last transition with precision are now managing positions in a transition they did not fully anticipate. That is the nature of compression.
The hard claim
The scaffolding vendors were never building moats. They were writing v0.1 of a specification the provider used to build v1.0.
Every orchestration pattern. Every memory management approach. Every evaluation framework. Every governance primitive invented in the field was a design input the provider absorbed, improved, and shipped as a native primitive. Leaky abstractions disqualified the scaffolding from the enterprise substrate. Performance degradation confirmed it. The specifications went into the audit logs — and the provider built what the scaffolding was always pointing toward but never capable of becoming.
The LLM was always in the write path. The scaffolding made it look managed. The provider made it actually managed: natively, at the substrate level, with statefulness that doesn’t require reconstruction and intelligence that compounds rather than being rebuilt on every call.
The scaffolding created the illusion. The provider owns the reality.
And the enterprise architect who understands this distinction is asking a different question than their peers. Not which orchestration framework to use. Not which harness vendor has the most integrations. But which provider’s refinement layer they are building their enterprise intelligence on — because that is the decision that compounds, and it is the decision that is already being made whether or not it is being made deliberately.
