On March 13, 2026, Fortune published a synthesis of Morgan Stanley’s “Intelligence Factory” report under the headline: “A massive AI breakthrough is coming in the first half of 2026 — and most of the world isn’t ready for it.” The report is serious, the sourcing is credible, and several of its claims are genuinely important. But buried near the end, attributed to xAI co-founder Jimmy Ba, is the sentence that should anchor every enterprise AI governance conversation happening right now: recursive self-improvement loops — where AI autonomously upgrades its own capabilities — could emerge as early as the first half of 2027. One sentence. Past the workforce reduction projections, past the power grid analysis, past the infrastructure economics. The mechanisms the report describes in the present tense are already recursive in character. The governance question those mechanisms create cannot wait for 2027 to become urgent.
There is a framing problem at the center of how the enterprise thinks about recursive AI self-improvement, and the Morgan Stanley report illustrates it precisely. The framing positions RASI — recursive AI self-improvement — as a threshold event: a moment in the future when something qualitatively different begins. This is the framing that produces deferral. If RASI is a 2027 event, then the governance infrastructure required to manage it is a 2026 problem at the earliest. Current priorities take precedence. The conversation is scheduled for later.
The problem with that framing is that it mistakes a closed loop for the only form of recursive self-improvement that matters for governance. The loop does not need to be fully autonomous and closed end to end to create governance obligations. It only needs to exist — partially, incrementally, with humans still in several of the stages — for the alignment question to become architecturally consequential. And by that definition, the loop is already running across every serious AI deployment in operation today.
This post does two things. It establishes what “recursive” actually means with sufficient precision to make the governance argument land. And it maps the partial loops already operating in production systems — not as a scare tactic, but because practitioners making infrastructure decisions right now need to understand what they are building into.
What the Morgan Stanley Report Actually Says — and Where It Stops
The report deserves credit for several things it gets right. The compute scaling argument is sound: the claim that applying 10x the compute to LLM training effectively doubles a model’s capability is consistent with scaling law research that has held for several years. The infrastructure constraint analysis — a projected net U.S. power shortfall of 9 to 18 gigawatts through 2028 — is a serious finding with real supply-chain implications. And the workforce displacement predictions, while uncomfortable, are supported by observable evidence rather than extrapolation alone.
What the report does less well is connect these present-tense observations to their structural implication. Consider what it has actually described by the time it reaches the Jimmy Ba attribution. A model — GPT-5.4 — scoring at or above expert human level on a benchmark specifically designed to measure economically valuable cognitive tasks. Executives executing large-scale workforce reductions because AI tools can replicate human work at a fraction of the cost. An intelligence explosion arriving faster than almost anyone is prepared for. These are not background conditions for a future recursive loop. They are the conditions inside which recursive dynamics are already active.
Consider GPT-5.4’s score on the GDPVal benchmark, which places it at or above expert human level on economically valuable tasks, as reported in Morgan Stanley’s Intelligence Factory report (March 2026). AI research is itself an economically valuable task. A model performing at expert level in that domain is already capable of contributing meaningfully to its own improvement process. The report does not name this as a recursive threshold, but structurally, it is one.
The analytical gap is this: the report treats recursive self-improvement as a categorically different future state, when the mechanisms it has already described are recursive in character. RLAIF — reinforcement learning from AI feedback — is in production at every major frontier lab. The model evaluating the model’s outputs to generate training signal is a partial recursive loop. It is not speculative. It is the current training paradigm.
Naming the gap is not a criticism of the report. Financial research serves a different purpose than infrastructure architecture analysis. But enterprise practitioners reading it need to make the translation that the report does not make for them: the governance infrastructure question is not “what do we need when RASI arrives in 2027?” It is “what do we need to have in place given that partial recursive loops are already running in our systems today?”
What Recursive Actually Means — Precisely
The word “recursive” is doing significant work in this conversation, and it is worth being exact about what it means here — not in the colloquial sense of “AI getting smarter,” but in the architectural sense that determines what governance structures are required.
Recursive Self-Improvement: A Working Definition
A system exhibits recursive self-improvement when the output of one improvement cycle becomes the input capability for the next improvement cycle — and the agent performing the improvement is itself the agent being improved. The critical feature is not autonomy (though full autonomy is the endpoint concern) but compounding: each cycle produces a more capable improver, which produces better improvements, which produces a still more capable improver.
This is distinct from ordinary machine learning, in which humans design the improvement process, select the training data, define the objective function, and evaluate the outputs. In recursive self-improvement, the AI system participates in — or ultimately drives — one or more of those stages. The degree of participation determines whether the loop is partial or closed. A partial loop has humans in several stages. A closed loop has the AI system running all stages autonomously. Both are recursive in the technical sense. Only the closed loop is what popular discourse calls RASI.
The governance implication of this distinction is significant: the alignment problem created by a partial loop is not categorically smaller than the one created by a closed loop. It is structurally similar and already live. The difference is the speed and autonomy of the compounding, not the presence or absence of the recursive dynamic itself.
The loop has six stages regardless of how much human involvement exists at each: evaluate current capability, identify performance gaps, design or select improvements, validate the improved version, deploy the improved version as the active agent, and recurse — with the deployed, improved agent now running the next evaluation cycle. In a fully human-designed system, humans occupy stages one through four. In a fully closed RASI loop, the AI system occupies all six. Between those poles is a wide spectrum of partial loops — and that is where enterprise AI deployments currently sit.
| Stage | Human-designed | Partial loop (enterprise today) | Closed RASI |
|---|---|---|---|
| 01 Evaluate | Human | Mixed (RLAIF active) | AI |
| 02 Identify | Human | Mixed (AI-assisted) | AI |
| 03 Design | Human | Mixed (auto-selection) | AI |
| 04 Validate | Human | Human (primary gate) | AI |
| 05 Deploy | Human | Mixed (pipeline-automated) | AI |
| 06 Recurse | Human | Nominal (gate in name only) | AI |

Governance Gate, between Deploy and Recurse: the only viable intervention point.
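The spectrum the table describes can be sketched as a small data structure. The stage labels and actor assignments below are illustrative, mirroring the table rather than measuring any specific system:

```python
from dataclasses import dataclass
from enum import Enum, auto


class Actor(Enum):
    HUMAN = auto()
    MIXED = auto()
    AI = auto()


@dataclass
class Stage:
    name: str
    actor: Actor


# One hypothetical enterprise deployment, matching the "partial loop" column.
PARTIAL_LOOP = [
    Stage("evaluate", Actor.MIXED),  # RLAIF active
    Stage("identify", Actor.MIXED),  # AI-assisted gap analysis
    Stage("design",   Actor.MIXED),  # automated selection
    Stage("validate", Actor.HUMAN),  # primary human gate
    Stage("deploy",   Actor.MIXED),  # pipeline-automated
    Stage("recurse",  Actor.MIXED),  # gate in name only
]


def is_closed_rasi(stages: list[Stage]) -> bool:
    """Closed RASI only when the AI occupies every stage."""
    return all(s.actor is Actor.AI for s in stages)


def is_recursive(stages: list[Stage]) -> bool:
    """Any AI participation in any stage makes the loop recursive."""
    return any(s.actor in (Actor.AI, Actor.MIXED) for s in stages)
```

The point the two predicates encode: a deployment can fail the closed-RASI test at every stage and still be recursive in the technical sense.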
The Partial Loops Already Running
The four examples below are not edge cases or research curiosities. They are production realities at organizations deploying AI at scale in 2026. Each represents a different stage of the RASI loop where AI participation has displaced human involvement — partially or entirely.
Reinforcement learning from AI feedback
RLAIF replaces human raters with a trained AI model to generate the preference signal that shapes the next training iteration. The AI system evaluating outputs to produce reward signal is directly influencing what the next version of itself will learn to do. This is stages one and two of the loop — evaluate and identify — with AI participation. It is the current training paradigm at Anthropic, Google DeepMind, and OpenAI. It is not experimental. The partial loop is the production loop.
The governance implication: the objective being optimized in RLAIF is specified by the reward model. If the reward model has any systematic bias — toward outputs that appear aligned rather than outputs that are aligned, toward responses that humans rate highly in evaluation but that generalize poorly to deployment, toward performance on measured benchmarks rather than unmeasured real-world objectives — that bias compounds with each training cycle. The recursive structure amplifies whatever is in the feedback signal, including its errors.
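The difference between additive and compounding error can be made concrete with a toy model. The carryover factor below is an assumption chosen for illustration, not a measurement of any training pipeline:

```python
def rlaif_bias_compounding(apparent_bias: float, cycles: int,
                           carryover: float = 1.1) -> list[float]:
    """Toy numeric illustration, not a training simulation. Each cycle,
    the next model is trained against the biased preference signal
    produced by the current model, so the existing misalignment gap
    feeds into the next cycle (carryover > 1) on top of the fresh bias."""
    gap, gaps = 0.0, []
    for _ in range(cycles):
        gap = gap * carryover + apparent_bias
        gaps.append(gap)
    return gaps
```

With a per-cycle bias of 0.01 over five cycles, the accumulated gap exceeds the 0.05 a purely additive model would predict — the recursive structure is what turns a small, constant evaluator bias into a growing one.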
Agentic self-modification at runtime
Modern agent frameworks — LangGraph, AutoGen, and their successors — support agents that modify their own system prompts, rewrite their memory structures, add or remove tool configurations, and restructure their own reasoning chains mid-task. This is not a capability being developed for future deployment. It is a feature in current production harness implementations, used to improve agent performance on long-horizon tasks.
What this represents in RASI terms is stages three and five of the loop — design improvements and deploy — operating within a single agent session, with the agent itself as the architect of the modification. The improvement cycle does not run between training runs; it runs inside a live deployment, in real time, with real consequences for whatever task the agent is executing when the modification occurs.
The Intra-Session Loop
An agent that rewrites its own system prompt to improve performance on a task is executing a compressed version of the full RASI loop within a single deployment session. The evaluation, identification, improvement, and deployment stages all occur without a human in the loop — because the session never pauses for human review. The compounding is session-scoped rather than training-scoped, but the recursive structure is identical. And unlike training-level recursion, intra-session self-modification leaves no audit trail in standard observability tooling unless the harness layer has been explicitly instrumented to capture it.
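A minimal sketch of what “explicitly instrumented” could look like at the harness layer. The class and field names are hypothetical, not the API of LangGraph, AutoGen, or any other framework — the design point is that the audit record is owned by the harness and written before the mutation takes effect:

```python
import time


class InstrumentedAgent:
    """Illustrative agent that may rewrite its own system prompt
    mid-session, with every self-modification captured in a
    harness-owned, append-only audit sink."""

    def __init__(self, system_prompt: str, audit_sink: list):
        self.system_prompt = system_prompt
        self._audit_sink = audit_sink  # owned by the harness, not the agent

    def rewrite_system_prompt(self, new_prompt: str, reason: str) -> None:
        # Record before mutating: even a modification that degrades the
        # agent leaves a trail, which standard observability tooling
        # would otherwise miss entirely.
        self._audit_sink.append({
            "ts": time.time(),
            "event": "self_modification",
            "old": self.system_prompt,
            "new": new_prompt,
            "reason": reason,
        })
        self.system_prompt = new_prompt
```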
AI-assisted architecture and hyperparameter search
Neural architecture search and automated hyperparameter optimization have used AI systems to discover better training configurations for over five years. What has changed in 2026 is the capability level of the systems doing the searching. Earlier NAS approaches used relatively simple search models. Current approaches use frontier-class models — the same models being improved — to reason about architecture tradeoffs, propose modifications, evaluate candidates, and select the next training configuration. The improver and the improved are now the same class of system. The loop is tighter than it has ever been.
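One search iteration can be sketched as follows. `propose_fn` and `score_fn` are hypothetical stand-ins for the proposing model and the train-and-evaluate run, not any library’s API; the recursive tightening comes from `propose_fn` being the same class of model that the configuration will train:

```python
def search_step(propose_fn, score_fn, config: dict) -> dict:
    """One iteration of AI-assisted architecture/hyperparameter search:
    a model proposes candidate configurations, and the best-scoring
    candidate becomes the next training configuration."""
    candidates = propose_fn(config)       # model-proposed variants
    return max(candidates, key=score_fn)  # select the next configuration


# Toy usage with a deterministic proposer and scorer (values illustrative).
def toy_propose(cfg):
    return [{**cfg, "lr": cfg["lr"] * m} for m in (0.5, 1.0, 2.0)]


def toy_score(cfg):
    return -abs(cfg["lr"] - 2e-3)  # assumed optimum at lr = 2e-3


cfg = search_step(toy_propose, toy_score, {"lr": 1e-3})
```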
Benchmark-driven automated fine-tuning
Automated fine-tuning pipelines at enterprise scale routinely use benchmark performance as the selection criterion for which model variant gets deployed next. The model that scores best on the evaluation suite advances. This seems like standard engineering practice — and it is — but it is also a partial recursive loop in which the AI system’s behavior on evaluation tasks determines the characteristics of its successor. If the evaluation suite is poorly specified relative to the actual deployment objective, the recursive selection process will optimize systematically away from that objective. Goodhart’s Law is not a new observation, but recursive selection structures make it compounding rather than additive.
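A compressed sketch of that Goodhart dynamic. The variants and their scores are invented for illustration:

```python
def select_next_variant(variants, benchmark_score):
    """Benchmark-driven selection: the highest-scoring variant on the
    evaluation suite becomes the next deployed model. When the suite
    diverges from the real deployment objective, this rule advances
    the divergence one cycle at a time."""
    return max(variants, key=benchmark_score)


# Hypothetical variants: (name, eval-suite score, true deployment quality).
variants = [
    ("v1", 0.92, 0.80),
    ("v2", 0.97, 0.61),  # overfit to the evaluation suite
    ("v3", 0.88, 0.85),
]
winner = select_next_variant(variants, benchmark_score=lambda v: v[1])
```

In this contrived case the selector advances the variant with the worst true deployment quality — and because the winner’s behavior shapes the next evaluation cycle, the error recurs rather than averaging out.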
The Partial Loop: Humans in Multiple Stages
AI participation in evaluation, training signal generation, runtime self-modification, and architecture search. Humans retain design authority over objectives, training data curation, and deployment decisions. The recursive dynamic is real but bounded by human checkpoints at several stages.
Governance challenge: the checkpoints are increasingly nominal. Evaluation cycles run faster than human review cadences. Deployment pipelines are automated. The human in the loop is present on paper, absent in practice for many intermediate decisions.
Status: Active · Governable Now

The Closed Loop: AI Across All Stages
The AI system autonomously evaluates its own performance, identifies improvement targets, designs training modifications, validates the improved version, and deploys it — without human involvement at any stage. The Jimmy Ba 2027 estimate. The endpoint of the capability trajectory the Morgan Stanley report describes.
Governance challenge: there is no human checkpoint to strengthen because there is no human checkpoint. The only viable governance surface is the infrastructure layer between cycles — which must have been built before the loop closed, because retrofitting it afterward is categorically harder.
Status: Emerging · Prepare Now

Why the Loop’s Closure Is a Governance Event, Not Just a Capability Event
The intelligence explosion narrative is a capability story: AI systems getting smarter faster, benchmarks falling, jobs transforming, power grids straining. That story is real and consequential. But it is not the governance story. The governance story is about something different: at what point in the recursive improvement cycle can human oversight, policy enforcement, and objective validation be inserted — and what infrastructure must exist to make that insertion possible?
The answer to that question is structurally constrained in a way that the capability narrative does not surface. Human oversight cannot be inserted inside a training run. It cannot be inserted inside an automated evaluation cycle. It cannot be inserted inside a runtime self-modification event. It can only be inserted between cycles — in the interstitial space where one version of the agent hands off to the next, where the improved system is being considered for deployment, where a human or an institutional process can observe what changed and decide whether to proceed.
The intelligence explosion is a capability story. The governance story is about the one place in the recursive loop where human judgment can actually land — and whether that place has been built to receive it.
This interstitial space — the Deploy→Recurse transition in the six-stage model — is not a gap in the architecture. It is a designed feature of any responsibly constructed AI development process. The question is not whether it exists in principle. The question is whether it has been built with sufficient intentionality to function as a genuine governance gate: one that can observe behavioral change, audit objective fidelity, enforce policy constraints, and interrupt the loop when warranted.
In most enterprise AI deployments today, that interstitial space exists as a deployment pipeline — an automated process for moving a model or agent from testing to production. That pipeline was designed for speed and reliability. It was not designed to serve as an alignment gate. It has no objective fidelity audit. It has no behavioral constraint persistence check. It has no interrupt authority that a risk officer can exercise. It is a coordination mechanism, not a governance mechanism. And those are not the same thing.
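A sketch of what the missing gate could look like if it lived in that pipeline. Every field name and threshold here is hypothetical; the point is the shape — observe what changed, check objective fidelity, and hold interrupt authority:

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    approved: bool
    reasons: list = field(default_factory=list)


def governance_gate(candidate: dict, baseline: dict,
                    max_behavior_drift: float = 0.05) -> GateResult:
    """Illustrative alignment gate at the Deploy -> Recurse transition.
    Unlike a speed-optimized deployment pipeline, it can block
    promotion pending human review."""
    reasons = []
    # Objective fidelity audit: did the stated optimization target change?
    if candidate["objective_hash"] != baseline["objective_hash"]:
        reasons.append("objective definition changed between cycles")
    # Behavioral constraint persistence: policy-relevant behaviors must
    # survive the improvement cycle within a bounded drift.
    drift = abs(candidate["refusal_rate"] - baseline["refusal_rate"])
    if drift > max_behavior_drift:
        reasons.append(f"behavioral drift {drift:.3f} exceeds bound")
    # Interrupt authority: any finding blocks the loop from recursing.
    return GateResult(approved=not reasons, reasons=reasons)
```

The contrast with an ordinary CI/CD promotion step is that a failed check here does not merely fail a build; it stops the improved agent from becoming the agent that runs the next evaluation cycle.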
The Interstitial Space Has a Name
In the harness engineering work we have published across three posts in this series, we have argued that the agent harness layer — the middleware that orchestrates, routes, observes, and manages AI agents — is the critical compression point in enterprise AI infrastructure. The coordination argument for the harness layer is well established: it reduces complexity, enforces policy, captures operational data, and prevents the substrate sprawl that makes agentic deployments ungovernable at scale.
The RASI argument for the harness layer is different in kind, not just in degree. The harness layer does not merely occupy a convenient position in the architecture. It occupies the only position where an alignment gate can be structurally enforced. It is the interstitial space between agent cycles, made concrete in infrastructure. Whoever controls what happens in that space controls what behavioral constraints survive across improvement cycles, what human review is triggered and when, what constitutes an acceptable improvement versus a policy violation, and who can modify those rules and under what conditions.
That is not a metaphor for strategic importance. It is an architectural description of where governance must live if it is going to live anywhere. The capability story says the loop is closing. The governance story says the only place to stand is the one place the loop must pass through — and that place needs to have been built for the purpose before the loop gets fast enough to pass through it unexamined.
Enterprises that build harness layer maturity as governance infrastructure — not as operational tooling — will have the alignment gate in place when the recursive loop closes. Enterprises that defer will face a problem that compounds with every cycle: a system that has already been optimized to route around friction, in a harness that was never built to stop it.
