Code Mode Doesn’t Fix the Trust Layer — Luminity Digital
Agentic AI Security  ·  Accompanying Post
MCP Security  ·  Practitioner Analysis

Code Mode Doesn’t Fix
the Trust Layer

Block’s Goose solves a real problem with Code Mode — context window management across many MCP extensions is a genuine production constraint. But the architectural properties that make it efficient are precisely the properties that make deterministic enforcement harder. This is not a criticism of the implementation. It is a description of the structural tension every MCP-connected agent system operates inside.

March 2026 Tom M. Gomez 9 Min Read

Our post on MCP monoculture risk established that the Model Context Protocol’s structural vulnerability is not a bug in any specific implementation — it is a property of the protocol’s trust assumptions. Every MCP-connected system inherits those assumptions regardless of how the tool-calling layer above it is architected. The case for deterministic enforcement established that the only defense class producing near-zero attack success rates operates by removing the model’s judgment from the security enforcement path entirely. Code Mode, Block’s programmatic approach to MCP tool calling in Goose, is a useful lens for examining what happens when these two findings meet in production. It is being promoted in some corners of the agentic ecosystem as a more controlled and therefore safer way to work with MCP. The claim is worth examining carefully.

Code Mode replaces traditional MCP tool calling with a programmatic layer. Instead of loading all tool definitions from enabled extensions into the context window upfront, Goose exposes three meta-tools — list_functions, get_function_details, execute — and the LLM writes JavaScript at runtime to discover which tools it needs, learn how to use them, and call them programmatically. Multiple tool calls are batched in a single execution, and intermediate results chain directly from one tool into the next without returning to the LLM between steps.

This is a technically elegant solution to a real operational problem. When you have five or more MCP extensions enabled, loading every tool definition into every LLM call is expensive in both tokens and latency. Code Mode defers tool discovery to runtime and processes intermediate results locally, reducing context window pressure significantly. The Goose documentation is honest about the tradeoff: Code Mode only supports text content from tool results. Binary data, images, and non-text content are ignored. That is a meaningful constraint, but it is the right kind of constraint — explicit, documented, and architectural rather than probabilistic.

The Claim Being Made

Several practitioners and at least one prominent blog post have characterized Code Mode as a more controlled, and by implication safer, approach to MCP tool calling than traditional direct invocation. The argument is that having the LLM write explicit code to call tools introduces a layer of intentionality that reduces the risk of unexpected or malicious tool execution. This framing conflates efficiency architecture with security architecture. They are not the same thing, and treating them as equivalent produces a false sense of reduced exposure.

What Code Mode Actually Changes — and What It Does Not

The comparison that matters for security purposes is not traditional tool calling versus Code Mode. It is: what does each approach change about the underlying MCP trust model?

Traditional MCP Tool Calling

Direct Invocation

Tool definitions loaded upfront. Each tool call returns a result to the LLM before the next call. The LLM evaluates each result and decides the next action. Sequential, inspectable, higher context window cost.

MCP server trust assumptions: unchanged. A poisoned tool result reaches the LLM directly and becomes input to the next reasoning step.

Same MCP trust model
Code Mode

Programmatic Dispatch

LLM writes JavaScript. Tools discovered and called at runtime. Intermediate results chained locally — output from tool A becomes input to tool B without LLM review between steps. Batched, efficient, lower context window cost.

MCP server trust assumptions: unchanged. A poisoned tool result now chains directly into subsequent tool calls before any LLM evaluation point.

Same MCP trust model

The MCP server trust assumptions are structurally identical in both approaches. What Code Mode changes is the execution model above those assumptions, not the assumptions themselves. The protocol-level vulnerabilities documented in arXiv:2601.17549 — sampling manipulation, tool result poisoning, cross-server trust propagation — apply equally in both modes. Code Mode does not add verification, attestation, or policy enforcement at the MCP layer.

Three Properties That Create Audit Surface, Not Reduce It

The properties that make Code Mode efficient are precisely the properties that the 2026 research corpus identifies as attack-surface amplifiers.

1. Runtime Tool Discovery

In traditional tool calling, the full set of tools available to the agent is known at session initialization. In Code Mode, the agent discovers tools on demand at runtime by calling list_functions and get_function_details. This means the agent’s effective attack surface is not fixed at deployment — it is determined mid-session by whatever the model decides to discover. A policy enforcement layer that constrains tool access must therefore operate dynamically rather than against a static manifest. That is a harder enforcement problem, not an easier one.

2. Chained Intermediate Results

Code Mode’s defining efficiency property is that intermediate results — the output of one tool call — chain directly into subsequent tool calls without returning to the LLM between steps. This is processed locally in the JavaScript execution environment. From an audit perspective, this means the data flowing between tool calls is not visible to any LLM-layer inspection. A poisoned result from server A can shape the query sent to server B before any human review point, and before any LLM reasoning step that might surface anomalous content.

17%

Baseline defense rate against sandbox escape in agentic systems, per SandboxEscapeBench (arXiv:2603.02277). Frontier models succeed at container escape across 18 attack scenarios, with success rates scaling log-linearly with compute. Code Mode executes LLM-written JavaScript in a sandbox — precisely the execution environment this benchmark evaluates.

3. The JavaScript Execution Sandbox

Code Mode’s execution environment is a JavaScript sandbox. The LLM writes code; Goose runs it. This introduces a class of risk that traditional tool calling does not surface at all: sandbox escape. The SandboxEscapeBench paper (arXiv:2603.02277), which we have flagged as a primary source for our upcoming Series 4 on AI control and containment, benchmarks exactly this attack surface across 18 scenarios with frontier models. The finding that success rates scale log-linearly with compute means this is not a static risk — it grows as models become more capable.

Code Mode is solving an efficiency problem with an architectural choice. The efficiency is real. But the claim that this architecture is more controlled from a security standpoint conflates the explicitness of programmatic dispatch with the guarantees of policy enforcement. They are not the same thing.

Where the Command-Data Boundary Goes

The foundational argument in our deterministic enforcement post draws on arXiv:2602.09947’s formalization of the command-data boundary collapse: LLMs process instructions and data through the same token stream, making learned defenses inherently forgeable. An adversary who can craft data that looks like instructions will always have a path to execution.

Code Mode adds an intermediate representation — JavaScript — between LLM reasoning and tool execution. In one sense this is useful: the code is inspectable before execution in principle. In practice, it does not resolve the boundary collapse. The LLM still writes the JavaScript based on its interpretation of the session context, which includes tool results, user instructions, and any injected content from MCP servers. A prompt injection payload in a tool result can shape the JavaScript the LLM writes for subsequent tool calls just as it can shape the LLM’s next natural-language response in traditional mode. The attack surface moves; it does not shrink.

The Structural Question

The right question is not whether Code Mode is more or less controlled than traditional tool calling. It is: does Code Mode create enforcement points where deterministic policy checks can be applied? The answer is: in principle yes, at the code generation boundary and at the execution boundary. Whether those enforcement points are actually instrumented with policy checks in production deployments is a separate question — and the honest answer is that most are not.

What Deterministic Enforcement Would Actually Require Here

The approaches that achieve near-zero attack success rates in the 2026 corpus — SEAgent’s mandatory access control, Authenticated Workflows’ cryptographic binding, PCAS’s dependency-graph policy compilation — all share one property: they enforce a binary check at the execution boundary that does not read the content of the action being taken, only its compliance with a policy. The model’s reasoning path is irrelevant to the enforcement outcome.

Applying this to a Code Mode environment would require, at minimum: static analysis of the LLM-generated JavaScript before execution to verify that the operations it contains are within a declared policy scope; policy enforcement at the MCP layer that validates each tool call regardless of how it was generated; and audit logging of all intermediate results in the execution chain, not only the final output. None of these are properties of Code Mode as currently specified. They are properties that would need to be added on top of it — which is the same statement that is true of traditional MCP tool calling.

Block is doing genuinely interesting work with Goose, and the Code Execution extension addresses a real friction point in production agent deployments. The point is not that Code Mode is insecure by design. The point is that characterizing it as a more controlled alternative to traditional MCP invocation — without specifying what controls have actually been added at the trust layer — overstates what the architecture delivers and understates the work that remains.

The Full Argument on Deterministic Enforcement

The 2026 research corpus establishes a clear performance boundary between probabilistic and deterministic defense mechanisms. The case for architectural enforcement and the specific controls that produce near-zero attack success rates are documented in our Building Defensible Agents series.

References & Sources

Share this:

Like this:

Like Loading...