
Should AI agents run in a sandbox?

Often yes, but not always for the same reason.

AI agents should run in a sandbox when they can:

  • execute code,
  • browse untrusted pages,
  • read or write files,
  • call tools with side effects,
  • or touch secrets, networks, and system resources.

The sandbox is there to contain execution risk. It does not decide what the agent should be allowed to do in the first place.

The weak question is:

“Do we trust the model?”

That is not the right control question.

The better question is:

“If this run goes wrong, what boundary stops it from becoming a larger systems incident?”

That is where sandboxing becomes useful.

Sandboxing is usually mandatory when the agent can:

  • run generated code,
  • inspect or transform local files,
  • browse arbitrary websites,
  • use developer tools,
  • or operate inside engineering environments where accidental writes, secret exposure, or network access can cause real damage.

Without isolation, one weak run can become a much larger control failure.

Not every agent needs heavy execution isolation.

If the agent only:

  • drafts text,
  • summarizes evidence,
  • routes requests,
  • or proposes actions that still require separate human approval,

then narrower permission design and application-owned controls may matter more than a full sandbox runtime.

The right level of isolation depends on what the agent can actually touch.

Sandboxing is strong when it limits:

  • filesystem scope,
  • network reach,
  • process execution,
  • credential exposure,
  • and the blast radius of bad tool behavior.

It is especially valuable when the model can encounter untrusted input and then choose actions.
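Some of these limits can be approximated, very roughly, at the process level. The sketch below assumes the agent's generated code runs as a subprocess on a POSIX system. It passes only an allowlist of environment variables to the child, so credentials in the parent are not inherited. It also runs the child in a throwaway working directory and caps CPU time and file size with the standard `resource` module. The names `run_untrusted` and `ALLOWED_ENV` are hypothetical. The sketch does not restrict network access or reads of absolute paths, so treat it as a teaching aid, not a substitute for a container, VM, or dedicated sandbox runtime.

```python
import os
import resource
import subprocess
import sys
import tempfile

# Hypothetical allowlist: only these environment variables reach the child,
# so tokens and API keys set in the parent environment are not inherited.
ALLOWED_ENV = ("PATH", "LANG")


def _limit_resources(cpu_seconds: int, max_file_bytes: int) -> None:
    """Runs in the child just before exec (POSIX only)."""
    # Cap CPU time; the kernel terminates the child when it is exceeded.
    resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
    # Cap the size of any single file the child writes.
    resource.setrlimit(resource.RLIMIT_FSIZE, (max_file_bytes, max_file_bytes))


def run_untrusted(argv, cpu_seconds=2, max_file_bytes=1_000_000, timeout=5.0):
    """Run generated code with a scratch working directory, a scrubbed
    environment, and CPU/file-size limits.

    NOT restricted here: network access and reads of absolute paths.
    """
    env = {k: os.environ[k] for k in ALLOWED_ENV if k in os.environ}
    with tempfile.TemporaryDirectory() as scratch:
        env["HOME"] = scratch  # keep dotfiles and caches inside the scratch dir
        return subprocess.run(
            argv,
            cwd=scratch,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
            preexec_fn=lambda: _limit_resources(cpu_seconds, max_file_bytes),
        )


if __name__ == "__main__":
    os.environ["DEMO_SECRET"] = "do-not-leak"
    probe = [sys.executable, "-c", "import os; print(os.environ.get('DEMO_SECRET'))"]
    print(run_untrusted(probe).stdout.strip())  # expected: None
```

The environment allowlist is the design choice that matters most here. Denying by default means a newly added secret stays out of the child unless someone deliberately lists it.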

Sandboxing does not solve:

  • bad approval policy,
  • broad business permissions,
  • weak audit logs,
  • unsafe user-scoped authority,
  • or a workflow that should never have been autonomous.

Teams often overestimate sandboxing because it feels concrete. But a sandbox around an overpowered workflow is still an overpowered workflow.

The healthy production pattern is usually:

  1. narrow permissions first,
  2. sandbox execution second,
  3. approval and escalation for irreversible actions,
  4. logging and evaluation after every important run.
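The four steps can be expressed as a single gate that every tool call passes through. This is a minimal sketch with hypothetical names (`ControlPlane`, `ToolCall`, `Effect`). The allowlist enforces narrow permissions. `run_in_sandbox` stands in for whatever isolated runtime the team actually uses. Irreversible effects go to an approval callback, and every decision is logged, including denials. The numbered comments refer to the steps in the list above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, List, Optional, Set, Tuple


class Effect(Enum):
    READ = "read"
    DRAFT = "draft"
    WRITE = "write"
    IRREVERSIBLE = "irreversible"  # e.g. delete, payment, permission change


@dataclass
class ToolCall:
    tool: str
    effect: Effect
    args: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ControlPlane:
    allowed_tools: Set[str]                                  # 1. narrow permissions
    approver: Callable[[ToolCall], bool]                     # 3. approval for irreversible actions
    log: List[Dict[str, str]] = field(default_factory=list)  # 4. run log

    def execute(
        self, call: ToolCall, run_in_sandbox: Callable[[ToolCall], Any]
    ) -> Tuple[str, Optional[Any]]:
        if call.tool not in self.allowed_tools:
            decision = "denied"
        elif call.effect is Effect.IRREVERSIBLE and not self.approver(call):
            decision = "rejected"
        else:
            decision = "executed"
        # 2. Only permitted (and, if irreversible, approved) calls reach the sandbox.
        result = run_in_sandbox(call) if decision == "executed" else None
        self.log.append(
            {"tool": call.tool, "effect": call.effect.value, "decision": decision}
        )
        return decision, result
```

The sandbox is only one argument to `execute`. Swapping in a stronger runtime changes nothing about which calls are allowed, which matches the point that sandboxing sits inside the control plane rather than replacing it.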

This is why sandboxing belongs inside a broader control plane, not as a standalone safety story.

Use strong sandboxing when the agent can execute, browse, or mutate technical systems directly.

Use lighter isolation when the agent remains in read-only or draft-only lanes and business controls already contain the output.

If the team cannot explain the blast radius of a failed run, the system probably needs stronger isolation than it has.
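This rule of thumb can be written down as a small decision function, which forces the team to list capabilities explicitly. The following is a minimal sketch. The capability labels are hypothetical stand-ins for the categories on this page. Unknown or unlisted capabilities default to strong isolation, mirroring the point that an unexplained blast radius calls for a stronger boundary.

```python
from enum import Enum
from typing import Optional, Set


class Isolation(Enum):
    LIGHT = "light"    # permission design plus application-owned controls
    STRONG = "strong"  # dedicated sandbox runtime


# Illustrative labels for capabilities that call for strong isolation.
STRONG_TRIGGERS = {
    "execute_code", "browse_untrusted", "read_files", "write_files",
    "side_effect_tools", "secrets", "network",
}
# Illustrative labels for read-only or draft-only lanes.
LIGHT_ONLY = {"draft_text", "summarize", "route_requests", "propose_for_approval"}


def choose_isolation(capabilities: Optional[Set[str]]) -> Isolation:
    # Nobody can say what the agent touches: assume the worst.
    if capabilities is None:
        return Isolation.STRONG
    if capabilities & STRONG_TRIGGERS:
        return Isolation.STRONG
    # Anything not recognized as read-only or draft-only is treated as unknown.
    if capabilities - LIGHT_ONLY:
        return Isolation.STRONG
    return Isolation.LIGHT
```

The default is deliberately asymmetric. A capability has to be recognized as low-risk to earn lighter isolation, rather than recognized as risky to earn a sandbox.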

Your sandboxing decision is probably healthy when:

  • the team can describe exactly which resources the agent can touch;
  • execution, filesystem, network, and secret boundaries are explicit;
  • sandboxing is paired with approval and permission design;
  • the logs can show what happened inside the boundary;
  • and the team knows which actions should still be impossible even inside the sandbox.
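One way to make this checklist enforceable is to record the boundary as data and fail review when a field is missing. This is a minimal sketch with a hypothetical `BoundarySpec`. It separates "not written down" (`None`) from "explicitly nothing" (an empty set). An explicit empty network allowlist is a healthy answer, while a missing one is not.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class BoundarySpec:
    # None means "not written down"; an empty set means "explicitly nothing".
    resources: Optional[Set[str]] = None
    execution: Optional[Set[str]] = None
    filesystem: Optional[Set[str]] = None
    network: Optional[Set[str]] = None
    secrets: Optional[Set[str]] = None
    approval_required: Optional[Set[str]] = None
    audit_log_enabled: bool = False
    forbidden_actions: Set[str] = field(default_factory=set)


def review(spec: BoundarySpec) -> List[str]:
    """Return the checklist items this spec does not yet satisfy."""
    gaps = []
    if spec.resources is None:
        gaps.append("resources the agent can touch are not listed")
    for name in ("execution", "filesystem", "network", "secrets"):
        if getattr(spec, name) is None:
            gaps.append(f"{name} boundary is implicit")
    if spec.approval_required is None:
        gaps.append("no approval design paired with the sandbox")
    if not spec.audit_log_enabled:
        gaps.append("logs cannot show what happened inside the boundary")
    if not spec.forbidden_actions:
        gaps.append("no actions are declared impossible even inside the sandbox")
    return gaps
```

An empty list from `review` does not prove the boundary is safe. It only shows that each question on the checklist has a written answer.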

This page should help a reader decide which authority, data access, tool scope, and runtime boundary the agent system should receive. On the sandboxing question, explaining the vocabulary is not enough: the answer should change what the team approves, measures, routes, buys, logs, or refuses to automate.

Before applying the guidance, bring tool lists, auth scopes, sandbox limits, customer data classes, audit trails, and examples of unsafe tool output. Those inputs keep the decision anchored in real operating conditions instead of a generic best-practice list.

Check | What the reader should be able to answer
Authority | Does the page distinguish advice, draft, write, delete, payment, and permission-changing actions?
Identity | Is it clear whether the agent acts as a user, service account, or constrained system role?
Runtime boundary | Are tools, network access, files, and secrets scoped to the smallest practical surface?
Auditability | Can the team explain after the fact what the agent saw, decided, and changed?
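The Identity and Auditability rows suggest recording, for each run, who the agent acted as and what it saw, decided, and changed. The following is a minimal sketch of such a record serialized as JSON. The field names and the three identity kinds are assumptions taken from the table, not a standard schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List


class Identity(Enum):
    USER = "user"
    SERVICE_ACCOUNT = "service_account"
    SYSTEM_ROLE = "system_role"


@dataclass
class AuditRecord:
    run_id: str
    acting_as: Identity        # which kind of identity the agent used
    principal: str             # the concrete user, account, or role name
    saw: List[str]             # inputs the agent read (ids, not raw content)
    decided: str               # the action it chose
    changed: List[str] = field(default_factory=list)  # resources it mutated
    at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        record = asdict(self)
        record["acting_as"] = self.acting_as.value  # enums are not JSON-serializable
        return json.dumps(record, sort_keys=True)
```

Storing identifiers in `saw` rather than raw content keeps the log useful for reconstruction without turning it into a second copy of sensitive data. Whether that trade-off fits depends on the team's customer data classes.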

Use the page as a working review artifact: compare the current workflow against the table, mark the missing evidence, and assign an owner for the next change. If the page exposes a gap but no one owns that gap, the correct next step is not broader rollout; it is a smaller pilot, a clearer gate, or a better measurement loop.

For agent-system pages, the value is a safer architecture decision. The page should help readers reduce hidden authority before they add more tools or autonomy.