Should AI agents run in a sandbox?

What matters first

Often yes, but not always for the same reason.

AI agents should run in a sandbox when they can:

execute code,
browse untrusted pages,
read or write files,
call tools with side effects,
or touch secrets, networks, and system resources.

The sandbox is there to contain execution risk. It does not decide what the agent should be allowed to do in the first place.

The wrong sandbox question

The weak question is:

“Do we trust the model?”

That is not the right control question, because trust in the model does not limit what a bad run can reach.

The better question is:

“If this run goes wrong, what boundary stops it from becoming a larger systems incident?”

That is where sandboxing becomes useful.

When sandboxing is clearly required

Sandboxing should usually be treated as required when the agent can:

run generated code,
inspect or transform local files,
browse arbitrary websites,
use developer tools,
or operate inside engineering environments where accidental writes, secret exposure, or network access create real damage.

Without isolation, a single bad run (an accidental write, an exposed secret, an unintended network call) can escalate into a much larger control failure.
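To make the isolation point concrete, the sketch below runs a generated Python snippet in a child process with an empty environment, a throwaway working directory, and a timeout. The function name run_untrusted_snippet is hypothetical, and the sketch assumes a POSIX host. It is not a sandbox by itself: the child can still read the wider filesystem and open network connections, so a container, VM, or OS-level policy would still be needed around it.

```python
import subprocess
import sys
import tempfile


def run_untrusted_snippet(source: str, timeout_s: float = 5.0) -> subprocess.CompletedProcess:
    """Run a Python snippet with fewer inherited privileges (hypothetical helper).

    This only trims what the child inherits. It does NOT restrict filesystem
    or network access; real isolation has to come from the surrounding runtime.
    """
    with tempfile.TemporaryDirectory() as scratch:
        return subprocess.run(
            # -I is Python's isolated mode: ignores PYTHON* variables and user site-packages.
            [sys.executable, "-I", "-c", source],
            cwd=scratch,          # relative writes land in a directory that is deleted afterwards
            env={},               # no inherited API keys or tokens (assumes POSIX; an empty
                                  # environment may stop the interpreter starting on Windows)
            capture_output=True,
            text=True,
            timeout=timeout_s,    # raises subprocess.TimeoutExpired if the snippet hangs
        )
```

A caller would inspect result.returncode, result.stdout, and result.stderr, and treat a subprocess.TimeoutExpired exception as a failed run rather than retrying blindly.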

When lighter containment may be enough

Not every agent needs heavy execution isolation.

If the agent only:

drafts text,
summarizes evidence,
routes requests,
or proposes actions that still require separate human approval,

then narrower permission design and application-owned controls may matter more than a full sandbox runtime.

The right level of isolation depends on what the agent can actually touch.

What sandboxing protects well

Sandboxing works well when it limits:

filesystem scope,
network reach,
process execution,
credential exposure,
and the blast radius of bad tool behavior.

It is especially valuable when the model can encounter untrusted input and then choose actions.
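The limits listed above can be written down as an explicit policy object that the runtime consults before each action. The sketch below is illustrative only: SandboxPolicy, its field names, and the example paths and hostnames are assumptions, not part of any particular sandbox product, and a real runtime would enforce these limits at the OS or container level rather than in application code.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical description of what one agent run may touch."""
    writable_roots: tuple[str, ...] = ()            # filesystem scope
    allowed_hosts: frozenset[str] = frozenset()     # network reach
    allow_subprocess: bool = False                  # process execution

    def allows_write(self, path: str) -> bool:
        target = PurePosixPath(path)
        # Refuse relative paths and any ".." segment instead of trying to normalize them.
        if not target.is_absolute() or ".." in target.parts:
            return False
        # is_relative_to compares whole path components, so "/workspace/run-10"
        # does not match a root of "/workspace/run-1".
        return any(target.is_relative_to(root) for root in self.writable_roots)

    def allows_host(self, host: str) -> bool:
        # Default-deny: only hosts on the allowlist are reachable.
        return host.lower() in self.allowed_hosts
```

Keeping the policy as data makes it easy to review and to log alongside each run, which helps the team state what the agent could have touched.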

What sandboxing does not solve

Sandboxing does not solve:

bad approval policy,
broad business permissions,
weak audit logs,
unsafe user-scoped authority,
or a workflow that should never have been autonomous.

Teams often overestimate sandboxing because it feels concrete. But a sandbox around an overpowered workflow is still an overpowered workflow.

The healthy operating pattern

The healthy production pattern is usually:

narrow permissions first,
sandbox execution second,
approval and escalation for irreversible actions,
logging and evaluation after every important run.

This is why sandboxing belongs inside a broader control plane rather than standing alone as the safety story.
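One way to picture that layering is a small dispatcher that checks permissions, gates irreversible actions on approval, runs the action through the sandbox, and records every outcome. The sketch below is a minimal illustration under assumptions: Action, ControlPlane, and the approve and run_in_sandbox hooks are hypothetical names, and the sandbox hook is a stand-in for whatever isolated runtime the team actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    tool: str
    irreversible: bool = False


@dataclass
class ControlPlane:
    """Hypothetical wrapper that orders the controls around each agent action."""
    allowed_tools: frozenset[str]
    approve: Callable[[Action], bool]          # human or policy approval hook (assumed)
    run_in_sandbox: Callable[[Action], str]    # stand-in for the isolated runtime (assumed)
    audit_log: list[dict] = field(default_factory=list)

    def execute(self, action: Action) -> str:
        outcome = self._decide_and_run(action)
        # Log every decision, including denials and escalations.
        self.audit_log.append({"tool": action.tool, "outcome": outcome})
        return outcome

    def _decide_and_run(self, action: Action) -> str:
        # 1. Narrow permissions first.
        if action.tool not in self.allowed_tools:
            return "denied"
        # 2. Irreversible actions need approval before anything runs.
        if action.irreversible and not self.approve(action):
            return "escalated"
        # 3. Execution happens only inside the sandbox hook.
        self.run_in_sandbox(action)
        return "executed"
```

In this sketch the approval gate sits before sandboxed execution, on the assumption that an irreversible action should not start until someone has signed off; the sandbox then bounds whatever does run.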

The practical rule

Use strong sandboxing when the agent can execute, browse, or mutate technical systems directly.

Use lighter isolation when the agent remains in read-only or draft-only lanes and business controls already contain the output.

If the team cannot explain the blast radius of a failed run, the system probably needs stronger isolation than it has.
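The rule above can be expressed as a small decision function. This is a sketch under assumptions: the AgentProfile fields and the two-level Isolation enum are illustrative names, and treating "business controls do not contain the output" as a reason for strong isolation is a conservative reading of the rule rather than something it states outright.

```python
from dataclasses import dataclass
from enum import Enum


class Isolation(Enum):
    LIGHT = "light"
    STRONG = "strong"


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical summary of what an agent can do; all fields default to the safest answer."""
    executes_code: bool = False
    browses_untrusted: bool = False
    mutates_systems: bool = False
    blast_radius_documented: bool = False
    business_controls_contain_output: bool = False


def recommend_isolation(profile: AgentProfile) -> Isolation:
    # Direct execution, browsing, or mutation calls for strong sandboxing.
    if profile.executes_code or profile.browses_untrusted or profile.mutates_systems:
        return Isolation.STRONG
    # If nobody can explain the blast radius, assume more isolation is needed.
    if not profile.blast_radius_documented:
        return Isolation.STRONG
    # Assumption: lighter isolation is only justified when business controls contain the output.
    if not profile.business_controls_contain_output:
        return Isolation.STRONG
    return Isolation.LIGHT
```

The function defaults toward strong isolation whenever an answer is missing, which mirrors the last sentence of the rule.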

Implementation checklist

Your sandboxing decision is probably healthy when:

the team can describe exactly which resources the agent can touch;
execution, filesystem, network, and secret boundaries are explicit;
sandboxing is paired with approval and permission design;
the logs can show what happened inside the boundary;
and the team knows which actions should still be impossible even inside the sandbox.

Compare next

Sandboxing, network permissions, and secrets: Use this page for the deeper execution-boundary design once sandboxing becomes a real implementation project.

What should an AI agent be allowed to do in production?: Use this page when the bigger question is permission scope rather than runtime isolation.

Should AI agents have access to customer data?: Use this page when the most sensitive boundary is data scope rather than compute or network scope.

What should happen when an AI agent fails in production?: Use this page when the runtime boundary is only one part of a broader production failure plan.

Reader value check

This page should help a reader decide which authority, data access, tool scope, and runtime boundary the agent system should receive. For the question “Should AI agents run in a sandbox?”, the page is not finished if it only explains vocabulary. It should change what the team approves, measures, routes, buys, logs, or refuses to automate.

Before applying the guidance, bring tool lists, auth scopes, sandbox limits, customer data classes, audit trails, and examples of unsafe tool output. Those inputs keep the decision anchored in real operating conditions instead of a generic best-practice list.

Check	What the reader should be able to answer
Authority	Does the page distinguish advice, draft, write, delete, payment, and permission-changing actions?
Identity	Is it clear whether the agent acts as a user, service account, or constrained system role?
Runtime boundary	Are tools, network access, files, and secrets scoped to the smallest practical surface?
Auditability	Can the team explain after the fact what the agent saw, decided, and changed?

Use the page as a working review artifact: compare the current workflow against the table, mark the missing evidence, and assign an owner for the next change. If the page exposes a gap but no one owns that gap, the correct next step is not broader rollout; it is a smaller pilot, a clearer gate, or a better measurement loop.

For agent-system pages, the value is a safer architecture decision. The page should help readers reduce hidden authority before they add more tools or autonomy.