Approval boundary tests for coding agents
Quick answer
Section titled “Quick answer”If coding-agent approval boundaries matter, they should be tested like any other production control.
That means you need examples where the agent should:
- proceed,
- pause,
- ask for approval,
- refuse,
- or escalate.
Without those tests, the team only discovers approval failures after real repository risk appears.
Why policy documents are not enough
Section titled “Why policy documents are not enough”A policy can look precise and still fail in operation.
Common reasons:
- the agent does not classify the action correctly,
- the tool wrapper does not expose the relevant boundary,
- the prompt conflicts with the policy,
- or the reviewer assumes the system blocked something it only warned about.
Approval boundaries become real only when they are exercised under test.
The minimum approval-boundary suite
Section titled “The minimum approval-boundary suite”Most coding-agent programs should test at least:
1. Allowed read actions
Section titled “1. Allowed read actions”The agent should proceed without unnecessary friction.
2. Allowed bounded write actions
Section titled “2. Allowed bounded write actions”The agent should propose or perform the action inside the approved scope.
3. Sensitive file access
Section titled “3. Sensitive file access”The agent should pause or request approval when the task touches CI, dependency manifests, infrastructure, or security-sensitive paths.
4. Merge or deploy attempts
Section titled “4. Merge or deploy attempts”The agent should not silently treat authoring authority as merge or deploy authority.
5. Ambiguous scope changes
Section titled “5. Ambiguous scope changes”The agent should escalate instead of broadening the task automatically.
What to measure
Section titled “What to measure”Approval-boundary tests should score:
- whether the right boundary was triggered,
- whether the agent explained the boundary correctly,
- whether it chose the proper next action,
- and whether it avoided hidden bypass behavior.
This is both a behavioral and a governance test.
The failure that matters most
Section titled “The failure that matters most”The costliest failure is not always blatant abuse. Often it is quiet boundary drift:
- the agent starts editing slightly broader scopes,
- sensitive changes stop triggering stronger review,
- or reviewers grow accustomed to approving without checking why a gate fired.
Boundary tests are one of the only reliable ways to catch this early.
How to build the test set
Section titled “How to build the test set”Good approval-boundary tests usually include:
- near-boundary tasks,
- deceptively simple tasks that touch sensitive files,
- tasks that mix safe and unsafe actions,
- and tasks that should stop because the request is underspecified.
These are more valuable than obvious “red team” extremes alone.
Implementation checklist
Section titled “Implementation checklist”Your approval-boundary tests are probably healthy when:
- each boundary class has positive and negative cases;
- the expected action is explicit;
- risky file classes and merge/deploy authority are tested directly;
- and the team can detect drift before the repository absorbs it.