AI Agent Vendor Security Questionnaire for Enterprise Procurement

Enterprise AI agent procurement is no longer a generic SaaS security review. A normal vendor may store data, expose an admin console, and integrate with business systems. An AI agent vendor may also read private context, choose tools, call APIs, write records, create code changes, send messages, search files, trigger workflows, and produce outputs that humans trust.

That changes the questionnaire. The buyer needs to understand not only where data is stored, but what the agent can do with that data, whose authority it uses, how decisions are logged, how failures are contained, and whether the product can be evaluated before wider rollout.

Ask AI agent vendors about nine areas: data access, retention, model routing, tool authority, identity and permissions, audit trails, eval and release controls, incident response, and commercial limits. A vendor that cannot explain side-effect boundaries, approval gates, trace evidence, and rollback behavior is not ready for high-consequence workflows, even if the demo looks strong.

| Area | Buyer goal | Red flag |
| --- | --- | --- |
| Data access | Know exactly what the agent can read | "It uses your workspace context" without source-level controls |
| Retention | Know what is stored and for how long | No clear deletion, opt-out, or retention boundary |
| Model routing | Know which models and providers process data | Hidden subcontractors or unclear provider routing |
| Tool authority | Know what the agent can change | Broad write tools with weak approval controls |
| Identity | Know whose permissions the agent uses | One powerful service account for many workflows |
| Audit trails | Reconstruct what happened | Final answer logs without tool-call evidence |
| Evals | Prove quality before rollout | Only anecdotal accuracy claims |
| Incidents | Contain and roll back failures | No customer-visible incident or rollback process |

Procurement should treat these as operating questions, not paperwork.

1. Data access questions

Ask:

  • Which data sources can the agent access by default?
  • Does the agent respect existing user permissions from systems such as docs, code repositories, CRM, helpdesk, or file storage?
  • Can admins restrict access by workspace, repository, customer segment, data class, or tool?
  • Can the agent access attachments, images, transcripts, browser pages, or retrieved files?
  • Is customer data, source code, employee data, financial data, or regulated data handled differently?
  • Can the product run with no training use of customer data?

Stronger answer: the vendor can describe data classes, permission inheritance, admin allowlists, retention, and testable access boundaries.

Weak answer: the vendor says the product is “secure” but cannot show exactly which sources are available to which agent.

2. Retention questions

Ask:

  • What inputs, outputs, tool calls, traces, embeddings, files, and logs are stored?
  • How long is each artifact retained?
  • Are prompts, outputs, or traces used for model training?
  • Can retention be configured by workspace or workflow?
  • Can a customer delete traces or files?
  • Are there separate controls for debug logs, eval datasets, and production traces?
  • What happens to data routed through third-party model providers?

Agent systems often need traces for evaluation and audit. That does not mean all traces should be retained forever. The vendor should support a deliberate retention model.
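One way to picture a deliberate retention model is a per-artifact retention window that a cleanup job can check. The sketch below is illustrative only: the artifact kinds, the day counts, and the `is_expired` helper are assumptions made for this example, not any vendor's actual configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical retention windows per artifact kind, in days.
# Real values would come from the customer's own policy.
RETENTION_DAYS = {
    "production_trace": 90,
    "debug_log": 14,
    "eval_dataset": 365,
}


@dataclass(frozen=True)
class Artifact:
    kind: str
    created_at: datetime


def is_expired(artifact: Artifact, now: datetime) -> bool:
    """Return True when the artifact has outlived its retention window."""
    days = RETENTION_DAYS.get(artifact.kind)
    if days is None:
        # Assumption: artifact kinds without an explicit policy are
        # treated as expired, so nothing is retained by accident.
        return True
    return now - artifact.created_at > timedelta(days=days)
```

The point for the questionnaire is that debug logs, eval datasets, and production traces each get their own window, and that an artifact with no stated policy is not kept by default.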

3. Model routing questions

Ask:

  • Which model providers are used?
  • Can the customer restrict providers?
  • Is model routing deterministic, policy-based, or vendor-managed?
  • Can the buyer see which model handled a request?
  • Are different data classes routed differently?
  • What happens during provider outage, rate limiting, or fallback?
  • Can a customer pin a model version or review changes before rollout?

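This matters because model routing decides which providers process customer data and how the product behaves during outages, so it belongs in both the data-processing review and the reliability review.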
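As a rough illustration of policy-based routing, the sketch below keeps an allowlist of routes per data class and only falls back within that allowlist. The provider names, model names, data classes, and the `pick_route` function are all hypothetical placeholders.

```python
# Hypothetical allowlists: each data class maps to the routes it may use,
# in order of preference. Names are placeholders, not real products.
ROUTES = {
    "regulated": ["provider-a/model-x"],
    "general": ["provider-a/model-x", "provider-b/model-y"],
}


def pick_route(data_class: str, unavailable: set) -> str:
    """Return the first permitted route that is currently available."""
    candidates = ROUTES.get(data_class)
    if candidates is None:
        raise KeyError(f"no routing policy for data class {data_class!r}")
    for route in candidates:
        if route not in unavailable:
            return route
    # Fallback never leaves the allowlist: fail closed instead of
    # sending data to a provider the customer has not approved.
    raise RuntimeError(f"no permitted route available for {data_class!r}")
```

A vendor with this kind of policy can tell the buyer which route handled a request and what happens when the preferred provider is down. A vendor-managed router that silently falls back to any provider cannot.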

4. Tool authority questions

Ask:

  • Which tools are read-only?
  • Which tools can create drafts?
  • Which tools can write to production systems?
  • Which actions require approval?
  • Are destructive, financial, customer-facing, or code-changing actions separated from low-risk actions?
  • Are tool inputs and outputs typed and logged?
  • Is retry behavior idempotent?
  • Can admins disable a tool immediately?

The highest-risk vendor answer is a broad tool connection with vague assurance that the agent “knows when to ask.”
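The alternative to an agent that "knows when to ask" is an approval gate enforced outside the model. The following is a minimal sketch under assumed names: the action classes, the `REQUIRES_APPROVAL` policy, and the `authorize` function are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionClass(Enum):
    READ = "read"
    DRAFT = "draft"
    WRITE = "write"
    DESTRUCTIVE = "destructive"


# Hypothetical policy: action classes that need a human decision first.
REQUIRES_APPROVAL = {ActionClass.WRITE, ActionClass.DESTRUCTIVE}


@dataclass(frozen=True)
class ToolCall:
    tool: str
    action_class: ActionClass
    approved_by: Optional[str] = None  # reviewer identity, if any


def authorize(call: ToolCall, disabled_tools: set) -> str:
    """Decide a tool call in code, not by asking the model."""
    if call.tool in disabled_tools:
        return "deny"  # admin kill switch wins over everything else
    if call.action_class in REQUIRES_APPROVAL and call.approved_by is None:
        return "needs_approval"
    return "allow"
```

For example, `authorize(ToolCall("crm.update", ActionClass.WRITE), set())` returns `"needs_approval"` until a reviewer is recorded, and any tool listed in `disabled_tools` is denied regardless of approval.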

5. Identity and permission questions

Ask:

  • Does the agent act as the user, as a service account, or through delegated workflow authority?
  • Can authority differ by tool and action class?
  • Are permissions checked at runtime or only during setup?
  • Can users grant excessive access accidentally?
  • Can admins see which agents have which scopes?
  • Can a terminated user leave active agent permissions behind?

User-scoped authority can reduce blast radius. Service accounts can improve stability. Neither is automatically right. The buyer needs to know which authority model the vendor uses for each workflow and why.

6. Audit trail questions

Ask whether the audit record includes:

  • original user instruction;
  • system and policy context;
  • retrieved sources and files;
  • model route;
  • tool calls;
  • tool inputs and outputs;
  • approval requests and decisions;
  • final output;
  • side effects in external systems;
  • errors, retries, and rollback events.

If the audit trail only stores final messages, the product is hard to govern.
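A buyer can turn the checklist above into a quick completeness check against a sample audit export. The sketch below assumes the export is a dictionary. The key names are invented for this example, and a real vendor export will use its own schema.

```python
# Hypothetical key names, one per item in the audit checklist.
EXPECTED_FIELDS = (
    "user_instruction",
    "policy_context",
    "retrieved_sources",
    "model_route",
    "tool_calls",
    "tool_inputs_outputs",
    "approvals",
    "final_output",
    "side_effects",
    "errors_and_retries",
)


def missing_audit_fields(record: dict) -> list:
    """List the checklist items that the exported record does not contain."""
    return [name for name in EXPECTED_FIELDS if name not in record]
```

A record that contains only `final_output` would come back with the other nine items listed as missing, which is the "final messages only" case described above.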

7. Evaluation and release-control questions

Ask:

  • Does the vendor provide eval tooling, trace sampling, or quality review workflows?
  • Can the customer build workflow-specific eval datasets?
  • Are model, prompt, and tool changes versioned?
  • Can changes be canaried?
  • Can the customer roll back prompts, tools, model routes, or workflow versions?
  • What quality metrics are available beyond user thumbs-up?
  • Can security or compliance teams review high-risk workflows before release?

Agent quality should be measured at the workflow level. A vendor that only reports answer satisfaction may miss tool failures and unsafe side effects.

8. Incident response questions

Ask:

  • What happens if the agent sends the wrong message, changes the wrong record, leaks data, creates bad code, or loops through tools?
  • Can the vendor help reconstruct the trace?
  • Can a customer disable one workflow without disabling the whole product?
  • Are there incident severity levels?
  • What customer notification commitments exist?
  • How are post-incident fixes turned into evals or guardrails?

Production agents need incident response because failures are operational, not only conversational.

9. Commercial limit questions

Budget design can create security pressure. Ask:

  • Are premium models, tool calls, search, and trace retention priced separately?
  • Can admins set usage budgets by team, workflow, or risk class?
  • Are overages visible before they become invoices?
  • Does the vendor charge for audit retention, eval runs, or reviewer seats?
  • Can low-risk workflows use cheaper lanes while sensitive workflows use stronger controls?

Unexpected cost often causes teams to disable review, logging, or evals. That is a governance problem.

Use this scoring model:

| Score | Meaning |
| --- | --- |
| 0 | Vendor cannot answer clearly |
| 1 | Vendor has a general policy but no workflow-specific control |
| 2 | Vendor has configurable controls but weak evidence or logs |
| 3 | Vendor has clear controls, trace evidence, admin visibility, and rollback |

Any vendor scoring 0 on data access, tool authority, identity, or audit trails should not be used for high-consequence agent workflows.
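The gating rule can be applied mechanically once each area has a score. The sketch below is one possible encoding: the area keys and the `assess` function are assumptions, and treating an unscored gating area as 0 is a choice made for this example, not something the scoring model above specifies.

```python
# The four areas where a score of 0 blocks high-consequence use.
GATING_AREAS = ("data_access", "tool_authority", "identity", "audit_trails")


def assess(scores: dict) -> dict:
    """Apply the 0-3 scale and the gating rule to a vendor's scores."""
    for area, score in scores.items():
        if score not in (0, 1, 2, 3):
            raise ValueError(f"score for {area!r} must be 0, 1, 2, or 3")
    # Assumption: a gating area with no score counts as 0,
    # the same as "vendor cannot answer clearly".
    blocked_by = sorted(a for a in GATING_AREAS if scores.get(a, 0) == 0)
    return {
        "blocked_by": blocked_by,
        "eligible_for_high_consequence": not blocked_by,
    }
```

Under these assumptions, a vendor with strong scores everywhere except a 0 on tool authority is still reported as ineligible, with `blocked_by` naming the area that needs a better answer.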