Cost per success and tool economics for agentic products

The real cost metric for an agentic product is usually cost per successful task, not cost per model call.

Per-call pricing is still necessary, but it is not enough. If a workflow uses search, retrieval, code execution, browsing, or multi-step orchestration, the team needs to ask:

  • how often the task succeeds,
  • how often it succeeds without human cleanup,
  • and what the whole workflow costs when it does.

That is the only level where tool use can be judged honestly.

Per-call thinking fails because agentic systems do not sell API calls. They sell completed outcomes.

A product can have:

  • a cheap average call cost but terrible completion rate,
  • a low model bill but a heavy managed-tools bill,
  • or a polished agent loop that still loses money once retries, human review, and failure cleanup are included.

This is why a lower token bill can still hide a worse business.

Current official signals checked April 15, 2026

  • OpenAI API pricing. Signal: model pricing is only one part of runtime economics. Why it matters: tool-connected systems need economics above the token layer.
  • OpenAI API priority processing. Signal: service-tier changes can materially affect runtime spend. Why it matters: product economics now include routing and service-tier choices.
  • OpenAI tools guides. Signal: search, file access, and execution are first-class workflow primitives. Why it matters: tool calls should be budgeted as workflow decisions, not hidden implementation details.

For most serious products, track these together:

  1. Cost per request
  2. Cost per attempted task
  3. Cost per successful task
  4. Cost per accepted result
  5. Cost per retained user workflow

Only the first number is easy. The third and fourth are where product truth usually appears.
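The five levels above can be sketched as a simple funnel. Every count and dollar figure below is an illustrative assumption, not data from any real product; the point is that the same total spend produces very different numbers at each level.

```python
# Hypothetical cost funnel for one billing period.
# All counts and the total spend are invented for illustration.

requests = 12_000          # individual model/tool requests
attempted_tasks = 3_000    # user-initiated workflows
successful_tasks = 2_400   # workflows that completed correctly
accepted_results = 2_100   # completions the user kept without rework
retained_workflows = 900   # workflows still in active use this period

total_cost = 1_800.00      # model + tool + retry + review spend, USD

cost_per_request = total_cost / requests            # level 1
cost_per_attempt = total_cost / attempted_tasks     # level 2
cost_per_success = total_cost / successful_tasks    # level 3
cost_per_accepted = total_cost / accepted_results   # level 4
cost_per_retained = total_cost / retained_workflows # level 5

print(f"{cost_per_request=:.3f}")   # 0.150
print(f"{cost_per_success=:.3f}")   # 0.750
print(f"{cost_per_accepted=:.3f}")  # 0.857
```

Note how the per-request figure looks harmless while the per-accepted figure is nearly six times larger; that gap is exactly where the third and fourth numbers reveal product truth.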

Do not define success as “the model answered.”

Success should usually mean:

  • the correct workflow completed,
  • required evidence or tool output was gathered,
  • the result met policy requirements,
  • and the user did not need to redo or manually repair the work.

Anything weaker inflates apparent efficiency.

Use a simple version first:

total workflow cost / successful workflow completions

Include:

  • model calls,
  • tool calls,
  • retries,
  • background runs,
  • review overhead,
  • and known cleanup or fallback cost when the workflow fails.

Even a rough version of this metric is more useful than a precise token spreadsheet that ignores human rework.
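A rough version of the formula above, with every cost component from the include list broken out, might look like this. All figures are assumptions for illustration; the shape of the calculation is what matters.

```python
# Hypothetical sketch of: total workflow cost / successful completions.
# Every dollar figure below is an invented, illustrative input.

model_calls = 540.00   # direct model spend
tool_calls  = 210.00   # search, retrieval, execution
retries     = 95.00    # re-runs after failures
background  = 60.00    # background/async runs
review      = 300.00   # human review time, costed
cleanup     = 125.00   # known fallback/repair cost when workflows fail

total_workflow_cost = (
    model_calls + tool_calls + retries + background + review + cleanup
)
successful_completions = 820

cost_per_success = total_workflow_cost / successful_completions
print(round(cost_per_success, 3))  # 1.622
```

Dropping the review and cleanup lines, as a token spreadsheet would, understates the true figure by roughly a third in this example.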

A common failure mode: teams enable search on every request, even when the task is stable or closed-world. That improves demos but inflates cost per success on routine work.

Managed file search or external retrieval is added because it feels architecturally mature, not because it improves successful completion enough to justify the stack.

Execution tools are often used because they make answers look serious. If the user value barely changes, the runtime cost is simply product drag.

Retries, fallbacks, and multiple tool paths can rescue some tasks, but they also raise the cost of every eventual success. The product must decide which rescues are worth buying.
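The retry trade-off can be made concrete with a little arithmetic. Assuming a unit cost per attempt and the invented success rates below, a retry pass lifts the completion rate but, whenever retries succeed less often than first attempts, it also raises the cost of every eventual success.

```python
# Hypothetical retry arithmetic; rates and unit cost are assumptions.

attempt_cost = 1.0
p_first = 0.70   # first-attempt success rate
p_retry = 0.40   # retry success rate among first-attempt failures

# No retries: cost per success = expected spend / success rate.
no_retry_cps = attempt_cost / p_first                        # ~1.429

# One retry: everyone pays attempt 1; the 30% who fail pay again.
expected_cost = attempt_cost + (1 - p_first) * attempt_cost  # 1.30
success_rate = p_first + (1 - p_first) * p_retry             # 0.82
one_retry_cps = expected_cost / success_rate                 # ~1.585

print(round(no_retry_cps, 3), round(one_retry_cps, 3))
```

The retry pass rescues 12 points of completion rate but makes each success about 11% more expensive; whether that rescue is worth buying is a product decision, not a default.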

Every major tool path should answer three questions:

  1. Which metric does this tool measurably improve (success rate, quality, coverage)?
  2. What does it cost?
  3. Which user or business outcome justifies the difference?

If the team cannot answer all three, the tool is probably being paid for on hope rather than evidence.

For a target workflow, compare:

  1. model-only baseline,
  2. minimal-tool version,
  3. full-tool version.

Measure:

  • latency,
  • total workflow cost,
  • success rate,
  • accepted-result rate,
  • and need for human intervention.

This usually reveals whether tools are increasing real success or only expensive sophistication.
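A minimal sketch of that three-way comparison, with invented numbers, shows the pattern the measurement is meant to expose: the full-tool variant can raise the accepted-result rate only slightly while nearly doubling the cost of each accepted result.

```python
# Hypothetical comparison of the three variants above.
# All costs and counts are invented; only the comparison shape matters.

variants = {
    "model_only":   {"cost": 400.0,  "attempts": 1000, "accepted": 550},
    "minimal_tool": {"cost": 700.0,  "attempts": 1000, "accepted": 820},
    "full_tool":    {"cost": 1400.0, "attempts": 1000, "accepted": 860},
}

for name, v in variants.items():
    accept_rate = v["accepted"] / v["attempts"]
    cost_per_accepted = v["cost"] / v["accepted"]
    print(f"{name:12s} accept={accept_rate:.0%} cost/accepted={cost_per_accepted:.2f}")
```

Here the minimal-tool version buys 27 points of acceptance for a modest rise in cost per accepted result, while the full stack buys 4 more points at a much higher unit cost: real success versus expensive sophistication, in two lines of division.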

The right decision is not always “cheaper.”

Sometimes a more expensive tool path is justified because it:

  • increases completion rate enough to lower cost per accepted result,
  • cuts rework sharply,
  • or raises quality enough to support a higher-value use case.

But teams need to prove that at the workflow level, not assume it from model or tool marketing.
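Proving it at the workflow level can be as simple as the division below. The rates and costs are assumptions chosen for illustration: the pricier path costs 75% more per attempt, yet its higher accepted-result rate makes each accepted result cheaper.

```python
# Hypothetical arithmetic: a pricier tool path winning at the workflow level.
# Both cost and acceptance figures are invented inputs.

cheap_cost_per_attempt = 0.40
cheap_accept_rate = 0.50

expensive_cost_per_attempt = 0.70
expensive_accept_rate = 0.92

cheap_cost_per_accepted = cheap_cost_per_attempt / cheap_accept_rate          # 0.800
expensive_cost_per_accepted = expensive_cost_per_attempt / expensive_accept_rate

print(round(cheap_cost_per_accepted, 3))      # 0.8
print(round(expensive_cost_per_accepted, 3))  # 0.761
```

The same comparison run with real measured rates, rather than marketing claims, is the proof the section asks for.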