
Build Background Processing AI with OpenAI Background Mode

How to build a background processing AI system with OpenAI background mode


OpenAI background mode solves only one part of a background processing system: it lets a long-running response continue outside the normal synchronous request path. The product still needs a durable job model around it. Without that model, the API call may be asynchronous, but the user experience, review process, support workflow, and failure recovery are still improvised.

The mistake is to treat background mode as a replacement for job architecture. It is better to treat it as one execution lane inside a product-owned background processing system.

A healthy background AI system has five layers:

  1. Job record: the product creates a durable internal job before it calls the model.
  2. Execution lane: the system starts the OpenAI response with background execution when the work fits that lane.
  3. Status model: the product tracks queued, running, waiting for review, completed, failed, canceled, and expired states in its own language.
  4. Output handling: the system stores the result, evidence, partial artifacts, and reviewer notes instead of only showing final text.
  5. Control layer: users or operators can cancel, approve, retry, escalate, or archive the job.

If one of those layers is missing, the system usually feels unfinished. Users ask where their work went. Support teams cannot explain failures. Engineers cannot tell whether a problem is model latency, tool failure, approval delay, or product orchestration.

Do not rely on the provider response object as your only job database. Keep your own record with at least:

| Field | Why it matters |
|---|---|
| Internal job ID | Lets your product own the workflow even if provider IDs change |
| User, workspace, or account scope | Controls visibility, billing, support, and permission checks |
| Provider response ID | Links your job to the background response |
| Job type | Separates research, extraction, enrichment, coding, and review flows |
| Status | Gives users and operators a consistent progress model |
| Submitted inputs summary | Helps support understand what work was requested without exposing unnecessary content |
| Completion artifact pointers | Stores report, JSON output, files, citations, or generated records |
| Review state | Separates model completion from trusted completion |
| Cost and timing metadata | Supports cost-per-success and latency budget decisions |
| Failure reason | Turns retry and escalation into a controlled process |

The job table does not need to be complex at first. It needs to be explicit.
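As a minimal sketch of such a record, the fields in the table can be written as a small Python dataclass. The class name, field names, `create_job`, `attach_provider_response`, and the in-memory `store` dictionary are all hypothetical and stand in for whatever database the product already uses. The point it illustrates is ordering: the job exists before the provider is called, and the provider response ID is attached afterwards.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
import uuid


@dataclass
class BackgroundJob:
    """Product-owned job record. Field names are illustrative, not a provider schema."""
    scope_id: str                      # user, workspace, or account that owns the job
    job_type: str                      # e.g. "research", "extraction", "review"
    input_summary: str                 # short description of the request, not the raw input
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "queued"
    provider_response_id: Optional[str] = None   # unknown until the provider call returns
    artifact_pointers: list[str] = field(default_factory=list)
    review_state: str = "not_required"
    cost_usd: float = 0.0
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    finished_at: Optional[datetime] = None
    failure_reason: Optional[str] = None


def create_job(store: dict[str, BackgroundJob], scope_id: str,
               job_type: str, input_summary: str) -> BackgroundJob:
    """Persist the job first, so it exists even if the provider call never succeeds."""
    job = BackgroundJob(scope_id=scope_id, job_type=job_type, input_summary=input_summary)
    store[job.job_id] = job
    return job


def attach_provider_response(job: BackgroundJob, response_id: str) -> None:
    """Link the internal job to the provider's background response."""
    job.provider_response_id = response_id
```

A caller would run `create_job(...)`, start the background response, and then call `attach_provider_response(job, response_id)` with the ID the provider returned.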

The status model should not mirror the API blindly


Provider statuses are useful, but product statuses should reflect the user’s workflow.

A practical product-level status model is:

  • queued: the product accepted the work and is waiting to start or waiting on provider scheduling;
  • running: the model or tool chain is active;
  • needs review: the model completed but the result is not yet safe to deliver or act on;
  • completed: the result is available to the user or downstream workflow;
  • failed: the system cannot complete without intervention;
  • canceled: the user or system intentionally stopped the job;
  • expired: the job missed the useful window and should be re-created or converted into a support case.

This matters because a model response can be technically complete while the product job is still not done. For example, a generated customer reply may need approval, a research report may need citation review, and a coding-agent patch may need tests.
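The gap between provider completion and product completion can be sketched as a small mapping function. The provider status strings used here (`queued`, `in_progress`, `completed`, `failed`, `cancelled`, `incomplete`) are an assumption about what the response object reports and should be checked against the current API reference. `JobStatus` and `product_status` are hypothetical names.

```python
from enum import Enum


class JobStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    NEEDS_REVIEW = "needs_review"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"
    EXPIRED = "expired"


# Assumed provider status values; verify against the provider documentation.
_PROVIDER_TO_PRODUCT = {
    "queued": JobStatus.QUEUED,
    "in_progress": JobStatus.RUNNING,
    "failed": JobStatus.FAILED,
    "cancelled": JobStatus.CANCELED,
    "incomplete": JobStatus.FAILED,
}


def product_status(provider_status: str, requires_review: bool,
                   review_approved: bool = False,
                   past_deadline: bool = False) -> JobStatus:
    """Translate a provider status into the product's own status language."""
    if provider_status == "completed":
        # Model completion is not product completion when review is pending.
        if requires_review and not review_approved:
            return JobStatus.NEEDS_REVIEW
        return JobStatus.COMPLETED
    # Unknown provider values are surfaced as failures rather than hidden.
    mapped = _PROVIDER_TO_PRODUCT.get(provider_status, JobStatus.FAILED)
    if past_deadline and mapped in (JobStatus.QUEUED, JobStatus.RUNNING):
        return JobStatus.EXPIRED
    return mapped
```

With this shape, a coding-agent patch whose response has finished still reports `needs_review` until someone approves it, which is the behavior the paragraph above describes.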

OpenAI background mode supports polling the response object, and background streaming can be useful when the client may disconnect but the task should continue. The product decision is not “poll or stream?” It is “what user and operator state do we need to recover after interruption?”

Use polling when:

  • the user does not need live progress;
  • the job can be refreshed from a dashboard;
  • completion is more important than visible token flow;
  • the workflow has review or post-processing steps anyway.

Use background streaming when:

  • partial progress improves trust;
  • the user may leave and return;
  • the interface can reconnect from an event cursor;
  • the product wants to show a long-running task progressing without blocking completion.

Either way, the product should handle lost connections as normal behavior. A closed browser tab should not mean the work is lost.
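One way to treat lost connections as normal behavior is a polling loop that swallows connection errors and keeps going until the job reaches a terminal state or a deadline. This is a sketch, not provider code: `poll_until_terminal`, `fetch_status`, and `on_status` are hypothetical names, and the terminal status strings are assumed. In a real integration `fetch_status` would wrap whatever call retrieves the response by its stored provider response ID.

```python
import time
from typing import Callable, Optional

# Assumed terminal provider statuses; verify against the provider documentation.
TERMINAL = {"completed", "failed", "cancelled", "incomplete"}


def poll_until_terminal(fetch_status: Callable[[], str],
                        on_status: Callable[[str], None],
                        interval_s: float = 2.0,
                        max_wait_s: float = 600.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until a terminal status or the deadline; returns the last outcome."""
    deadline = clock() + max_wait_s
    while True:
        status: Optional[str]
        try:
            status = fetch_status()
        except ConnectionError:
            status = None  # a dropped connection is expected; the job keeps running
        if status is not None:
            on_status(status)  # e.g. write the status to the internal job record
            if status in TERMINAL:
                return status
        if clock() >= deadline:
            return "timed_out"
        sleep(interval_s)
```

Because the loop only needs the stored provider response ID, it can run in a server-side worker and resume after a restart, independent of any browser session.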

Do not wait until after launch to decide whether background jobs can take action. Put each job type into an authority class:

| Job class | Example | Default control |
|---|---|---|
| Read-only analysis | summarize files, inspect tickets, search sources | no approval unless sensitive data is involved |
| Draft generation | support reply, report, proposed patch | human review before publish or merge |
| Low-risk write | tagging, internal note, non-customer-facing update | policy gate or sampled review |
| External side effect | send message, refund, deploy, delete, purchase | explicit approval |

Background execution makes the approval problem easier to hide. It does not make the approval problem disappear. If the job can create external consequences, completion should mean “ready for decision,” not automatically “safe to execute.”
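The authority classes can be encoded so that model completion never reaches "completed" by accident. The sketch below assumes the four classes from the table and reduces the controls to two outcomes. `AuthorityClass`, `status_after_model_completion`, and the `sensitive_data` and `sampled_for_review` flags are hypothetical names for decisions the product would make elsewhere.

```python
from enum import Enum


class AuthorityClass(Enum):
    READ_ONLY = "read_only_analysis"
    DRAFT = "draft_generation"
    LOW_RISK_WRITE = "low_risk_write"
    EXTERNAL_SIDE_EFFECT = "external_side_effect"


def status_after_model_completion(authority: AuthorityClass,
                                  sensitive_data: bool = False,
                                  sampled_for_review: bool = False) -> str:
    """Decide the product status once the model response has finished."""
    if authority is AuthorityClass.READ_ONLY:
        # No approval unless sensitive data is involved.
        return "needs_review" if sensitive_data else "completed"
    if authority is AuthorityClass.LOW_RISK_WRITE:
        # Policy gate or sampled review: only some jobs are held back.
        return "needs_review" if sampled_for_review else "completed"
    # Drafts and external side effects always stop at "ready for decision".
    return "needs_review"
```

Making the restrictive branch the fall-through means a newly added job class waits for review until someone decides otherwise.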

Failure handling that users can understand


Every background job needs a user-facing failure rule. A good failure message should say:

  • whether the job can be retried;
  • whether partial output exists;
  • whether a human is needed;
  • whether the failure was due to input, provider availability, tool failure, policy, or timeout.

Avoid treating all failures as “try again.” Some failures should be retried automatically. Some should be escalated. Some indicate the task design is wrong.
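A failure rule of this kind can be sketched as a lookup from cause to policy. The cause categories follow the list above. The retry limit, the explanation wording, and the names `FailurePolicy`, `FAILURE_POLICIES`, and `describe_failure` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailurePolicy:
    retryable: bool      # may the system retry without a person?
    needs_human: bool    # should the job be escalated immediately?
    explanation: str     # plain-language cause shown to the user


# Illustrative policies; each product will tune these per job type.
FAILURE_POLICIES = {
    "input": FailurePolicy(False, False, "The submitted input could not be processed."),
    "provider_unavailable": FailurePolicy(True, False, "The AI provider was temporarily unavailable."),
    "tool_failure": FailurePolicy(True, False, "A connected tool returned an error."),
    "policy": FailurePolicy(False, True, "The request was stopped by a policy check."),
    "timeout": FailurePolicy(True, False, "The job ran past its time limit."),
}

# Unclassified failures go to a person rather than into a retry loop.
UNKNOWN = FailurePolicy(False, True, "The job failed for an unclassified reason.")


def describe_failure(cause: str, has_partial_output: bool,
                     attempts: int, max_attempts: int = 3) -> dict:
    """Build the user-facing failure summary for one job."""
    policy = FAILURE_POLICIES.get(cause, UNKNOWN)
    can_retry = policy.retryable and attempts < max_attempts
    return {
        "cause": cause,
        "explanation": policy.explanation,
        "can_retry": can_retry,
        "partial_output_available": has_partial_output,
        # Escalate when the policy says so, or when retries are exhausted.
        "needs_human": policy.needs_human or (policy.retryable and not can_retry),
    }
```

The returned dictionary answers the four questions in the list directly, so the interface and the support tooling can render the same facts.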

Background mode is not a substitute for:

  • bulk offline processing,
  • full workflow orchestration,
  • data pipelines,
  • durable audit trails,
  • approval systems,
  • or queue priority management.

If the workload is thousands of independent deferred requests, evaluate whether the OpenAI Batch API fits it better than background mode. If the job spans multiple systems and approval states, treat background mode as one execution step inside a broader workflow.

Before shipping, the team should be able to answer:

  1. What internal job record is created before calling OpenAI?
  2. Which statuses can users see?
  3. How does the product recover after a browser refresh or dropped connection?
  4. Who can cancel a job?
  5. Which job types require review before output is used?
  6. What happens to partial output?
  7. What counts as a retryable failure?
  8. How is cost measured per completed job, not just per API call?
  9. Which jobs belong in Batch instead?
  10. Which jobs should stay interactive because users are actively steering them?
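Question 8 can be made concrete with a small calculation: total spend across every attempt, including failed and canceled ones, divided by the number of jobs that actually completed. The sketch assumes each job is a dictionary with hypothetical `status` and `cost_usd` keys.

```python
from typing import Optional


def cost_per_completed_job(jobs: list[dict]) -> Optional[float]:
    """Total spend across all jobs divided by the number that completed."""
    total_cost = sum(job["cost_usd"] for job in jobs)
    completed = sum(1 for job in jobs if job["status"] == "completed")
    if completed == 0:
        return None  # no successes yet, so the ratio is undefined
    return total_cost / completed


jobs = [
    {"status": "completed", "cost_usd": 0.50},
    {"status": "failed", "cost_usd": 0.25},     # failed work still costs money
    {"status": "completed", "cost_usd": 0.25},
]
print(cost_per_completed_job(jobs))  # 0.5
```

Averaging only the completed jobs would give 0.375 here. Counting the failed attempt raises the figure to 0.5, which is the number that should drive budget decisions.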