OpenAI Batch API limits, expiration, and output files
OpenAI Batch API limits, expiration, and output files
Section titled “OpenAI Batch API limits, expiration, and output files”The Batch API looks simple from a pricing angle: send deferred work, get lower cost, wait for completion. Production use is more demanding. The team has to design around input files, per-batch limits, queued-token limits, custom_id mapping, output files, error files, partial completion, and expired requests.
The discount is useful only when the batch can be operated cleanly.
Current official signals checked April 24, 2026
Section titled “Current official signals checked April 24, 2026”| Official source | Current signal | Why it matters |
|---|---|---|
| OpenAI Batch guide | Batch is designed for asynchronous groups of requests with a 24-hour completion window and lower cost than synchronous APIs | The product should treat Batch as deferred throughput, not live job execution |
| OpenAI Batch guide | A single batch may include up to 50,000 requests and the input file can be up to 200 MB | Large jobs still need sizing, sharding, and retry planning |
| OpenAI Batch API reference | The batch request uses a JSONL file uploaded for the batch purpose and currently supports a 24h completion window | Teams should design around file-based submission and delayed retrieval |
| OpenAI rate limits guide | Batch queue limits are based on enqueued input tokens for each model | Batch capacity planning is not only request count; queued tokens matter |
The production boundary
Section titled “The production boundary”Use Batch when the work is:
- independent by row or record;
- not user-waiting;
- valuable even if it finishes hours later;
- retryable by request or shard;
- easy to map back to source records through a stable identifier.
Do not use Batch when a single user or workflow is waiting on one tracked result. That is usually a background-mode or product-job problem.
The limits that should change your design
Section titled “The limits that should change your design”Request and file limits
Section titled “Request and file limits”Batch work is submitted as a file. That means the product has to prepare a request file, upload it, start the batch, and later reconcile results.
Practical design implications:
- use one input line per independent unit of work;
- give every line a stable
custom_id; - shard large jobs by customer, dataset, job type, or run date;
- keep source records immutable enough that results can be reconciled later;
- avoid creating one huge batch that is impossible to reason about after partial failure.
Queued-token limits
Section titled “Queued-token limits”Batch capacity is not just “how many rows can I send?” Queued input tokens for a model matter. A small number of very large prompts can exhaust queue capacity faster than many small jobs.
Before scaling Batch, estimate:
- average input tokens per request;
- high-percentile input size;
- expected output size;
- batches in flight per model;
- whether embeddings or responses are the limiting lane.
This matters for teams using Batch for evals, enrichment, or document processing, where the long tail of prompt size can dominate planning.
What expired batches mean
Section titled “What expired batches mean”An expired batch is not the same as a total loss. OpenAI’s Batch documentation states that unfinished requests are cancelled when a batch expires, while completed responses remain available through the output file and expired requests are represented in the error file.
That creates an important operating rule:
Batch retry should happen at the request level, not blindly at the whole batch level.
If the team replays the whole file after an expiration, completed requests may be duplicated, costs may rise, and downstream records may be overwritten incorrectly.
Output-file reconciliation
Section titled “Output-file reconciliation”Every batch job should have a reconciliation step:
- download the output file;
- parse successful responses by
custom_id; - parse the error file by
custom_id; - mark each source record as completed, failed, expired, or retryable;
- create a retry batch only for retryable failed or expired records;
- preserve the original batch ID and retry batch ID for audit.
The custom_id is the bridge between the provider output and your internal source record. Treat it as a real production identifier, not a throwaway label.
Good batch candidates
Section titled “Good batch candidates”Strong candidates:
- evaluation sweeps;
- large-scale support-ticket classification;
- document tagging;
- transcript cleanup;
- content enrichment;
- embedding or metadata refresh jobs;
- offline moderation review over historical data.
Weak candidates:
- one customer-facing task;
- one long research report;
- anything that needs live approval;
- anything where the user expects visible progress;
- work that changes records directly before review.
Retry and idempotency rules
Section titled “Retry and idempotency rules”Batch retries should be boring. A healthy design has:
- stable
custom_idvalues; - idempotent output writes;
- a source-record status field;
- a maximum retry count;
- a retry reason;
- separate handling for expired, malformed, rate-limited, and policy-blocked items.
If output writes are not idempotent, Batch will create operational risk once jobs are retried or partially replayed.