OpenAI APIs
A comparison of the OpenAI service offering with Anthropic's. Covers context windows, rate limits and model optimization.
OpenAI API
Rate Limits
The rate limits differ from Anthropic's, but are also based on paid usage tiers (determined by cumulative billing):
- RPM - Requests per minute
- RPD - Requests per day
- TPM - Tokens per minute
- TPD - Tokens per day
There is no distinction between input and output tokens for rate limiting, but there is now a daily limit alongside the per-minute limit.
Rate limits also vary by model (and can be shared by model family). There are also different limits for long context requests.
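As an illustrative sketch (the x-ratelimit-* header names come from OpenAI's rate limit documentation; the model name is a placeholder), you can check how much of your quota remains by reading the response headers:

```python
import os
import requests

# Minimal sketch: call the chat completions endpoint directly and read the
# x-ratelimit-* response headers that report remaining requests and tokens.
resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o-mini",  # placeholder model name
        "messages": [{"role": "user", "content": "Hello"}],
    },
)

print(resp.headers.get("x-ratelimit-remaining-requests"))  # requests left in the current window
print(resp.headers.get("x-ratelimit-remaining-tokens"))    # tokens left in the current window
print(resp.headers.get("x-ratelimit-reset-requests"))      # time until the request limit resets
```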
Flex Processing
For lower-priority tasks: requests cost less, in exchange for slower response times and the possibility that resources are temporarily unavailable.
You can enable this by setting service_tier="flex" in the request.
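A minimal sketch with the Python SDK, assuming a model that supports flex processing (the model name below is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Sketch: request flex processing - lower cost in exchange for slower,
# and occasionally unavailable, capacity.
response = client.responses.create(
    model="o4-mini",  # placeholder model name
    input="Summarise this quarter's support tickets.",
    service_tier="flex",
)
print(response.output_text)
```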
Structured Output
The OpenAI API provides support for more structured output via a text_format field on the request (or response_format if you're using their chat completions API).
Source: OpenAI
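A hedged sketch using the Python SDK's parse helper with a Pydantic schema (the schema, input and model name are illustrative):

```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

# Illustrative schema mirroring the feedback-analysis prompt below.
class FeedbackAnalysis(BaseModel):
    sentiment: str
    key_issues: list[str]
    action_items: list[str]

# Sketch: parse() validates the model output against the schema and
# exposes it as a typed object via output_parsed.
response = client.responses.parse(
    model="gpt-4o",  # placeholder model name
    input="Analyse this feedback: 'Checkout keeps timing out on mobile.'",
    text_format=FeedbackAnalysis,
)
print(response.output_parsed)
```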
The approach with the Anthropic API is to provide a user prompt which guides the output format:
"You’re a Customer Insights AI. Analyze this feedback and output in JSON format with keys: “sentiment” (positive/negative/neutral), “key_issues” (list), and “action_items” (list of dicts with “team” and “task”)."
Background Mode
OpenAI supports an async/background mode where you can poll for results - you set a field called background=True. The response includes an ID which you can use to poll for the result.
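A minimal polling sketch, assuming the Responses API (the model name is a placeholder):

```python
import time
from openai import OpenAI

client = OpenAI()

# Sketch: kick off a background request, then poll by ID until it finishes.
job = client.responses.create(
    model="o3",  # placeholder model name
    input="Write a long report on our Q3 metrics.",
    background=True,
)

while job.status in ("queued", "in_progress"):
    time.sleep(2)
    job = client.responses.retrieve(job.id)

print(job.status)
print(job.output_text)
```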
This kind of behavior is achieved in Claude using batching, which has its own rate limits.
Webhooks
You can handle callbacks based on event types:
Source: OpenAI
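As an illustrative sketch (the route, event names and Flask setup are assumptions for the example, not OpenAI specifics beyond the event type field; production handlers should also verify the webhook signature):

```python
from flask import Flask, request

app = Flask(__name__)

# Illustrative webhook receiver: dispatch on the event "type" field.
@app.route("/openai/webhook", methods=["POST"])
def handle_webhook():
    event = request.get_json()
    event_type = event.get("type", "")

    if event_type == "response.completed":
        print("Background response finished:", event["data"]["id"])
    elif event_type == "batch.completed":
        print("Batch finished:", event["data"]["id"])
    else:
        print("Unhandled event:", event_type)

    return "", 200
```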
Batching
OpenAI supports batched requests by crafting and then uploading a file via their Files API. This input file is then referenced when making the request that creates the batch.
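A sketch of that flow with the Python SDK (the file contents and model name are illustrative):

```python
import json
from openai import OpenAI

client = OpenAI()

# Sketch: each line of the JSONL input file is one request, keyed by custom_id.
batch_requests = [
    {
        "custom_id": "ticket-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",  # placeholder model name
            "messages": [{"role": "user", "content": "Classify: 'App crashes on login'"}],
        },
    },
]
with open("batch_input.jsonl", "w") as f:
    for item in batch_requests:
        f.write(json.dumps(item) + "\n")

# Upload the file, then reference its ID when creating the batch.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)
```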
Just like with Anthropic, the rate limits for the batch API are different from their other per-model rate-limited APIs.
Streaming
The OpenAI API supports streaming over SSE by specifying a stream=True field on requests.
There is a more fine-grained breakdown of events than with Anthropic's SSE:
Source: OpenAI
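A sketch of consuming those events with the Python SDK (the model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Sketch: with stream=True the SDK yields typed events rather than one response.
stream = client.responses.create(
    model="gpt-4o-mini",  # placeholder model name
    input="Tell me a short story.",
    stream=True,
)

for event in stream:
    # Each event carries a type, e.g. response.output_text.delta for text chunks.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
    elif event.type == "response.completed":
        print()  # stream finished
```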
Reasoning/Thinking
OpenAI provides models with reasoning abilities, similar to the extended thinking feature in Claude.
The feature is enabled by setting a reasoning field in the request:
Source: OpenAI
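A minimal sketch (the model name and effort value are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Sketch: enable reasoning on a reasoning-capable model via the reasoning field.
response = client.responses.create(
    model="o3",  # placeholder reasoning model
    reasoning={"effort": "medium"},
    input="What is the capital of France?",
)
print(response.output_text)
# Reasoning tokens are reported in usage and billed as output tokens.
print(response.usage.output_tokens_details.reasoning_tokens)
```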
Reasoning tokens are used by the model to think - to consider multiple approaches before answering. They are discarded from the model's context, but are billed as output tokens.
You need to adjust your max_output_tokens to account for reasoning - you can get a response with a status of incomplete because the allocated budget is too small to handle the additional tokens being generated for reasoning.
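A short sketch of detecting that case, assuming the Responses API (the token limit is deliberately small here):

```python
from openai import OpenAI

client = OpenAI()

# Sketch: an intentionally small max_output_tokens so reasoning can exhaust it.
response = client.responses.create(
    model="o3",  # placeholder reasoning model
    reasoning={"effort": "high"},
    input="Plan a multi-step data migration.",
    max_output_tokens=64,
)

if response.status == "incomplete" and response.incomplete_details.reason == "max_output_tokens":
    print("The output budget ran out before (or during) the visible answer.")
```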
Just like with Claude, you should pass the reasoning or thinking steps back when you're using tools, to allow for better continued reasoning. If you're using the Responses API, this can be done with previous_response_id.
Reasoning tokens may be discarded, but you can get a human-readable summary by requesting it in the reasoning settings:
Source: OpenAI
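A sketch of requesting the summary (the model name is a placeholder; the summary setting sits alongside the effort level):

```python
from openai import OpenAI

client = OpenAI()

# Sketch: ask for an auto-generated, human-readable summary of the reasoning.
response = client.responses.create(
    model="o3",  # placeholder reasoning model
    reasoning={"effort": "medium", "summary": "auto"},
    input="What is the capital of France?",
)

# The summary text arrives on the reasoning item(s) in the output list.
for item in response.output:
    if item.type == "reasoning":
        for part in item.summary:
            print(part.text)
```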
The response will now include a summary field:
You can see why this can consume so many tokens:
"text": "Answering a simple question\n\nI\u2019m looking at a straightforward question: the capital of France is Paris. It\u2019s a well-known fact, and I want to keep it brief and to the point. Paris is known for its history, art, and culture, so it might be nice to add just a hint of that charm. But mostly, I\u2019ll aim to focus on delivering a clear and direct answer, ensuring the user gets what they\u2019re looking for without any extra fluff."
Context Window and State
The chat completions API offers the traditional stateless method of handling context - you, as the calling client, accumulate turns and provide them to the model with each API call.
The Responses API offers a stateful methodology - instead of resending previous context, you set previous_response_id to the ID returned by the previous call to the Responses API.
This allows for ID chaining (but introduces server-side state), so that the client does not have to include multiple turns in each API call.
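A minimal chaining sketch (the model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Sketch: the second call references the first by ID instead of resending turns.
first = client.responses.create(
    model="gpt-4o-mini",  # placeholder model name
    input="My name is Priya.",
)

second = client.responses.create(
    model="gpt-4o-mini",
    input="What is my name?",
    previous_response_id=first.id,
)
print(second.output_text)  # the server-side state carries the earlier turn
```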
Responses API vs Chat Completions API
These are the main ways you can interact with models, analogous to the /messages/ endpoint with Claude.
The Responses API was developed after the chat completions API to make agentic tasks, and the use of MCP, simpler.
Source: OpenAI
Responses are retained for 30 days by default, which allows a previous response to be retrieved later with an API call using its ID.
The request has some interesting options, such as a prompt_cache_key, a safety_identifier for detecting violators of usage policies, a store flag to persist the response, and some additional output configuration via include.
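A sketch touching a few of those options (all values below are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# Sketch: store the response server-side and tag it for prompt caching
# and usage-policy enforcement. Values are illustrative.
response = client.responses.create(
    model="gpt-4o-mini",               # placeholder model name
    input="Summarise our refund policy.",
    store=True,                        # retained server-side (30 days by default)
    prompt_cache_key="refund-policy",  # illustrative cache key
    safety_identifier="user-1234",     # stable, anonymised end-user identifier
)

# Because it was stored, it can be fetched again later by ID.
fetched = client.responses.retrieve(response.id)
print(fetched.output_text)
```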
System Prompts
System prompts are provided with an instructions field on the request - if we use ID chaining by setting previous_response_id, the system prompt is not carried over.
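A sketch showing instructions being re-supplied on a chained call (the model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Sketch: instructions act as the system prompt for a single request only.
first = client.responses.create(
    model="gpt-4o-mini",  # placeholder model name
    instructions="You are a terse assistant. Answer in one sentence.",
    input="Explain rate limits.",
)

# When chaining with previous_response_id the instructions are not inherited,
# so pass them again if the follow-up should keep the same behaviour.
second = client.responses.create(
    model="gpt-4o-mini",
    instructions="You are a terse assistant. Answer in one sentence.",
    input="And what about token limits?",
    previous_response_id=first.id,
)
print(second.output_text)
```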
Model Optimisation
OpenAI provides a variety of ways to optimise the underlying LLM you're using.
This can allow you to use shorter prompts and fewer examples, train on proprietary data without including it in one-shot/many-shot prompts, train smaller models to do something very specific, and use fewer tokens overall, thereby reducing costs.
Anthropic provides some features for model optimisation - Claude 3 Haiku can be fine-tuned for use in Amazon Bedrock.
Embeddings
OpenAI provides an embeddings API, whereas Anthropic delegates this to other providers.
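A minimal sketch (the embedding model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Sketch: embed a piece of text and inspect the vector dimensionality.
result = client.embeddings.create(
    model="text-embedding-3-small",  # placeholder embedding model
    input="Customer reported a billing discrepancy.",
)
print(len(result.data[0].embedding))
```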
Links and Resources
https://platform.openai.com/docs/api-reference/introduction
https://www.anthropic.com/news/fine-tune-claude-3-haiku
https://docs.aws.amazon.com/bedrock/latest/userguide/custom-model-supported.html