OpenAI Request Builder
Build and preview OpenAI Chat Completions API requests with code generation
Configuration
{
"model": "gpt-4o",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello! How can you help me today?"
}
],
"temperature": 1,
"max_tokens": 1024,
"top_p": 1,
"stream": false
}
Quick Tips
- Set the OPENAI_API_KEY environment variable before running
- Temperature 0 = most deterministic, 2 = maximum randomness
- Use JSON mode when you need structured output
OpenAI Chat Completions API: Complete Request Builder Guide
The OpenAI Chat Completions API is the primary interface for accessing GPT-4o, GPT-4, GPT-3.5-Turbo, and other OpenAI models. This request builder helps you configure API calls visually and generates ready-to-use code in multiple languages.
Whether you're prototyping a new feature, testing different parameters, or learning the API, this tool eliminates the guesswork and helps you build correct API requests quickly.
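As a rough illustration of what such generated code might look like, here is a minimal sketch that wraps the default configuration from the top of this page in an HTTP request using only the Python standard library. The helper names build_request and send_request are hypothetical and are not the tool's actual output; the endpoint URL and Bearer authorization header follow OpenAI's documented Chat Completions interface.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello! How can you help me today?"},
    ],
    "temperature": 1,
    "max_tokens": 1024,
    "top_p": 1,
    "stream": False,
}


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Wrap a Chat Completions payload in an authenticated POST request (hypothetical helper)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_request(payload: dict) -> str:
    """Send the request and return the assistant's reply text.

    Reads OPENAI_API_KEY from the environment and makes a real, billed API call.
    """
    request = build_request(payload, os.environ["OPENAI_API_KEY"])
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling send_request(payload) with the key set would return the assistant's reply. Building the request separately from sending it makes it possible to inspect headers and body without touching the network.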
Understanding Each Parameter
Model
The model determines capabilities, speed, and cost. GPT-4o is the recommended default for most use cases — it's fast, capable, and cost-effective. Use GPT-4o Mini for simpler tasks at lower cost, or o1 for complex reasoning.
Temperature (0-2)
Controls randomness. 0 = near-deterministic (the same input usually, though not always, yields the same output), 1 = the default balance, 2 = maximum randomness, where output can become incoherent. Use 0-0.3 for factual/coding tasks, 0.7-1.0 for creative writing.
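To make the effect concrete, here is a minimal sketch of the textbook temperature-scaled softmax. It illustrates the general technique only and is not a confirmed description of OpenAI's internals; the function name and the logit values are made up for the example, and the sketch rejects a temperature of 0 instead of treating it as a special case.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens, higher flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature nearly all of the probability lands on the top-scoring token, which is why low values suit factual and coding tasks.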
Max Tokens
Maximum number of tokens in the response. 1 token ≈ 4 characters in English. This caps cost and response length. Set based on expected output size + buffer.
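The sizing advice above can be sketched with the 4-characters-per-token rule of thumb. This is only a rough estimate for English text, not a real tokenizer; the function names and the 25% default buffer are assumptions chosen for illustration.

```python
import math

CHARS_PER_TOKEN = 4  # rough English-text heuristic, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a piece of English text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def suggest_max_tokens(expected_output: str, buffer_ratio: float = 0.25) -> int:
    """Expected output size plus a safety buffer (the buffer ratio is an assumption)."""
    return math.ceil(estimate_tokens(expected_output) * (1 + buffer_ratio))
```

For exact counts, a real tokenizer for the chosen model is the better tool; the heuristic is mainly useful for a first guess.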
Top P (Nucleus Sampling)
An alternative to temperature for controlling randomness. Top P = 0.1 means only the most likely tokens that together make up the top 10% of probability mass are considered. Generally, adjust temperature OR top_p, not both.
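A minimal sketch of nucleus filtering may help here. It shows the general technique as usually described, not OpenAI's exact implementation; the function name and the toy probabilities are made up for the example.

```python
def nucleus_filter(probs: dict, top_p: float) -> dict:
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the survivors."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Toy next-token distribution (made-up numbers)
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "one": 0.05}
print(nucleus_filter(probs, 0.1))
print(nucleus_filter(probs, 0.9))
```

With top_p = 0.1 only the single most likely token survives here, because it alone already covers more than 10% of the probability mass. With top_p = 0.9 the three most likely tokens are kept and the unlikely tail is dropped.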
Stream
When enabled, the response is sent incrementally as it's generated. Essential for chat UIs to display typing indicators and reduce perceived latency.
Response Format
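JSON Object mode constrains the model to emit valid JSON. Use it when you need structured output for parsing. The API requires the word "JSON" to appear somewhere in the messages when this mode is on, so always include instructions in the prompt about the expected JSON structure. Output can still be cut off mid-object if the max tokens limit is reached.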
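The following sketch shows one way a JSON-mode request body and its parsing step might look. The response_format field with type json_object is part of the documented Chat Completions API; the helper names, the example prompt, and the "city"/"country" keys are hypothetical.

```python
import json


def json_mode_payload(user_prompt: str) -> dict:
    """Build a request body with JSON mode on (hypothetical helper)."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "system",
                # The prompt itself describes the expected JSON structure.
                "content": 'Reply in JSON with the keys "city" and "country".',
            },
            {"role": "user", "content": user_prompt},
        ],
        "response_format": {"type": "json_object"},
    }


def parse_json_reply(response_body: dict) -> dict:
    """Extract and decode the JSON content of the first choice."""
    choice = response_body["choices"][0]
    if choice.get("finish_reason") == "length":
        # A reply truncated by the token limit may not be complete JSON.
        raise ValueError("response was cut off by the token limit")
    return json.loads(choice["message"]["content"])
```

Checking finish_reason before decoding gives a clearer error than a bare JSON parse failure when the reply was truncated.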
Frequently Asked Questions
Which model should I use?
Start with GPT-4o — it's the best balance of capability, speed, and cost. Use GPT-4o Mini for high-volume, simpler tasks. Use o1 or o1-mini for complex reasoning, math, or coding challenges.
How do I get my API key?
Go to platform.openai.com, sign in, navigate to API Keys in your account settings, and create a new secret key. Store it securely — you won't be able to see it again.
What's the difference between temperature and top_p?
Both control randomness but differently. Temperature rescales the whole probability distribution (lower values sharpen it, higher values flatten it); top_p discards the low-probability tail. OpenAI recommends adjusting only one at a time. Temperature is more intuitive for most use cases.
How do I handle streaming responses?
With streaming, the response comes as Server-Sent Events (SSE). Each chunk contains a delta with partial content. Concatenate all deltas to build the complete response. The generated code shows the pattern for both Python and Node.js.
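The concatenation pattern can be sketched as follows. This is not the tool's generated code; it is a minimal illustration that works on already-received SSE lines, assuming each event is a "data:" line holding a JSON chunk and that the stream ends with "data: [DONE]". The function name and the sample lines are made up.

```python
import json


def collect_stream(lines) -> str:
    """Concatenate the content deltas from Chat Completions SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)


sample = [
    'data: {"choices":[{"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: {"choices":[{"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # prints "Hello"
```

In a chat UI, each delta would typically be appended to the display as it arrives instead of being joined at the end.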
How is the cost calculated?
Cost = (input tokens × input price) + (output tokens × output price). Different models have different prices. GPT-4o is ~$2.50/M input, $10/M output. GPT-4o Mini is ~$0.15/M input, $0.60/M output.
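The formula above translates directly into a short sketch. The prices are the approximate per-million-token figures quoted in this answer and may be out of date; the table and function names are hypothetical.

```python
# (input price, output price) in USD per million tokens, as quoted above (approximate)
PRICES_PER_MILLION = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost = (input tokens x input price) + (output tokens x output price)."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, a GPT-4o call with 1,000 input tokens and 500 output tokens comes to (1,000 × 2.50 + 500 × 10.00) / 1,000,000 = $0.0075 at these prices.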
