AI Model Pricing Table
Complete pricing reference for all major AI models
Related Tools
AI System Status Board
Aggregated status page for OpenAI, Anthropic, Google, and other AI services
AI Model Release Timeline
Interactive timeline of major LLM and generative AI model releases
LLM Benchmark Library
Compare LLM performance across standard benchmarks like MMLU, GSM8K, and HumanEval
Model Capability Matrix
Compare feature support (vision, function calling, json mode) across major LLMs
LLM Head-to-Head
Directly compare two models on specs, pricing, and capabilities side-by-side
Context Window Visualizer
Visual comparison of context window sizes across different models
Understanding AI Model Pricing
AI models charge based on token usage — the fundamental unit of text processing. A token is roughly 3-4 characters or about 0.75 words. Pricing is typically split between input tokens (your prompt and context) and output tokens (the model's response), with output usually costing more due to the computational expense of generation.
Prices shown in this table are per 1 million tokens. For reference, 1 million tokens is roughly 750,000 words or about 1,500 pages of text — far more than most individual API calls will use.
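The per-1M-token pricing model translates directly into a small helper function. A minimal sketch in Python, using GPT-4o's rates from the example below as placeholder values (always check the table for current prices):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Return the dollar cost of one request.

    Prices are per 1 million tokens, matching the table above.
    """
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

# Example: 1,500 input + 500 output tokens at GPT-4o placeholder rates
cost = request_cost(1_500, 500, input_price=2.50, output_price=10.00)
print(f"${cost:.5f}")  # → $0.00875
```

The same function works for any model in the table; only the two price arguments change.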
How AI Pricing Works
Input Tokens
Everything you send TO the model: your prompt, system instructions, conversation history, and any documents or context. Usually cheaper than output.
Output Tokens
Everything the model generates: responses, completions, code, analysis. Typically 2-4x more expensive than input due to generation costs.
Context Window
The maximum combined input+output the model can handle. Larger windows (128K+) enable processing longer documents but may cost more.
Rate Limits
Providers limit requests per minute (RPM) and tokens per minute (TPM). Higher tiers unlock greater throughput for production use.
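When a request exceeds RPM or TPM limits, providers typically return HTTP 429, and the standard client-side response is exponential backoff with jitter. A minimal sketch, where `send_request` and `RateLimitError` are stand-ins for your actual API call and your SDK's 429 exception:

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for a provider SDK's HTTP 429 exception."""

def with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry a rate-limited call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return send_request()
        except RateLimitError:
            # Wait base, 2x, 4x, ... plus jitter to avoid synchronized retries
            time.sleep(base_delay * 2 ** attempt + random.random() * base_delay)
    raise RuntimeError("rate limit: retries exhausted")
```

Production SDKs often build this in; the sketch just shows the pattern behind the "requests per minute" limits described above.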
Cost Optimization Strategies
Model Selection by Use Case
| Use Case | Recommended | Why |
|---|---|---|
| Simple chatbot | GPT-4o-mini, Haiku | Low cost, fast responses |
| Code generation | Claude 3.5 Sonnet | Best coding performance |
| Document analysis | Gemini 1.5 Pro | 1M+ token context |
| Complex reasoning | GPT-4o, o1 | Advanced reasoning |
| Image understanding | GPT-4o, Claude 3 | Vision capabilities |
| High volume | Gemini Flash, Haiku | Cheapest per token |
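A table like this can be applied in code as a simple routing map. The model identifiers below mirror the table but are illustrative; check provider documentation for the exact API model IDs:

```python
# Route requests to a model by use case, mirroring the table above.
# Model ID strings are illustrative placeholders, not verified API IDs.
MODEL_BY_USE_CASE = {
    "simple_chatbot": "gpt-4o-mini",
    "code_generation": "claude-3-5-sonnet",
    "document_analysis": "gemini-1.5-pro",
    "complex_reasoning": "gpt-4o",
    "image_understanding": "gpt-4o",
    "high_volume": "gemini-1.5-flash",
}

def pick_model(use_case: str, default: str = "gpt-4o-mini") -> str:
    """Return the recommended model for a use case, with a cheap fallback."""
    return MODEL_BY_USE_CASE.get(use_case, default)
```

Falling back to a cheap model for unrecognized use cases keeps surprise traffic from landing on your most expensive tier.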
Pro Tip: Monitoring Your Costs
Set up billing alerts in your provider dashboard to avoid unexpected charges. Most providers allow you to configure email notifications when you reach certain spending thresholds. OpenAI, Anthropic, and Google all support usage limits and spending caps that can hard-stop API access once reached.
Cost Calculation Example
Here's how to calculate the cost of an API call. Using GPT-4o as an example with an input price of $2.50/1M tokens and output price of $10.00/1M tokens:
# Example API Call Cost Breakdown
# --------------------------------
Input tokens:  1,500  (prompt + system message + context)
Output tokens:   500  (model's response)

# Cost Formula
Input cost:  (1,500 / 1,000,000) × $2.50  = $0.00375
Output cost: (  500 / 1,000,000) × $10.00 = $0.00500
──────────────────────────────────────────────────────
Total cost per request:             $0.00875

# Monthly Estimate (10,000 requests)
Monthly cost: $0.00875 × 10,000 = $87.50
API Usage Example
Here's an example using curl to make a cost-effective API request with proper token limits:
# Make an API request with token limits
curl https://api.openai.com/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"max_tokens": 500,
"messages": [
{"role": "system", "content": "Be concise."},
{"role": "user", "content": "Summarize..."}
]
}'
# Response includes token usage for cost tracking:
# "usage": {"prompt_tokens": 15, "completion_tokens": 150}
Important: Pricing Data Disclaimer
AI model pricing changes frequently as providers adjust rates. While we fetch live data and update regularly, always verify pricing with official provider documentation before making budget decisions. Enterprise customers should contact providers directly for volume discounts and custom pricing agreements.
Frequently Asked Questions
How do I estimate my monthly costs?
Use this formula: ((average input tokens × input price per token) + (average output tokens × output price per token)) × requests per month. Our Cost Calculator tool can help with precise estimates based on your usage patterns.
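The estimate in this answer, as a sketch (token counts and prices below are placeholders; substitute your own averages and the current rates from the table):

```python
def monthly_cost(avg_input_tokens, avg_output_tokens,
                 input_price_per_1m, output_price_per_1m,
                 requests_per_month):
    """Estimate monthly spend from average per-request token usage."""
    per_request = (avg_input_tokens * input_price_per_1m
                   + avg_output_tokens * output_price_per_1m) / 1_000_000
    return per_request * requests_per_month

# 1,500 in / 500 out per request at GPT-4o placeholder rates, 10k requests
print(f"${monthly_cost(1_500, 500, 2.50, 10.00, 10_000):.2f}")  # → $87.50
```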
Are there volume discounts available?
Yes! Most providers offer committed use discounts (10-30% off for prepaid usage), enterprise agreements, and batch API discounts (typically 50% off). Contact providers directly for enterprise pricing.
Why is output more expensive than input?
Generating tokens requires running the full model forward pass for each token, while processing input tokens can be done in parallel. The sequential nature of generation is computationally more expensive.
How often do prices change?
AI model prices have generally decreased over time as efficiency improves. Major providers typically announce price changes quarterly. We update our data regularly, but always verify critical pricing with the official provider documentation.
What about free tiers?
Many providers offer free tiers for development: OpenAI gives $5-18 in credits for new accounts, Google offers a Gemini free tier, and platforms like OpenRouter and Groq provide free access to certain models, subject to rate limits.
How do I count tokens accurately?
Different models use different tokenizers. Use our Token Counter tool for precise counts, or the provider's tokenizer (like tiktoken for OpenAI). As a rough estimate: 1 token ≈ 4 characters in English.
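The 4-characters-per-token heuristic can be expressed as a one-line budgeting estimate. This is only a rough sketch for English text; for exact counts use the provider's own tokenizer (such as tiktoken for OpenAI models), since tokenizers differ between model families:

```python
def rough_token_estimate(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text.

    A budgeting estimate only; use the provider's tokenizer for
    exact counts before relying on it for billing decisions.
    """
    return max(1, len(text) // 4)

print(rough_token_estimate("Hello, how are you today?"))  # 25 chars → 6
```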
Related Tools
Token Counter
Count exact tokens in your text for accurate cost estimation.
Model Comparison
Compare models side-by-side across pricing and capabilities.
Context Windows
Find models with enough context length for your documents.
Benchmark Viewer
Compare model quality alongside pricing for cost-effectiveness.
