All Models Comparison
| Model | Per Request | Daily (100 req) | Monthly | Context | Value Score | Relative Cost | Try It |
|---|---|---|---|---|---|---|---|
Understanding LLM API Costs
What is a token?
A token is roughly 4 characters or 0.75 words in English. "Hello world" is about 3 tokens. Most models price input and output tokens separately — output tokens typically cost more since the model generates each one.
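As a back-of-the-envelope check, the characters-per-token heuristic above can be sketched in a few lines of Python. This is a rough estimate only; for exact counts, use your provider's tokenizer (e.g. OpenAI's tiktoken library):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello world"))  # 11 chars -> 3
```

Real tokenizers split on subword boundaries, so actual counts can differ by 20% or more, especially for code, non-English text, and unusual punctuation.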
Why are input and output priced differently?
Reading (input) is computationally cheaper than writing (output). Models process your input in parallel, but generate output sequentially — token by token. That's why output tokens cost 3–6× more per token on most APIs.
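A minimal sketch of per-request cost under asymmetric pricing. The $3/$15 per-million rates here are illustrative of a typical 5x input/output spread, not tied to any specific model:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Cost in dollars; prices are given per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# 2,000 input tokens at $3/M plus 500 output tokens at $15/M
cost = request_cost(2_000, 500, 3.00, 15.00)
print(cost)  # 0.0135
```

Note that the 500 output tokens cost $0.0075, more than the 2,000 input tokens ($0.006), despite being a quarter of the volume.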
How accurate are these estimates?
Prices reflect official API pricing as of April 2026. LLM costs have dropped 80%+ since 2025 — models that cost $5–10/M tokens in 2025 now cost under $0.30/M. Prices vary by tier, volume, and region. Batch API discounts (50%) and prompt caching discounts (90%) are not included here.
Which model is best value in 2026?
Budget: Gemini 2.5 Flash-Lite ($0.10/M), DeepSeek V3.2 ($0.14/M), or Llama 3.3 70B ($0.10/M). Production: Claude Sonnet 4.6 ($3/$15 per M tokens, input/output) or GPT-4.1 ($2/$8). Reasoning at low cost: DeepSeek R1 ($0.55/M) undercuts o1/o3 pricing by 80%+. New entrants DeepSeek and Grok Fast have reset the price floor.
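To see how per-token rates translate into monthly bills, here is a hedged forecast sketch. The 1,000 input / 500 output tokens per request and the 100 requests/day workload are assumptions for illustration; the $0.10/M and $3/$15 rates come from the answer above (the flat $0.10/M is assumed to apply to both input and output for simplicity):

```python
def monthly_cost(in_price: float, out_price: float,
                 in_tok: int = 1_000, out_tok: int = 500,
                 req_per_day: int = 100, days: int = 30) -> float:
    """Projected monthly spend in dollars; prices per million tokens."""
    per_request = (in_tok * in_price + out_tok * out_price) / 1_000_000
    return per_request * req_per_day * days

budget = monthly_cost(0.10, 0.10)       # budget-tier flat rate -> 0.45
production = monthly_cost(3.00, 15.00)  # production-tier rates -> 31.5
```

At this assumed workload, the budget tier runs about $0.45/month versus $31.50/month for the production tier, a roughly 70x spread.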
Does context length affect cost?
Yes — every token in your context window counts as input. A 10,000-token conversation history adds 10K input tokens to every API call. Use context management techniques (summarization, truncation) to control costs at scale.
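One simple context-management tactic is truncation: keep only the most recent messages that fit a token budget. A minimal sketch, using the rough 4-characters-per-token estimate rather than an exact tokenizer count:

```python
def truncate_history(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit within a token budget."""
    kept, total = [], 0
    for msg in reversed(messages):          # walk newest to oldest
        tokens = max(1, len(msg) // 4)      # rough chars/4 estimate
        if total + tokens > max_tokens:
            break                           # budget exhausted; drop older messages
        kept.append(msg)
        total += tokens
    return list(reversed(kept))             # restore chronological order
```

Summarization is the complementary approach: instead of dropping old messages, replace them with a short model-generated summary so long-range context survives at a fraction of the token cost.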
How do I reduce my API costs?
Use prompt caching (Anthropic offers 90% discount on cached input tokens). Batch similar requests. Choose the smallest model that fits your quality bar. Truncate conversation history. Use streaming to detect when to stop early.
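The caching savings can be sketched as simple arithmetic, using the 90% cached-input discount mentioned above. The 10,000-token prompt and 80% cached fraction are a hypothetical workload, and the $3/M input rate is illustrative:

```python
def cost_with_caching(input_tokens: int, cached_fraction: float,
                      in_price: float, cache_discount: float = 0.90) -> float:
    """Input cost in dollars when part of the prompt hits the cache."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    # Fresh tokens pay full price; cached tokens pay (1 - discount) of it.
    return (fresh * in_price + cached * in_price * (1 - cache_discount)) / 1_000_000

# 10K-token prompt, 80% cached, $3/M input:
# fresh 2,000 * $3/M = $0.006; cached 8,000 * $0.30/M = $0.0024
print(cost_with_caching(10_000, 0.8, 3.00))  # 0.0084
```

Against the uncached cost of $0.03 for the same prompt, that is a 72% saving, which is why caching a large, stable system prompt is usually the first optimization to try.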