Prices updated April 2026

AI API Cost Calculator

Estimate your monthly AI API spend across OpenAI, Anthropic Claude, Google Gemini, Mistral, DeepSeek and more. Compare 15+ models instantly — free, no signup required.

✦ 15 models · April 2026 pricing · Batch + caching support
[Interactive calculator: enter monthly requests, average input tokens, and average output tokens per request, then select a model to see estimated monthly cost, input/output cost breakdown, cache savings, per-request cost, and an annual estimate. Batch discounts are applied automatically — OpenAI: 50%, Anthropic: 50%, others: standard rate. Output tokens cost 3–5× more than input; prices are taken from provider docs.]
Full Model Comparison — Your Workload

[Interactive comparison table — click any row to select a model.]

How to Use This AI API Cost Calculator

Enter your expected monthly request volume, average input tokens per request (your prompt + any context), and average output tokens (the AI's response length). The calculator instantly computes your estimated monthly spend and ranks all 15+ models from cheapest to most expensive for your exact workload.
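The calculator's core math can be sketched in a few lines. This is an illustrative sketch only — the model list and prices below are sample figures quoted elsewhere on this page, not a live pricing feed, and the function names are our own:

```python
# Sample per-million-token prices (input, output) in USD, as quoted on this page.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Gemini Flash Lite": (0.075, 0.30),
    "DeepSeek V3": (0.27, 1.10),
}

def monthly_cost(requests, in_tok, out_tok, in_price, out_price):
    """Estimated USD spend for one month of traffic at the given token prices."""
    input_cost = (requests * in_tok / 1e6) * in_price
    output_cost = (requests * out_tok / 1e6) * out_price
    return input_cost + output_cost

def rank_models(requests=10_000, in_tok=500, out_tok=300):
    """Rank all models from cheapest to most expensive for this workload."""
    costs = {name: monthly_cost(requests, in_tok, out_tok, *p)
             for name, p in PRICES.items()}
    return sorted(costs.items(), key=lambda kv: kv[1])
```

With the default workload (10,000 requests × 500 input + 300 output tokens), GPT-4o works out to 5M × $2.50 + 3M × $10.00 = $42.50/month.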

What Are Input vs Output Tokens?

Tokens are chunks of text — roughly 0.75 words per token in English. Input tokens are everything you send to the model: the system prompt, user message, and any conversation history or retrieved context. Output tokens are the model's response. Output tokens cost significantly more — typically 3–5× the input price — because generating text requires more compute than reading it.

Understanding Prompt Caching

If your prompts include a large, repeated system prompt or static context (common in RAG, agents, and chatbots), prompt caching can reduce input costs by 50–90%. Anthropic Claude and OpenAI both support caching. Use the Advanced tab to model this saving.
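The effect of caching on input cost can be modeled as a blend of cached and uncached token prices. A minimal sketch, assuming a single cache hit rate across your traffic (the hit rate and the $1.25/M cached GPT-4o price used in the example are illustrative figures from this page):

```python
def input_cost_with_cache(million_tokens, base_price, cached_price, hit_rate):
    """Blend cached and uncached input-token pricing.

    hit_rate: fraction of input tokens served from the prompt cache (0..1).
    """
    blended_price = hit_rate * cached_price + (1 - hit_rate) * base_price
    return million_tokens * blended_price
```

For 5M input tokens/month at GPT-4o rates with 80% of each prompt cacheable, input cost drops from $12.50 to $7.50.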

When to Use Batch API

If your workload is not latency-sensitive — bulk data processing, evaluation runs, overnight jobs — OpenAI's Batch API and Anthropic's Message Batches both offer 50% off standard pricing. Switch to the Batch Mode tab to calculate your discounted cost.
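Since both batch APIs quoted on this page take a flat 50% off standard token prices, batch cost is just the standard formula with the discount applied. A self-contained sketch (function name is our own):

```python
BATCH_DISCOUNT = 0.50  # 50% off, per OpenAI Batch API / Anthropic Message Batches

def batch_monthly_cost(requests, in_tok, out_tok, in_price, out_price):
    """Standard monthly cost with the flat batch discount applied."""
    standard = ((requests * in_tok / 1e6) * in_price
                + (requests * out_tok / 1e6) * out_price)
    return standard * (1 - BATCH_DISCOUNT)
```

The default workload on GPT-4o drops from $42.50 to $21.25/month in batch mode.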

Which AI API Is the Cheapest in 2026?

For high-volume, cost-sensitive workloads: Google Gemini Flash Lite ($0.075/$0.30 per million tokens) and Mistral Small 3.1 ($0.10/$0.30) are the most affordable capable options. DeepSeek V3 ($0.27/$1.10) is excellent value for complex tasks. For maximum capability, GPT-4.1 and Claude Sonnet 4 offer the best quality-to-cost ratio.

Frequently Asked Questions

How much does GPT-4o cost?

GPT-4o costs $2.50 per million input tokens and $10.00 per million output tokens as of April 2026. With prompt caching enabled, cached input tokens drop to $1.25/M. The Batch API reduces both by 50%.

How much does Claude cost?

Claude Sonnet 4 costs $3.00/M input and $15.00/M output. Claude Haiku 3.5 is $0.80/$4.00 — ideal for high-volume tasks. Claude Opus 4 is $15.00/$75.00 for maximum capability. All support 50% batch discounts and aggressive prompt caching.

What is the cheapest AI API?

As of April 2026: Google Gemini Flash Lite at $0.075/$0.30, Mistral Small 3.1 at $0.10/$0.30, and DeepSeek V3 at $0.27/$1.10 per million tokens are the cheapest capable options. Google AI Studio also offers a generous free tier on Gemini models.

How do you calculate AI API costs?

Formula: (monthly_requests × avg_input_tokens / 1,000,000 × input_price) + (monthly_requests × avg_output_tokens / 1,000,000 × output_price). For example: 10,000 requests × 500 input tokens = 5M input tokens. At GPT-4o pricing ($2.50/M): 5 × $2.50 = $12.50 for input alone.

Does OpenAI have a free API tier?

OpenAI does not offer a persistent free API tier — you pay per token from the start, though new accounts receive trial credits. Google AI Studio offers the most generous free tier in 2026, with free access to Gemini Flash at moderate rate limits.

What's the difference between GPT-4.1 and GPT-4o?

GPT-4.1 has a 1M token context window vs GPT-4o's 128k, improved instruction following, and lower pricing ($2.00/$8.00 vs $2.50/$10.00). GPT-4.1 is generally the better choice for new projects in 2026.