LLM Router

Independent comparison platform for LLM routing infrastructure.

© 2026 LLM Router

Data from public sources. May not reflect real-time pricing.

Model Repository

Live pricing, benchmarks, and capabilities for 455+ LLM models from every major provider. Data refreshed every 6 hours.

  • Requesty: 392 models
  • OpenRouter: 340 models
  • Vercel AI Gateway: 193 models
  • Martian: 289 models
  • DeepInfra: 75 models
Showing 455 of 455 models
MiniMax · OSS

MiniMax M2.5

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1 to extend into general office work, reaching fluency in generating and operating Word, Excel, and PowerPoint files, switching context between diverse software environments, and working across different agent and human teams. Scoring 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp, M2.5 is also more token-efficient than previous generations, having been trained to optimize its actions and output through planning.

Context: 205K · Max Output: 131K · Input/1M: $0.30
👁 Vision · 🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens):
  • Requesty★: $0.30 / $1.20
  • OpenRouter: $0.30 / $1.20
  • Vercel AI: $0.30 / $1.20
Updated 2026-02-12
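The "Input/1M" figures and pricing rows in these cards translate into per-request cost with simple arithmetic. A minimal sketch, using M2.5's listed $0.30 input / $1.20 output rates (the function name and token counts are illustrative):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one request, given per-1M-token prices."""
    return (input_tokens / 1_000_000) * input_per_m + \
           (output_tokens / 1_000_000) * output_per_m

# A 50K-token prompt with a 2K-token completion at M2.5's listed rates:
cost = request_cost(50_000, 2_000, input_per_m=0.30, output_per_m=1.20)
print(f"${cost:.4f}")  # $0.0174
```

Cached-input and reasoning-token rates, where a provider offers them, are billed separately and are not modeled here.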
Zhipu · GLM 5 · #11

GLM 5

GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading closed-source models. With advanced agentic planning, deep backend reasoning, and iterative self-correction, GLM-5 moves beyond code generation to full-system construction and autonomous execution.

Context: 203K · Max Output: 131K · Input/1M: $0.80
👁 Vision · 🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens):
  • Requesty★: $1.00 / $3.20
  • OpenRouter: $0.80 / $2.56
  • Vercel AI: $1.00 / $3.20
  • Martian: $1.00 / $3.20
  • DeepInfra: $0.80 / $2.56
Updated 2026-02-11
Qwen · Qwen 3 · OSS

Qwen: Qwen3 Max Thinking

Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it delivers major gains in factual accuracy, complex reasoning, instruction following, alignment with human preferences, and agentic behavior.

Context: 262K · Max Output: 66K · Input/1M: $1.20
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: $1.20 / $6.00
  • Vercel AI: $1.20 / $6.00
  • Martian: $1.20 / $6.00
  • DeepInfra: $1.20 / $6.00
Updated 2026-02-09
Other

Aurora Alpha

This is a cloaked model provided to the community to gather feedback. A reasoning model designed for speed. It is built for coding assistants, real-time conversational applications, and agentic workflows. Default reasoning effort is set to medium for fast responses. For agentic coding use cases, we recommend changing effort to high. Note: All prompts and completions for this model are logged by the provider and may be used to improve the model.

Context: 128K · Max Output: 50K · Input/1M: Free
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: Free / Free
Updated 2026-02-09
Anthropic · Claude 4.6 · #2

Claude Opus 4.6

Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective for large codebases, complex refactors, and multi-step debugging that unfolds over time. The model shows deeper contextual understanding, stronger problem decomposition, and greater reliability on hard engineering tasks than prior generations. Beyond coding, Opus 4.6 excels at sustained knowledge work. It produces near-production-ready documents, plans, and analyses in a single pass, and maintains coherence across very long outputs and extended sessions. This makes it a strong default for tasks that require persistence, judgment, and follow-through, such as technical design, migration planning, and end-to-end project execution. For users upgrading from earlier Opus versions, see our [official migration guide here](https://openrouter.ai/docs/guides/guides/model-migrations/claude-4-6-opus)

Context: 1.0M · Max Output: 128K · Input/1M: $5.00
👁 Vision · 🧠 Reasoning · 🔧 Tools · ⚡ Cache · 🖥 Computer
Regions: EU, US
Pricing (per 1M tokens):
  • Requesty★: $5.00 / $25.00
  • OpenRouter: $5.00 / $25.00
  • Vercel AI: $5.00 / $25.00
  • Martian: $5.00 / $25.00
Updated 2026-02-04
Qwen · Qwen 3 · OSS

Qwen: Qwen3 Coder Next

Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total parameters and only 3B activated per token, delivering performance comparable to models with 10 to 20x higher active compute, which makes it well suited for cost-sensitive, always-on agent deployment. The model is trained with a strong agentic focus and performs reliably on long-horizon coding tasks, complex tool usage, and recovery from execution failures. With a native 256k context window, it integrates cleanly into real-world CLI and IDE environments and adapts well to common agent scaffolds used by modern coding tools. The model operates exclusively in non-thinking mode and does not emit <think> blocks, simplifying integration for production coding agents.

Context: 262K · Max Output: 66K · Input/1M: $0.07
🔧 Tools · ⚡ Cache
Pricing (per 1M tokens):
  • OpenRouter: $0.07 / $0.30
  • Vercel AI: $0.50 / $1.20
  • Martian: $0.07 / $0.30
Updated 2026-02-04
Other

Free Models Router

The simplest way to get free inference. openrouter/free is a router that selects free models at random from the models available on OpenRouter. The router smartly filters for models that support features needed for your request such as image understanding, tool calling, structured outputs and more.

Context: 200K · Max Output: — · Input/1M: Free
👁 Vision · 🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: Free / Free
Updated 2026-02-01
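Because the free router is addressed by an ordinary model ID, using it is just a matter of setting `model` in a chat-completions request. A minimal sketch; the body shape follows OpenRouter's OpenAI-compatible API, so verify field names against the current docs before use:

```python
import json

# Request body for the free router; it picks a free model per request
# that supports the features the request needs (images, tools, etc.).
payload = {
    "model": "openrouter/free",
    "messages": [
        {"role": "user", "content": "Summarize the trade-offs of MoE models."}
    ],
}

# POST this (with an Authorization header) to
# https://openrouter.ai/api/v1/chat/completions
body = json.dumps(payload)
print(body)
```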
StepFun · Step 3 · #80

StepFun: Step 3.5 Flash (free)

Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture-of-Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token. It is a reasoning model that remains highly speed-efficient even at long contexts.

Context: 256K · Max Output: 256K · Input/1M: Free
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: Free / Free
  • Martian: $0.10 / $0.30
Updated 2026-01-29
xAI

Grok Imagine Image

Generate high-quality images from text prompts with xAI's imagine API.

Context: 0 · Max Output: — · Input/1M: Free
Pricing (per 1M tokens):
  • Vercel AI: Free / Free
Updated 2026-01-28
xAI

Grok Imagine Image Pro

Generate high-quality images from text prompts with xAI's imagine API.

Context: 0 · Max Output: — · Input/1M: Free
Pricing (per 1M tokens):
  • Vercel AI: Free / Free
Updated 2026-01-28
Moonshot AI · Kimi K2.5

Kimi K2.5

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens, it delivers strong performance in general reasoning, visual coding, and agentic tool-calling.

Context: 262K · Max Output: 262K · Input/1M: $0.45
👁 Vision · 🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens):
  • Requesty★: $0.60 / $3.00
  • OpenRouter: $0.45 / $2.25
  • Vercel AI: $0.50 / $2.80
  • Martian: $0.45 / $2.25
  • DeepInfra: $0.45 / $2.25
Updated 2026-01-27
Other

Arcee AI: Trinity Large Preview (free)

Trinity-Large-Preview is a frontier-scale open-weight language model from Arcee, built as a 400B-parameter sparse Mixture-of-Experts with 13B active parameters per token using 4-of-256 expert routing. It excels at creative writing, storytelling, role-play, chat scenarios, and real-time voice assistance beyond what typical reasoning models deliver, and it introduces Arcee's newer agentic capabilities: it was trained to navigate agent harnesses like OpenCode, Cline, and Kilo Code, and to handle complex toolchains and long, constraint-filled prompts. The architecture natively supports very long context windows up to 512K tokens, with the Preview API currently served at 128K context using 8-bit quantization for practical deployment. Trinity-Large-Preview reflects Arcee's efficiency-first design philosophy: a production-oriented frontier model with open weights and permissive licensing, suitable for real-world applications and experimentation.

Context: 131K · Max Output: — · Input/1M: Free
🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: Free / Free
  • Vercel AI: $0.25 / $1.00
Updated 2026-01-27
Other · OSS

Upstage: Solar Pro 3 (free)

Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With 102B total parameters and 12B active parameters per forward pass, it delivers exceptional performance while maintaining computational efficiency. Optimized for Korean with English and Japanese support.

Context: 128K · Max Output: — · Input/1M: Free
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: Free / Free
Updated 2026-01-27
MiniMax · OSS

MiniMax: MiniMax M2-her

MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. Designed to stay consistent in tone and personality, it supports rich message roles (user_system, group, sample_message_user, sample_message_ai) and can learn from example dialogue to better match the style and pacing of your scenario, making it a strong choice for storytelling, companions, and conversational experiences where natural flow and vivid interaction matter most.

Context: 66K · Max Output: 2K · Input/1M: $0.30
⚡ Cache
Pricing (per 1M tokens):
  • OpenRouter: $0.30 / $1.20
  • Martian: $0.30 / $1.20
Updated 2026-01-23
Writer

Writer: Palmyra X5

Palmyra X5 is Writer's most advanced model, purpose-built for building and scaling AI agents across the enterprise. It delivers industry-leading speed and efficiency on context windows up to 1 million tokens, powered by a novel transformer architecture and hybrid attention mechanisms. This enables faster inference and expanded memory for processing large volumes of enterprise data, critical for scaling AI agents.

Context: 1.0M · Max Output: 8K · Input/1M: $0.60
Pricing (per 1M tokens):
  • OpenRouter: $0.60 / $6.00
  • Martian: $0.60 / $6.00
Updated 2026-01-21
Other

LiquidAI: LFM2.5-1.2B-Thinking (free)

LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG—while still running comfortably on edge devices. It supports long context (up to 32K tokens) and is designed to provide higher-quality “thinking” responses in a small 1.2B model.

Context: 33K · Max Output: — · Input/1M: Free
🧠 Reasoning
Pricing (per 1M tokens):
  • OpenRouter: Free / Free
Updated 2026-01-20
Other

LiquidAI: LFM2.5-1.2B-Instruct (free)

LFM2.5-1.2B-Instruct is a compact, high-performance instruction-tuned model built for fast on-device AI. It delivers strong chat quality in a 1.2B parameter footprint, with efficient edge inference and broad runtime support.

Context: 33K · Max Output: — · Input/1M: Free
Pricing (per 1M tokens):
  • OpenRouter: Free / Free
Updated 2026-01-20
OpenAI

OpenAI: GPT Audio

The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Audio is priced at $32 per million input tokens and $64 per million output tokens.

Context: 128K · Max Output: 16K · Input/1M: $2.50
Pricing (per 1M tokens):
  • OpenRouter: $2.50 / $10.00
Updated 2026-01-19
OpenAI

OpenAI: GPT Audio Mini

A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Input is priced at $0.60 per million tokens and output is priced at $2.40 per million tokens.

Context: 128K · Max Output: 16K · Input/1M: $0.60
Pricing (per 1M tokens):
  • OpenRouter: $0.60 / $2.40
Updated 2026-01-19
Zhipu · GLM 4 · #100

Z.ai: GLM 4.7 Flash

As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning, and tool collaboration, and has achieved leading performance among open-source models of the same size on several current public benchmark leaderboards.

Context: 203K · Max Output: — · Input/1M: $0.06
🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens):
  • OpenRouter: $0.06 / $0.40
  • DeepInfra: $0.06 / $0.40
Updated 2026-01-19
OpenAI · GPT-5.2

GPT 5.2 Codex

GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1-Codex, 5.2-Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter; read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level). Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically, providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.

Context: 400K · Max Output: 128K · Input/1M: $1.75
👁 Vision · 🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens):
  • Requesty★: $1.75 / $14.00
  • OpenRouter: $1.75 / $14.00
  • Vercel AI: $1.75 / $14.00
  • Martian: $1.75 / $14.00
Updated 2026-01-14
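The `reasoning.effort` parameter mentioned in the GPT-5.2-Codex description is set in the request body. A minimal sketch; the model slug "openai/gpt-5.2-codex" is an assumption, so check the model page for the exact ID:

```python
import json

payload = {
    "model": "openai/gpt-5.2-codex",  # assumed slug; verify on the model page
    "messages": [
        {"role": "user", "content": "Refactor this module to remove dead code."}
    ],
    # Raise effort for agentic coding; lower it for quick interactive replies.
    "reasoning": {"effort": "high"},
}
print(json.dumps(payload, indent=2))
```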
Other · OSS

AllenAI: Molmo2 8B

Molmo2-8B is an open vision-language model developed by the Allen Institute for AI (Ai2) as part of the Molmo2 family, supporting image, video, and multi-image understanding and grounding. It is based on Qwen3-8B and uses SigLIP 2 as its vision backbone, outperforming other open-weight, open-data models on short videos, counting, and captioning, while remaining competitive on long-video tasks.

Context: 37K · Max Output: 37K · Input/1M: $0.20
👁 Vision
Pricing (per 1M tokens):
  • OpenRouter: $0.20 / $0.20
  • Martian: $0.20 / $0.20
Updated 2026-01-09
Other · OSS · #139

AllenAI: Olmo 3.1 32B Instruct

Olmo 3.1 32B Instruct is a large-scale, 32-billion-parameter instruction-tuned language model engineered for high-performance conversational AI, multi-turn dialogue, and practical instruction following. As part of the Olmo 3.1 family, this variant emphasizes responsiveness to complex user directions and robust chat interactions while retaining strong capabilities on reasoning and coding benchmarks. Developed by Ai2 under the Apache 2.0 license, Olmo 3.1 32B Instruct reflects the Olmo initiative’s commitment to openness and transparency.

Context: 66K · Max Output: — · Input/1M: $0.20
🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: $0.20 / $0.60
  • Martian: $0.20 / $0.60
  • DeepInfra: $0.20 / $0.60
Updated 2026-01-06
Other

ByteDance Seed: Seed 1.6 Flash

Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context window and can generate outputs of up to 16k tokens.

Context: 262K · Max Output: 33K · Input/1M: $0.07
👁 Vision · 🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: $0.07 / $0.30
  • Martian: $0.07 / $0.30
Updated 2025-12-23
Other

ByteDance Seed: Seed 1.6

Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window.

Context: 262K · Max Output: 33K · Input/1M: $0.25
👁 Vision · 🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: $0.25 / $2.00
  • Vercel AI: $0.25 / $2.00
  • Martian: $0.25 / $2.00
Updated 2025-12-23
MiniMax · OSS · #82

MiniMax: MiniMax M2.1

MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world capability while maintaining exceptional latency, scalability, and cost efficiency. Compared to its predecessor, M2.1 delivers cleaner, more concise outputs and faster perceived response times. It shows leading multilingual coding performance across major systems and application languages, achieving 49.4% on Multi-SWE-Bench and 72.5% on SWE-Bench Multilingual, and serves as a versatile agent “brain” for IDEs, coding tools, and general-purpose assistance. To avoid degrading this model's performance, MiniMax highly recommends preserving reasoning between turns. Learn more about using reasoning_details to pass back reasoning in our [docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks).

Context: 197K · Max Output: — · Input/1M: $0.27
🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens):
  • OpenRouter: $0.27 / $0.95
  • Vercel AI: $0.30 / $1.20
  • Martian: $0.27 / $0.95
  • DeepInfra: $0.27 / $0.95
Updated 2025-12-23
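MiniMax's recommendation to preserve reasoning between turns amounts to echoing the prior response's reasoning back in the conversation history. A minimal sketch; the `reasoning_details` shape and the "minimax/minimax-m2.1" slug are assumptions based on the linked docs, so verify both before relying on them:

```python
history = [
    {"role": "user", "content": "Write a retry wrapper for HTTP calls."},
    {
        "role": "assistant",
        "content": "Here is a retry wrapper...",
        # Echoed back verbatim from the previous API response:
        "reasoning_details": [
            {"type": "reasoning.text", "text": "Plan: exponential backoff..."}
        ],
    },
    {"role": "user", "content": "Add jitter to the backoff."},
]

payload = {"model": "minimax/minimax-m2.1", "messages": history}
```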
Zhipu · GLM 4 · #22

GLM 4.7

GLM-4.7 is Z.ai’s latest flagship model, designed to push agentic and coding performance further. It expands the context window from 128K to 200K tokens, improves reasoning and tool-use capabilities, and delivers stronger results in coding benchmarks and real-world development workflows. Its predecessor, GLM-4.6, already demonstrated refined writing quality, more capable agent behavior, and higher token efficiency (≈15% fewer tokens vs. GLM-4.5), with evaluations showing clear gains over GLM-4.5 across reasoning, agents, and coding, near parity with Claude Sonnet 4 in practical tasks, and leads over other open-source baselines. GLM-4.7 is available through the Z.ai API platform, OpenRouter, and coding agents (Claude Code, Roo Code, Cline, Kilo Code), with downloadable weights planned for Hugging Face and ModelScope.

Context: 203K · Max Output: 128K · Input/1M: $0.40
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens):
  • Requesty★: $0.60 / $2.20
  • OpenRouter: $0.40 / $1.50
  • Vercel AI: $0.43 / $1.75
  • Martian: $0.40 / $1.50
  • DeepInfra: $0.40 / $1.75
Updated 2025-12-22
Google · Gemini 3 · #5

Gemini 3 Flash Preview

Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool-use performance with substantially lower latency than larger Gemini variants, making it well suited for interactive development, long-running agent loops, and collaborative coding tasks. Compared to Gemini 2.5 Flash, it provides broad quality improvements across reasoning, multimodal understanding, and reliability. The model supports a 1M-token context window and multimodal inputs including text, images, audio, video, and PDFs, with text output. It includes configurable reasoning via thinking levels (minimal, low, medium, high), structured output, tool use, and automatic context caching. Gemini 3 Flash Preview is optimized for users who want strong reasoning and agentic behavior without the cost or latency of full-scale frontier models.

Context: 1.0M · Max Output: 66K · Input/1M: $0.50
👁 Vision · 🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens):
  • Requesty★: $0.50 / $3.00
  • OpenRouter: $0.50 / $3.00
  • Vercel AI: $0.50 / $3.00
  • Martian: $0.50 / $3.00
Updated 2025-12-17
Mistral · Mistral Small

Mistral: Mistral Small Creative

Mistral Small Creative is an experimental small model designed for creative writing, narrative generation, roleplay and character-driven dialogue, general-purpose instruction following, and conversational agents.

Context: 33K · Max Output: — · Input/1M: $0.10
🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: $0.10 / $0.30
  • Martian: $0.10 / $0.30
Updated 2025-12-16
Other · OSS · #189

AllenAI: Olmo 3.1 32B Think

Olmo 3.1 32B Think is a large-scale, 32-billion-parameter model designed for deep reasoning, complex multi-step logic, and advanced instruction following. Building on the Olmo 3 series, version 3.1 delivers refined reasoning behavior and stronger performance across demanding evaluations and nuanced conversational tasks. Developed by Ai2 under the Apache 2.0 license, Olmo 3.1 32B Think continues the Olmo initiative’s commitment to openness, providing full transparency across model weights, code, and training methodology.

Context: 66K · Max Output: 66K · Input/1M: $0.15
🧠 Reasoning
Pricing (per 1M tokens):
  • OpenRouter: $0.15 / $0.50
  • Martian: $0.15 / $0.50
Updated 2025-12-16
Other · #72

Xiaomi: MiMo-V2-Flash

MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters, adopting a hybrid attention architecture. MiMo-V2-Flash supports a hybrid-thinking toggle and a 256K context window, and excels at reasoning, coding, and agent scenarios. On SWE-bench Verified and SWE-bench Multilingual, MiMo-V2-Flash ranks as the #1 open-source model globally, delivering performance comparable to Claude Sonnet 4.5 while costing only about 3.5% as much. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config).

Context: 262K · Max Output: — · Input/1M: $0.09
🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens):
  • OpenRouter: $0.09 / $0.29
  • Vercel AI: $0.09 / $0.29
  • Martian: $0.09 / $0.29
Updated 2025-12-14
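The hybrid-thinking toggle described for MiMo-V2-Flash is controlled per request through the `reasoning.enabled` boolean. A minimal sketch; the "xiaomi/mimo-v2-flash" slug is an assumption, so confirm the exact ID on the model page:

```python
base = {
    "model": "xiaomi/mimo-v2-flash",  # assumed slug; verify before use
    "messages": [{"role": "user", "content": "Classify this support ticket."}],
}

# Skip the thinking phase for latency-sensitive calls...
fast = {**base, "reasoning": {"enabled": False}}
# ...and enable full reasoning for harder tasks.
deep = {**base, "reasoning": {"enabled": True}}

print(fast["reasoning"], deep["reasoning"])
```

The same boolean applies to other hybrid reasoning models on this page, such as DeepSeek V3.2 below.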
NVIDIA · Nemotron · OSS

NVIDIA: Nemotron 3 Nano 30B A3B (free)

NVIDIA Nemotron 3 Nano 30B A3B is a small MoE language model offering high compute efficiency and accuracy for developers building specialized agentic AI systems. The model is fully open, with open weights, datasets, and recipes, so developers can easily customize, optimize, and deploy it on their own infrastructure for maximum privacy and security. Note: for the free endpoint, all prompts and outputs are logged to improve the provider's model, products, and services. Please do not upload any personal, confidential, or otherwise sensitive information. The free endpoint is for trial use only; do not use it for production or business-critical systems.

Context: 256K · Max Output: — · Input/1M: Free
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: Free / Free
  • Vercel AI: $0.06 / $0.24
  • Martian: $0.05 / $0.20
  • DeepInfra: $0.05 / $0.20
Updated 2025-12-14
OpenAI · GPT-5.2 · #25

GPT 5.2 Chat

GPT‑5.2 sets a new state of the art across many benchmarks, including GDPval, where it outperforms industry professionals at well-specified knowledge work tasks spanning 44 occupations.

Context: 128K · Max Output: 16K · Input/1M: $1.75
👁 Vision · 🧠 Reasoning · 🔧 Tools · ⚡ Cache
Regions: US
Pricing (per 1M tokens):
  • Requesty★: $1.75 / $14.00
  • Vercel AI: $1.75 / $14.00
  • Martian: $1.75 / $14.00
Updated 2025-12-11
OpenAI · GPT-5.2

OpenAI: GPT-5.2 Pro

GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long context performance over GPT-5 Pro. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks.

Context: 400K · Max Output: 128K · Input/1M: $21.00
👁 Vision · 🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: $21.00 / $168.00
  • Vercel AI: $21.00 / $168.00
  • Martian: $21.00 / $168.00
Updated 2025-12-10
Mistral

Mistral: Devstral 2 2512

Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context. It tracks framework dependencies, detects failures, and retries with corrections—solving challenges like bug fixing and modernizing legacy systems. The model can be fine-tuned to prioritize specific languages or optimize for large enterprise codebases. It is available under a modified MIT license.

Context: 262K · Max Output: 66K · Input/1M: $0.05
🔧 Tools · ⚡ Cache
Pricing (per 1M tokens):
  • OpenRouter: $0.05 / $0.22
Updated 2025-12-09
Mistral

Devstral 2

An enterprise-grade text model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.

Context: 256K · Max Output: 256K · Input/1M: Free
🔧 Tools
Pricing (per 1M tokens):
  • Vercel AI: Free / Free
Updated 2025-12-09
Other

Relace: Relace Search

The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to explore a codebase and return files relevant to the user request. In contrast to RAG, relace-search performs agentic multi-step reasoning to produce highly precise results 4x faster than any frontier model. It is designed to serve as a subagent that passes its findings to an "oracle" coding agent, which orchestrates and performs the rest of the coding task. To use relace-search you need to build an appropriate agent harness and parse the response for relevant information to hand off to the oracle. Read more in the [Relace documentation](https://docs.relace.ai/docs/fast-agentic-search/agent).

Context: 256K · Max Output: 128K · Input/1M: $1.00
🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: $1.00 / $3.00
  • Martian: $1.00 / $3.00
Updated 2025-12-08
Zhipu · GLM 4 · #89

Z.ai: GLM 4.6V

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.

Context: 131K · Max Output: 131K · Input/1M: $0.30
👁 Vision · 🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: $0.30 / $0.90
  • Vercel AI: $0.30 / $0.90
  • DeepInfra: $0.30 / $0.90
Updated 2025-12-08
DeepSeek · DeepSeek V3.1 · OSS

Nex AGI: DeepSeek V3.1 Nex N1

DeepSeek V3.1 Nex-N1 is the flagship release of the Nex-N1 series — a post-trained model designed to highlight agent autonomy, tool use, and real-world productivity. Nex-N1 demonstrates competitive performance across all evaluation scenarios, showing particularly strong results in practical coding and HTML generation tasks.

Context: 131K · Max Output: 164K · Input/1M: $0.27
🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: $0.27 / $1.00
  • Martian: $0.27 / $1.00
Updated 2025-12-08
Other

EssentialAI: Rnj 1 Instruct

Rnj-1 is an 8B-parameter, dense, open-weight model family developed by Essential AI and trained from scratch with a focus on programming, math, and scientific reasoning. The model demonstrates strong performance across multiple programming languages, tool-use workflows, and agentic execution environments (e.g., mini-SWE-agent).

Context: 33K · Max Output: — · Input/1M: $0.15
Pricing (per 1M tokens):
  • OpenRouter: $0.15 / $0.15
  • Martian: $0.15 / $0.15
Updated 2025-12-07
Other

Body Builder (beta)

Transform your natural language requests into structured OpenRouter API request objects. Describe what you want to accomplish with AI models, and Body Builder will construct the appropriate API calls. Example: "count to 10 using gemini and opus." This is useful for creating multi-model requests, custom model routers, or programmatic generation of API calls from human descriptions. **BETA NOTICE**: Body Builder is in beta, and currently free. Pricing and functionality may change in the future.

Context: 128K · Max Output: — · Input/1M: Free
Pricing (per 1M tokens): free during beta
Updated 2025-12-05
OpenAI · GPT-5.1

OpenAI: GPT-5.1-Codex-Max

GPT-5.1-Codex-Max is OpenAI’s latest agentic coding model, designed for long-running, high-context software development tasks. It is based on an updated version of the 5.1 reasoning stack and trained on agentic workflows spanning software engineering, mathematics, and research. GPT-5.1-Codex-Max delivers faster performance, improved reasoning, and higher token efficiency across the development lifecycle.

Context: 400K · Max Output: 128K · Input/1M: $1.25
👁 Vision · 🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens):
  • OpenRouter: $1.25 / $10.00
  • Vercel AI: $1.25 / $10.00
  • Martian: $1.25 / $10.00
Updated 2025-12-04
Amazon · Nova

Amazon: Nova 2 Lite

Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. Nova 2 Lite demonstrates standout capabilities in processing documents, extracting information from videos, generating code, providing accurate grounded answers, and automating multi-step agentic workflows.

Context: 1.0M · Max Output: 66K · Input/1M: $0.30
👁 Vision · 🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: $0.30 / $2.50
  • Martian: $0.30 / $2.50
Updated 2025-12-02
Mistral

Mistral: Ministral 3 14B 2512

The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities.

Context: 262K · Max Output: — · Input/1M: $0.20
👁 Vision · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: $0.20 / $0.20
  • Martian: $0.20 / $0.20
Updated 2025-12-02
Mistral

Mistral: Ministral 3 8B 2512

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

Context: 262K · Max Output: — · Input/1M: $0.15
👁 Vision · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: $0.15 / $0.15
  • Martian: $0.15 / $0.15
Updated 2025-12-02
Mistral

Mistral: Ministral 3 3B 2512

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

Context: 131K · Max Output: — · Input/1M: $0.10
👁 Vision · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: $0.10 / $0.10
  • Martian: $0.10 / $0.10
Updated 2025-12-02
Mistral · Mistral Large · #48

Mistral Large 3

Mistral Large 3 2512 is Mistral’s most capable model to date. It has a sparse mixture-of-experts architecture with 41B active parameters (675B total).

Context: 256K · Max Output: 256K · Input/1M: $0.50
Pricing (per 1M tokens):
  • Vercel AI: $0.50 / $1.50
Updated 2025-12-02
Mistral · Mistral Large

Mistral: Mistral Large 3 2512

Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41B active parameters (675B total), and released under the Apache 2.0 license.

Context: 262K · Max Output: — · Input/1M: $0.50
👁 Vision · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: $0.50 / $1.50
  • Martian: $0.50 / $1.50
Updated 2025-12-01
Other

Arcee AI: Trinity Mini (free)

Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. Engineered for efficient reasoning over long contexts (131k) with robust function calling and multi-step agent workflows.

Context: 131K · Max Output: — · Input/1M: Free
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens):
  • OpenRouter: Free / Free
  • Vercel AI: $0.04 / $0.15
  • Martian: $0.04 / $0.15
Updated 2025-12-01
DeepSeekDeepSeek V3.2OSS

DeepSeek: DeepSeek V3.2 Speciale

DeepSeek-V3.2-Speciale is a high-compute variant of DeepSeek-V3.2 optimized for maximum reasoning and agentic performance. It builds on DeepSeek Sparse Attention (DSA) for efficient long-context processing, then scales post-training reinforcement learning to push capability beyond the base model. Reported evaluations place Speciale ahead of GPT-5 on difficult reasoning workloads, with proficiency comparable to Gemini-3.0-Pro, while retaining strong coding and tool-use reliability. Like V3.2, it benefits from a large-scale agentic task synthesis pipeline that improves compliance and generalization in interactive environments.

Context
164K
Max Output
66K
Input/1M
$0.27
🧠 Reasoning⚡ Cache
Pricing (per 1M tokens)
OpenRouter$0.27 / $0.41
Martian$0.27 / $0.41
2025-12-01View details →
DeepSeekDeepSeek V3.2OSS
#40

DeepSeek: DeepSeek V3.2

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments. Users can control reasoning behaviour with the `reasoning.enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

Context
164K
Max Output
66K
Input/1M
$0.25
🧠 Reasoning🔧 Tools⚡ Cache
Pricing (per 1M tokens)
OpenRouter$0.25 / $0.38
Vercel AI$0.26 / $0.38
Martian$0.25 / $0.38
DeepInfra$0.26 / $0.38
2025-12-01View details →
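The reasoning toggle mentioned in the description maps to a field in the request body of OpenRouter's OpenAI-compatible chat completions endpoint. A minimal sketch that builds (but does not send) such a payload, assuming the `reasoning.enabled` shape from the linked docs; the model slug is illustrative, so check the router's model list before use:

```python
import json

def build_request(prompt: str, reasoning_enabled: bool) -> dict:
    """Build an OpenRouter-style chat-completions payload with the
    reasoning toggle described in the linked docs."""
    return {
        "model": "deepseek/deepseek-v3.2",  # illustrative slug; verify against the router
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": {"enabled": reasoning_enabled},
    }

payload = build_request("Prove that sqrt(2) is irrational.", reasoning_enabled=True)
print(json.dumps(payload, indent=2))
```

Disabling reasoning (`reasoning_enabled=False`) trades depth for latency and output-token cost, which matters at the $0.25 / $0.38 rates listed above.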
DeepSeekDeepSeek V3.2OSS
#41

DeepSeek V3.2 Thinking

Thinking mode of DeepSeek V3.2

Context
128K
Max Output
64K
Input/1M
$0.28
🧠 Reasoning🔧 Tools
Pricing (per 1M tokens)
Vercel AI$0.28 / $0.42
2025-12-01View details →
Mistral
#215

Ministral 14B

Ministral 3 14B is the largest model in the Ministral 3 family, offering state-of-the-art capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. Optimized for local deployment, it delivers high performance across diverse hardware.

Context
256K
Max Output
256K
Input/1M
$0.20
Pricing (per 1M tokens)
Vercel AI$0.20 / $0.20
2025-12-01View details →
Other
#106

Prime Intellect: INTELLECT-3

INTELLECT-3 is a 106B-parameter Mixture-of-Experts model (12B active) post-trained from GLM-4.5-Air-Base using supervised fine-tuning (SFT) followed by large-scale reinforcement learning (RL). It offers state-of-the-art performance for its size across math, code, science, and general reasoning, consistently outperforming many larger frontier models. Designed for strong multi-step problem solving, it maintains high accuracy on structured tasks while remaining efficient at inference thanks to its MoE architecture.

Context
131K
Max Output
131K
Input/1M
$0.20
🧠 Reasoning🔧 Tools
Pricing (per 1M tokens)
OpenRouter$0.20 / $1.10
Vercel AI$0.20 / $1.10
Martian$0.20 / $1.10
2025-11-27View details →
Other

TNG: R1T Chimera

TNG-R1T-Chimera is an experimental LLM with a penchant for creative storytelling and character interaction. It is a derivative of the original TNG/DeepSeek-R1T-Chimera released in April 2025 and is available exclusively via Chutes and OpenRouter. Characteristics and improvements: it has a creative and pleasant personality, with a preliminary EQ-Bench3 score of about 1305; it is considerably more intelligent than the original, albeit slightly slower; it is much more think-token consistent, i.e. reasoning and answer blocks are properly delineated; and tool calling is much improved. TNG Tech, the model's authors, ask that users follow the guidelines Microsoft published for its DeepSeek-based "MAI-DS-R1" model, available on Hugging Face (https://huggingface.co/microsoft/MAI-DS-R1).

Context
164K
Max Output
66K
Input/1M
$0.25
🧠 Reasoning🔧 Tools⚡ Cache
Pricing (per 1M tokens)
OpenRouter$0.25 / $0.85
2025-11-26View details →
AnthropicClaude 4.5
#7

Claude Opus 4.5

Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers strong multimodal capabilities, competitive performance across real-world coding and reasoning benchmarks, and improved robustness to prompt injection. The model is designed to operate efficiently across varied effort levels, enabling developers to trade off speed, depth, and token usage depending on task requirements. It introduces a new parameter for controlling token efficiency, accessed via OpenRouter's verbosity parameter (low, medium, or high). Opus 4.5 supports advanced tool use, extended context management, and coordinated multi-agent setups, making it well-suited for autonomous research, debugging, multi-step planning, and spreadsheet/browser manipulation. It delivers substantial gains in structured reasoning, execution reliability, and alignment compared to prior Opus generations, while reducing token overhead and improving performance on long-running tasks.

Context
200K
Max Output
64K
Input/1M
$5.00
👁 Vision🧠 Reasoning🔧 Tools⚡ Cache🖥 Computer
EUUS
Pricing (per 1M tokens)
Requesty★$5.00 / $25.00
OpenRouter$5.00 / $25.00
Vercel AI$5.00 / $25.00
Martian$5.00 / $25.00
2025-11-24View details →
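The token-efficiency control described above is exposed through a verbosity setting with three levels. A sketch of a request payload using those levels, assuming a top-level `verbosity` field as the description suggests; the model slug and field placement are assumptions to verify against the router's docs:

```python
def build_opus_request(prompt: str, verbosity: str = "medium") -> dict:
    """Sketch of a chat-completions payload using the verbosity
    levels the description lists (low, medium, high)."""
    if verbosity not in ("low", "medium", "high"):
        raise ValueError("verbosity must be low, medium, or high")
    return {
        "model": "anthropic/claude-opus-4.5",  # illustrative slug
        "messages": [{"role": "user", "content": prompt}],
        "verbosity": verbosity,
    }

req = build_opus_request("Summarize this repository.", verbosity="low")
print(req["verbosity"])  # → low
```

Lower verbosity reduces output tokens, which is significant at the $25.00 per 1M output-token rate listed above.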
OtherOSS
#172

AllenAI: Olmo 3 32B Think

Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for deep reasoning, complex logic chains and advanced instruction-following scenarios. Its capacity enables strong performance on demanding evaluation tasks and highly nuanced conversational reasoning. Developed by Ai2 under the Apache 2.0 license, Olmo 3 32B Think embodies the Olmo initiative’s commitment to openness, offering full transparency across weights, code and training methodology.

Context
66K
Max Output
66K
Input/1M
$0.15
🧠 Reasoning
Pricing (per 1M tokens)
OpenRouter$0.15 / $0.50
Martian$0.15 / $0.50
2025-11-21View details →
OtherOSS

AllenAI: Olmo 3 7B Instruct

Olmo 3 7B Instruct is a supervised instruction-fine-tuned variant of the Olmo 3 7B base model, optimized for instruction-following, question-answering, and natural conversational dialogue. By leveraging high-quality instruction data and an open training pipeline, it delivers strong performance across everyday NLP tasks while remaining accessible and easy to integrate. Developed by Ai2 under the Apache 2.0 license, the model offers a transparent, community-friendly option for instruction-driven applications.

Context
66K
Max Output
66K
Input/1M
$0.10
Pricing (per 1M tokens)
OpenRouter$0.10 / $0.20
Martian$0.10 / $0.20
2025-11-21View details →
OtherOSS

AllenAI: Olmo 3 7B Think

Olmo 3 7B Think is a research-oriented language model in the Olmo family designed for advanced reasoning and instruction-driven tasks. It excels at multi-step problem solving, logical inference, and maintaining coherent conversational context. Developed by Ai2 under the Apache 2.0 license, Olmo 3 7B Think supports transparent, fully open experimentation and provides a lightweight yet capable foundation for academic research and practical NLP workflows.

Context
66K
Max Output
66K
Input/1M
$0.12
🧠 Reasoning
Pricing (per 1M tokens)
OpenRouter$0.12 / $0.20
Martian$0.12 / $0.20
2025-11-21View details →
GoogleGemini 3

Nano Banana Pro (Gemini 3 Pro Image Preview)

Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly improved multimodal reasoning, real-world grounding, and high-fidelity visual synthesis. The model generates context-rich graphics, from infographics and diagrams to cinematic composites, and can incorporate real-time information via Search grounding. It offers industry-leading text rendering in images (including long passages and multilingual layouts), consistent multi-image blending, and accurate identity preservation across up to five subjects. Nano Banana Pro adds fine-grained creative controls such as localized edits, lighting and focus adjustments, camera transformations, and support for 2K/4K outputs and flexible aspect ratios. It is designed for professional-grade design, product visualization, storyboarding, and complex multi-element compositions while remaining efficient for general image creation workflows.

Context
1.0M
Max Output
33K
Input/1M
$2.00
👁 Vision🧠 Reasoning🔧 Tools⚡ Cache
Pricing (per 1M tokens)
Requesty★$2.00 / $12.00
OpenRouter$2.00 / $12.00
Vercel AI$2.00 / $120.00
2025-11-20View details →