LLM Router

Independent comparison platform for LLM routing infrastructure.

© 2026 LLM Router

Data from public sources. May not reflect real-time pricing.


Zhipu


Zhipu AI (智谱AI) is a leading Chinese AI company developing the GLM family of language models. Their ChatGLM and GLM-4 models offer strong Chinese and English language capabilities with competitive performance on global benchmarks.

Pricing available from Requesty, OpenRouter, Vercel AI, Martian, DeepInfra.

  • Total Models: 14
  • Arena Ranked: 8 of 14
  • Open Source: 0
  • Cheapest Input: $0.06 per 1M tokens

$ Pricing Summary (per 1M tokens)

Metric           Input    Output
Cheapest         $0.06    $0.10
Average          $0.46    $1.66
Most Expensive   $1.00    $3.20
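The per-1M-token figures above translate to request cost with simple arithmetic. A quick sketch (the token counts are made up; the prices are the cheapest Zhipu tier from the table):

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD of one request, with prices quoted per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example: a 20K-token prompt with a 2K-token completion at
# $0.06 input / $0.10 output per 1M tokens.
cost = request_cost(20_000, 2_000, 0.06, 0.10)
print(f"${cost:.6f}")  # → $0.001400
```

The same formula applies per provider; only the two price inputs change.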

⚙ Capabilities

👁 Vision: 3 of 14 models
🧠 Reasoning: 10 of 14 models
🔧 Tool Calling: 14 of 14 models
⚡ Prompt Caching: 3 of 14 models
🖥 Computer Use: 0 of 14 models
🎨 Image Generation: 0 of 14 models

🤖 All Zhipu Models (14)

Zhipu GLM 5
#11

GLM 5

GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading closed-source models. With advanced agentic planning, deep backend reasoning, and iterative self-correction, GLM-5 moves beyond code generation to full-system construction and autonomous execution.

Context 203K · Max Output 131K · Input/1M $0.80
👁 Vision · 🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens)
Requesty★    $1.00 / $3.20
OpenRouter   $0.80 / $2.56
Vercel AI    $1.00 / $3.20
Martian      $1.00 / $3.20
DeepInfra    $0.80 / $2.56
2026-02-11 · View details →
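Given per-router prices like those listed for GLM 5, the cheapest route for a given traffic shape is a small comparison. A sketch using the figures from the card above (the 75/25 input/output token split is an assumption):

```python
# Per-1M-token (input, output) prices for GLM-5, as listed above.
prices = {
    "Requesty":   (1.00, 3.20),
    "OpenRouter": (0.80, 2.56),
    "Vercel AI":  (1.00, 3.20),
    "Martian":    (1.00, 3.20),
    "DeepInfra":  (0.80, 2.56),
}

def blended(inp, out, in_tokens=750_000, out_tokens=250_000):
    """Blended USD cost of 1M tokens at a 75/25 input/output mix."""
    return (in_tokens * inp + out_tokens * out) / 1_000_000

cheapest = min(prices, key=lambda r: blended(*prices[r]))
print(cheapest, f"${blended(*prices[cheapest]):.2f}")  # → OpenRouter $1.24
```

At these prices OpenRouter and DeepInfra tie; `min` returns whichever appears first in the dict.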
Zhipu GLM 4
#22

GLM 4.7

GLM-4.7 is Z.ai's latest flagship model, designed to push agentic and coding performance further. It expands the context window from 128K to 200K tokens, improves reasoning and tool-use capabilities, and delivers stronger results in coding benchmarks and real-world development workflows. GLM-4.7 demonstrates refined writing quality, more capable agent behavior, and higher token efficiency (≈15% fewer tokens vs. GLM-4.5). Evaluations show clear gains over GLM-4.5 across reasoning, agents, and coding, reaching near parity with Claude Sonnet 4 in practical tasks while outperforming other open-source baselines. GLM-4.7 is available through the Z.ai API platform, OpenRouter, coding agents (Claude Code, Roo Code, Cline, Kilo Code), and soon as downloadable weights on Hugging Face and ModelScope.

Context 203K · Max Output 128K · Input/1M $0.40
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens)
Requesty★    $0.60 / $2.20
OpenRouter   $0.40 / $1.50
Vercel AI    $0.43 / $1.75
Martian      $0.40 / $1.50
DeepInfra    $0.40 / $1.75
2025-12-22 · View details →
Zhipu GLM 4
#33

GLM 4.6

Compared with GLM-4.5, this generation brings several key improvements:

  • Longer context window: expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.
  • Superior coding performance: higher scores on code benchmarks and better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages.
  • Advanced reasoning: a clear improvement in reasoning performance, with support for tool use during inference, leading to stronger overall capability.
  • More capable agents: stronger performance in tool use and search-based agents, and more effective integration within agent frameworks.
  • Refined writing: better alignment with human preferences in style and readability, and more natural performance in role-playing scenarios.

Context 203K · Max Output 128K · Input/1M $0.35
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens)
Requesty★    $0.60 / $2.20
OpenRouter   $0.35 / $1.50
Vercel AI    $0.45 / $1.80
Martian      $0.35 / $1.50
DeepInfra    $0.43 / $1.74
2025-09-30 · View details →
Zhipu GLM 4
#57

GLM 4.5

GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128K tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options: a "thinking mode" designed for complex reasoning and tool use, and a "non-thinking mode" optimized for instant responses. Users can control the reasoning behaviour via the `enabled` field of the `reasoning` parameter. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
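The thinking/non-thinking toggle described above maps to a single field in the request body. A minimal sketch of an OpenRouter-style chat-completions payload (the model slug is illustrative; check the router's docs for the authoritative schema):

```python
import json

# Request body for an OpenRouter-style /chat/completions call.
# "reasoning": {"enabled": True} selects the thinking mode;
# False selects the non-thinking (instant-response) mode.
payload = {
    "model": "z-ai/glm-4.5",  # illustrative slug; verify on your router
    "messages": [{"role": "user", "content": "Outline a 3-step refactor."}],
    "reasoning": {"enabled": True},
}
print(json.dumps(payload, indent=2))
```

Sending the same body with `"enabled": False` requests the instant-response mode; routers that do not support the field may ignore or reject it.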

Context 131K · Max Output 98K · Input/1M $0.35
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens)
Requesty★    $0.60 / $2.20
OpenRouter   $0.35 / $1.55
Vercel AI    $0.60 / $2.20
Martian      $0.35 / $1.55
2025-07-25 · View details →
Zhipu GLM 4
#89

Z.ai: GLM 4.6V

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.

Context 131K · Max Output 131K · Input/1M $0.30
👁 Vision · 🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens)
OpenRouter   $0.30 / $0.90
Vercel AI    $0.30 / $0.90
DeepInfra    $0.30 / $0.90
2025-12-08 · View details →
Zhipu GLM 4
#95

Z.ai: GLM 4.5 Air (free)

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour via the `enabled` field of the `reasoning` parameter. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

Context 131K · Max Output 96K · Input/1M Free
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens)
OpenRouter   Free / Free
Vercel AI    $0.20 / $1.10
Martian      $0.13 / $0.85
2025-07-25 · View details →
Zhipu GLM 4
#100

Z.ai: GLM 4.7 Flash

As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning, and tool collaboration, and has achieved leading performance among open-source models of the same size on several current public benchmark leaderboards.

Context 203K · Max Output — · Input/1M $0.06
🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens)
OpenRouter   $0.06 / $0.40
DeepInfra    $0.06 / $0.40
2026-01-19 · View details →
Zhipu GLM 4
#110

Z.ai: GLM 4.5V

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding, image Q&A, OCR, and document parsing, with strong gains in front-end web coding, grounding, and spatial reasoning. It offers a hybrid inference mode: a "thinking mode" for deep reasoning and a "non-thinking mode" for fast responses. Reasoning behavior can be toggled via the `enabled` field of the `reasoning` parameter. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

Context 66K · Max Output 16K · Input/1M $0.60
👁 Vision · 🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens)
OpenRouter   $0.60 / $1.80
Vercel AI    $0.60 / $1.80
2025-08-11 · View details →
Zhipu GLM 4

GLM-4.6V-Flash

Built for local deployment and low-latency applications, the GLM-4.6V series is Z.ai's latest iteration of multimodal large language models. GLM-4.6V scales its context window to 128K tokens in training and achieves SOTA performance in visual understanding among models of a similar parameter scale.

Context 128K · Max Output 24K · Input/1M Free
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens)
Vercel AI    Free / Free
2025-09-30 · View details →
Zhipu GLM 4

Z.ai: GLM 4 32B

GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It is made by the same lab behind the THUDM models.

Context 128K · Max Output — · Input/1M $0.10
🔧 Tools
Pricing (per 1M tokens)
OpenRouter   $0.10 / $0.10
2025-07-24 · View details →
Zhipu GLM 4

GLM 4.7 FlashX

GLM-4.7-FlashX balances high performance with efficiency, making it a strong option for lightweight deployment.

Context 200K · Max Output 128K · Input/1M $0.06
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens)
Vercel AI    $0.06 / $0.40
2025-01-01 · View details →
Zhipu GLM 4

GLM 4.5

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.

Context 131K · Max Output 4K · Input/1M $0.60
🔧 Tools
Pricing (per 1M tokens)
Requesty★    $0.60 / $2.20
View details →
Zhipu GLM 4

GLM 4.6

GLM-4.6 is Z.ai's latest flagship model, designed to push agentic and coding performance further. It expands the context window from 128K to 200K tokens, improves reasoning and tool-use capabilities, and delivers stronger results in coding benchmarks and real-world development workflows. GLM-4.6 demonstrates refined writing quality, more capable agent behavior, and higher token efficiency (≈15% fewer tokens vs. GLM-4.5). Evaluations show clear gains over GLM-4.5 across reasoning, agents, and coding, reaching near parity with Claude Sonnet 4 in practical tasks while outperforming other open-source baselines. GLM-4.6 is available through the Z.ai API platform, OpenRouter, coding agents (Claude Code, Roo Code, Cline, Kilo Code), and soon as downloadable weights on Hugging Face and ModelScope.

Context 205K · Max Output 131K · Input/1M $0.60
🔧 Tools
Pricing (per 1M tokens)
Requesty★    $0.60 / $2.20
View details →
Zhipu GLM 4

GLM 4.5 Air

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.

Context 131K · Max Output 4K · Input/1M $0.20
🔧 Tools
Pricing (per 1M tokens)
Requesty★    $0.20 / $1.10
View details →
← Back to all providers