LLM Router

Independent comparison platform for LLM routing infrastructure.

© 2026 LLM Router

Data from public sources. May not reflect real-time pricing.


Zhipu


Zhipu AI (智谱AI) is a leading Chinese AI company developing the GLM family of language models. Their ChatGLM and GLM-4 models offer strong Chinese and English language capabilities with competitive performance on global benchmarks.

Pricing available from Requesty, OpenRouter, Vercel AI, Martian, DeepInfra.

  • Total Models: 14
  • Arena Ranked: 8 of 14
  • Open Source: 0
  • Cheapest Input: $0.06 per 1M tokens

$ Pricing Summary (per 1M tokens)

Metric           Input    Output
Cheapest         $0.06    $0.10
Average          $0.46    $1.66
Most Expensive   $1.00    $3.20
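The per-1M-token figures above translate to request cost with simple arithmetic. A quick sketch (the token counts are made up; the prices are the cheapest Zhipu tier from the table):

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD of one request, with prices quoted per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example: a 20K-token prompt with a 2K-token completion at
# $0.06 input / $0.10 output per 1M tokens.
cost = request_cost(20_000, 2_000, 0.06, 0.10)
print(f"${cost:.6f}")  # → $0.001400
```

The same formula applies per provider; only the two price inputs change.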

⚙ Capabilities

👁 Vision: 3 of 14 models
🧠 Reasoning: 10 of 14 models
🔧 Tool Calling: 14 of 14 models
⚡ Prompt Caching: 3 of 14 models
🖥 Computer Use: 0 of 14 models
🎨 Image Generation: 0 of 14 models

🤖 All Zhipu Models (14)

Zhipu GLM 5
#11

GLM 5

GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading closed-source models. With advanced agentic planning, deep backend reasoning, and iterative self-correction, GLM-5 moves beyond code generation to full-system construction and autonomous execution.

Context 203K · Max Output 131K · Input/1M $0.80
👁 Vision · 🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens)
Requesty★    $1.00 / $3.20
OpenRouter   $0.80 / $2.56
Vercel AI    $1.00 / $3.20
Martian      $1.00 / $3.20
DeepInfra    $0.80 / $2.56
2026-02-11 · View details →
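Given per-router prices like those listed for GLM 5, the cheapest route for a given traffic shape is a small comparison. A sketch using the figures from the card above (the 75/25 input/output token split is an assumption):

```python
# Per-1M-token (input, output) prices for GLM-5, as listed above.
prices = {
    "Requesty":   (1.00, 3.20),
    "OpenRouter": (0.80, 2.56),
    "Vercel AI":  (1.00, 3.20),
    "Martian":    (1.00, 3.20),
    "DeepInfra":  (0.80, 2.56),
}

def blended(inp, out, in_tokens=750_000, out_tokens=250_000):
    """Blended USD cost of 1M tokens at a 75/25 input/output mix."""
    return (in_tokens * inp + out_tokens * out) / 1_000_000

cheapest = min(prices, key=lambda r: blended(*prices[r]))
print(cheapest, f"${blended(*prices[cheapest]):.2f}")  # → OpenRouter $1.24
```

At these prices OpenRouter and DeepInfra tie; `min` returns whichever appears first in the dict.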
Zhipu GLM 4
#22

GLM 4.7

GLM-4.7 is Z.ai's latest flagship model, designed to push agentic and coding performance further. It expands the context window from 128K to 200K tokens, improves reasoning and tool-use capabilities, and delivers stronger results in coding benchmarks and real-world development workflows. GLM-4.7 demonstrates refined writing quality, more capable agent behavior, and higher token efficiency (≈15% fewer tokens vs. GLM-4.5). Evaluations show clear gains over GLM-4.5 across reasoning, agents, and coding, reaching near parity with Claude Sonnet 4 in practical tasks while outperforming other open-source baselines. GLM-4.7 is available through the Z.ai API platform, OpenRouter, coding agents (Claude Code, Roo Code, Cline, Kilo Code), and soon as downloadable weights on Hugging Face and ModelScope.

Context 203K · Max Output 128K · Input/1M $0.40
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens)
Requesty★    $0.60 / $2.20
OpenRouter   $0.40 / $1.50
Vercel AI    $0.43 / $1.75
Martian      $0.40 / $1.50
DeepInfra    $0.40 / $1.75
2025-12-22 · View details →
Zhipu GLM 4
#33

GLM 4.6

Compared with GLM-4.5, this generation brings several key improvements:

  • Longer context window: expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.
  • Superior coding performance: higher scores on code benchmarks and better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages.
  • Advanced reasoning: a clear improvement in reasoning performance, with support for tool use during inference, leading to stronger overall capability.
  • More capable agents: stronger performance in tool use and search-based agents, and more effective integration within agent frameworks.
  • Refined writing: better alignment with human preferences in style and readability, and more natural performance in role-playing scenarios.

Context 203K · Max Output 128K · Input/1M $0.35
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens)
Requesty★    $0.60 / $2.20
OpenRouter   $0.35 / $1.50
Vercel AI    $0.45 / $1.80
Martian      $0.35 / $1.50
DeepInfra    $0.43 / $1.74
2025-09-30 · View details →
Zhipu GLM 4
#57

GLM 4.5

GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128K tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options: a "thinking mode" designed for complex reasoning and tool use, and a "non-thinking mode" optimized for instant responses. Users can control the reasoning behaviour via the `enabled` field of the `reasoning` parameter. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
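The thinking/non-thinking toggle described above maps to a single field in the request body. A minimal sketch of an OpenRouter-style chat-completions payload (the model slug is illustrative; check the router's docs for the authoritative schema):

```python
import json

# Request body for an OpenRouter-style /chat/completions call.
# "reasoning": {"enabled": True} selects the thinking mode;
# False selects the non-thinking (instant-response) mode.
payload = {
    "model": "z-ai/glm-4.5",  # illustrative slug; verify on your router
    "messages": [{"role": "user", "content": "Outline a 3-step refactor."}],
    "reasoning": {"enabled": True},
}
print(json.dumps(payload, indent=2))
```

Sending the same body with `"enabled": False` requests the instant-response mode; routers that do not support the field may ignore or reject it.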

Context 131K · Max Output 98K · Input/1M $0.35
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens)
Requesty★    $0.60 / $2.20
OpenRouter   $0.35 / $1.55
Vercel AI    $0.60 / $2.20
Martian      $0.35 / $1.55
2025-07-25 · View details →
Zhipu GLM 4
#89

Z.ai: GLM 4.6V

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.

Context 131K · Max Output 131K · Input/1M $0.30
👁 Vision · 🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens)
OpenRouter   $0.30 / $0.90
Vercel AI    $0.30 / $0.90
DeepInfra    $0.30 / $0.90
2025-12-08 · View details →
Zhipu GLM 4
#95

Z.ai: GLM 4.5 Air (free)

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour via the `enabled` field of the `reasoning` parameter. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

Context 131K · Max Output 96K · Input/1M Free
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens)
OpenRouter   Free / Free
Vercel AI    $0.20 / $1.10
Martian      $0.13 / $0.85
2025-07-25 · View details →
Zhipu GLM 4
#100

Z.ai: GLM 4.7 Flash

As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning, and tool collaboration, and has achieved leading performance among open-source models of the same size on several current public benchmark leaderboards.

Context 203K · Max Output — · Input/1M $0.06
🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens)
OpenRouter   $0.06 / $0.40
DeepInfra    $0.06 / $0.40
2026-01-19 · View details →
Zhipu GLM 4
#110

Z.ai: GLM 4.5V

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding, image Q&A, OCR, and document parsing, with strong gains in front-end web coding, grounding, and spatial reasoning. It offers a hybrid inference mode: a "thinking mode" for deep reasoning and a "non-thinking mode" for fast responses. Reasoning behavior can be toggled via the `enabled` field of the `reasoning` parameter. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

Context 66K · Max Output 16K · Input/1M $0.60
👁 Vision · 🧠 Reasoning · 🔧 Tools · ⚡ Cache
Pricing (per 1M tokens)
OpenRouter   $0.60 / $1.80
Vercel AI    $0.60 / $1.80
2025-08-11 · View details →
Zhipu GLM 4

GLM-4.6V-Flash

Built for local deployment and low-latency applications, the GLM-4.6V series is Z.ai's latest iteration of multimodal large language models. GLM-4.6V scales its context window to 128K tokens in training and achieves SOTA performance in visual understanding among models of a similar parameter scale.

Context 128K · Max Output 24K · Input/1M Free
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens)
Vercel AI    Free / Free
2025-09-30 · View details →
Zhipu GLM 4

Z.ai: GLM 4 32B

GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It is made by the same lab behind the THUDM models.

Context 128K · Max Output — · Input/1M $0.10
🔧 Tools
Pricing (per 1M tokens)
OpenRouter   $0.10 / $0.10
2025-07-24 · View details →
Zhipu GLM 4

GLM 4.7 FlashX

GLM-4.7-FlashX balances high performance with efficiency, making it a strong option for lightweight deployment.

Context 200K · Max Output 128K · Input/1M $0.06
🧠 Reasoning · 🔧 Tools
Pricing (per 1M tokens)
Vercel AI    $0.06 / $0.40
2025-01-01 · View details →
Zhipu GLM 4

GLM 4.5

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.

Context 131K · Max Output 4K · Input/1M $0.60
🔧 Tools
Pricing (per 1M tokens)
Requesty★    $0.60 / $2.20
View details →
Zhipu GLM 4

GLM 4.6

GLM-4.6 is Z.ai's latest flagship model, designed to push agentic and coding performance further. It expands the context window from 128K to 200K tokens, improves reasoning and tool-use capabilities, and delivers stronger results in coding benchmarks and real-world development workflows. GLM-4.6 demonstrates refined writing quality, more capable agent behavior, and higher token efficiency (≈15% fewer tokens vs. GLM-4.5). Evaluations show clear gains over GLM-4.5 across reasoning, agents, and coding, reaching near parity with Claude Sonnet 4 in practical tasks while outperforming other open-source baselines. GLM-4.6 is available through the Z.ai API platform, OpenRouter, coding agents (Claude Code, Roo Code, Cline, Kilo Code), and soon as downloadable weights on Hugging Face and ModelScope.

Context 205K · Max Output 131K · Input/1M $0.60
🔧 Tools
Pricing (per 1M tokens)
Requesty★    $0.60 / $2.20
View details →
Zhipu GLM 4

GLM 4.5 Air

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.

Context 131K · Max Output 4K · Input/1M $0.20
🔧 Tools
Pricing (per 1M tokens)
Requesty★    $0.20 / $1.10
View details →
← Back to all providers