Meta · Llama 4 · Open Source · Jan 30, 2025
Llama 4 Maverick 17B 128E Instruct FP8
An instruction-tuned mixture-of-experts model from the Llama 4 family, with 17B active parameters and 128 experts, served in FP8 precision for use when quick response times matter.
Context Window: 1.0M tokens
Max Output: 1.0M tokens
Released: not listed
Arena Rank: not listed
Output Speed: 165 tokens/sec
Time to First Token (TTFT): 480 ms
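The speed figures above can be combined into a rough response-time estimate. The sketch below assumes generation proceeds at a constant 165 tokens/sec after the 480 ms time to first token; real latency varies with load, prompt length, and provider, and the function name `estimate_response_seconds` is illustrative only.

```python
# Figures taken from the listing above; treated here as constants (an assumption).
TTFT_SECONDS = 0.480              # time to first token
OUTPUT_TOKENS_PER_SECOND = 165.0  # listed output speed


def estimate_response_seconds(output_tokens: int) -> float:
    """Approximate wall-clock time to receive `output_tokens` tokens.

    Simple model: fixed time to first token plus a constant decode rate.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / OUTPUT_TOKENS_PER_SECOND


if __name__ == "__main__":
    for n in (165, 500, 2000):
        print(f"{n} tokens: about {estimate_response_seconds(n):.2f} s")
```

Under this model a short answer is dominated by the time to first token, while long outputs are dominated by the decode rate.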
Capabilities
👁Vision
🧠Reasoning
🔧Tool Calling
⚡Prompt Caching
🖥Computer Use
🎨Image Generation
Pricing Comparison
| Router | Input / 1M | Output / 1M | Cached Input / 1M |
|---|---|---|---|
| Requesty★ | $0.20 | $0.85 | $0.20 |
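As a worked example of the per-million-token prices in the table, the following sketch estimates the cost of a single request at the Requesty rates. It assumes cached tokens are a subset of the input tokens and are billed at the cached-input rate instead of the regular input rate; actual billing rules may differ, and `estimate_cost_usd` is a hypothetical helper, not part of any router SDK.

```python
# Requesty prices from the table above, in USD per 1M tokens.
INPUT_PER_M = 0.20
OUTPUT_PER_M = 0.85
CACHED_INPUT_PER_M = 0.20


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumption: `cached_input_tokens` is counted inside `input_tokens`
    and is billed at the cached rate rather than the regular input rate.
    """
    fresh_input = input_tokens - cached_input_tokens
    if fresh_input < 0 or output_tokens < 0 or cached_input_tokens < 0:
        raise ValueError("token counts must be non-negative and consistent")
    total = (fresh_input * INPUT_PER_M
             + cached_input_tokens * CACHED_INPUT_PER_M
             + output_tokens * OUTPUT_PER_M)
    return total / 1_000_000


if __name__ == "__main__":
    # 100,000 input tokens and 2,000 output tokens, nothing cached.
    print(f"${estimate_cost_usd(100_000, 2_000):.4f}")
```

Because the listed cached-input price equals the regular input price, caching does not change the estimate at these rates.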
Benchmarks
Source: Artificial Analysis
| Benchmark | Score |
|---|---|
| Intelligence Index | 58/100 |
| Coding Index | 52/100 |
| Math Index | 62/100 |
| MMLU-PRO | 0.748/1 |
| GPQA Diamond | 0.585/1 |
| MATH-500 | 0.802/1 |
| AIME 2024 | 0.18/1 |
| LiveCodeBench | 0.478/1 |
Model IDs
Requesty
novita/meta-llama/llama-4-maverick-17b-128e-instruct-fp8
Related Models
Meta · #142
Meta: Llama 4 Maverick
1.0M ctx · $0.15/1M in
Meta · #149
Meta: Llama 4 Scout
328K ctx · $0.08/1M in
Meta · #133
Meta: Llama 3.1 405B (base)
33K ctx · $0.40/1M in
Meta · #154
Meta: Llama 3.3 70B Instruct (free)
128K ctx · Free
Meta · #178
NVIDIA: Llama 3.1 Nemotron 70B Instruct
131K ctx · $1.20/1M in
Meta · #180
Meta: Llama 3.1 70B Instruct
131K ctx · $0.40/1M in