MetaLlama 3.1Open SourceFeb 6, 2025
Meta Llama 3.1 405B Instruct
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
Context Window
131K
tokens
Max Output
—
tokens
Released
—
Arena Rank
—
Output Speed
48
tokens/sec
Time to First Token
1.5s
TTFT
Capabilities
👁Vision
🧠Reasoning
🔧Tool Calling
⚡Prompt Caching
🖥Computer Use
🎨Image Generation
Pricing Comparison
| Router | Input / 1M | Output / 1M | Cached Input / 1M |
|---|---|---|---|
| Requesty★ | $0.80 | $0.80 | $0.80 |
Benchmarks
Artificial Analysis
Intelligence IndexArtificial Analysis
51/100Coding IndexArtificial Analysis
46/100Math IndexArtificial Analysis
54/100MMLU-PROArtificial Analysis
0.682/1GPQA DiamondArtificial Analysis
0.488/1MATH-500Artificial Analysis
0.738/1AIME 2024Artificial Analysis
0.097/1LiveCodeBenchArtificial Analysis
0.398/1SciCodeArtificial Analysis
0.162/1Model IDs
Requesty
deepinfra/meta-llama/Meta-Llama-3.1-405B-InstructRelated Models
Meta#133
Meta: Llama 3.1 405B (base)
33K ctx$0.40/1M in
Meta#178
NVIDIA: Llama 3.1 Nemotron 70B Instruct
131K ctx$1.20/1M in
Meta#180
Meta: Llama 3.1 70B Instruct
131K ctx$0.40/1M in
Meta#232
Meta: Llama 3.1 8B Instruct
16K ctx$0.02/1M in
Meta
Meta Llama 3.1 70B Instruct Turbo
131K ctx$0.88/1M in
Meta
Meta Llama 3.1 8B Instruct Turbo
131K ctx$0.18/1M in