AnthropicClaude 3.5Arena #125Oct 22, 2024

Claude 3.5 Sonnet

New Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at: - Coding: Scores ~49% on SWE-Bench Verified, higher than the last best score, and without any fancy prompt scaffolding - Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights - Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone - Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems) #multimodal

Context Window
200K
tokens
Max Output
8K
tokens
Released
Oct 22, 2024
Arena Rank
#125
of 305 models
Output Speed
78
tokens/sec
Time to First Token
850ms
TTFT

Capabilities

👁Vision
🧠Reasoning
🔧Tool Calling
Prompt Caching
🖥Computer Use
🎨Image Generation

Supported Parameters

Max Tokens
Output length limit
Stop Sequences
Custom stop tokens
Temperature
Controls randomness
Tool Choice
Control tool usage
Tool Calling
Function calling support
Top K
Top-K sampling
Top P
Nucleus sampling

Pricing Comparison

RouterInput / 1MOutput / 1MCached Input / 1M
Requesty$3.00$15.00$0.30
OpenRouter$6.00$30.00
Vercel AI$3.00$15.00

Benchmarks

Artificial Analysis
Intelligence IndexArtificial Analysis
56.2/100
Coding IndexArtificial Analysis
55/100
Math IndexArtificial Analysis
56/100
MMLU-PROArtificial Analysis
0.742/1
GPQA DiamondArtificial Analysis
0.592/1
MATH-500Artificial Analysis
0.782/1
AIME 2024Artificial Analysis
0.16/1
LiveCodeBenchArtificial Analysis
0.535/1
SciCodeArtificial Analysis
0.228/1
Aider Code Editing
Aider EditingAider Code Editing
57.1/100
Aider EditingAider Code Editing
69.2/100
Aider Polyglot
Aider PolyglotAider Polyglot
22.2/100

Model IDs

Requestyvertex/claude-3-5-sonnet@europe-west1
OpenRouteranthropic/claude-3.5-sonnet

Tags

visiontool-callingcachingcomputer-use

Available Regions

EUUS
Compare with another model

Related Models