Live data from 5+ benchmark sources

LLM Benchmarks & Leaderboard

Compare 455+ AI models across intelligence, coding, math, Arena ELO, and speed. Data from Artificial Analysis, Aider, LMSYS Arena, and Open LLM Leaderboard.

121 models ranked in this category (top 100 shown below)

🧠 Intelligence: Composite intelligence scores, MMLU-PRO, GPQA, and general reasoning (a sketch of how such a composite can be computed follows the table)

| # | Model | Score |
|---|-------|-------|
| 🥇 | Microsoft: Phi 4 | 4865.0 |
| 🥈 | Meta: Llama 3 70B Instruct | 4674.0 |
| 🥉 | Sao10K: Llama 3 Euryale 70B v2.1 | 4551.0 |
| 4 | NVIDIA: Llama 3.1 Nemotron 70B Instruct | 4354.0 |
| 5 | Nous: Hermes 3 70B Instruct | 4141.0 |
| 6 | WizardLM-2 8x22B | 3996.0 |
| 7 | Nous: DeepHermes 3 Mistral 24B Preview | 3989.0 |
| 8 | Mistral: Mixtral 8x22B Instruct | 3870.0 |
| 9 | Google: Gemma 2 27B | 3835.0 |
| 10 | Qwen2.5 Coder 32B Instruct | 3792.0 |
| 11 | Google: Gemma 2 9B | 3195.0 |
| 12 | Sao10K: Llama 3 8B Lunaris | 3097.0 |
| 13 | Mistral: Mixtral 8x7B Instruct | 2991.0 |
| 14 | Meta: Llama 3 8B Instruct | 2960.0 |
| 15 | NeverSleep: Lumimaid v0.2 8B | 2929.0 |
| 16 | Mistral: Mistral Nemo | 2797.0 |
| 17 | Qwen: Qwen2.5 Coder 7B Instruct | 2614.0 |
| 18 | Meta: Llama 3.2 3B Instruct (free) | 2439.0 |
| 19 | Mistral: Mistral 7B Instruct | 2306.0 |
| 20 | Mistral: Mistral 7B Instruct v0.3 | 2306.0 |
| 21 | NousResearch: Hermes 2 Pro - Llama-3 8B | 2280.0 |
| 22 | Mistral: Mistral 7B Instruct v0.2 | 1908.0 |
| 23 | Mistral: Mistral 7B Instruct v0.1 | 1572.0 |
| 24 | Meta: Llama 3.2 1B Instruct | 824.0 |
| 25 | Ministral 3B | 103.0 |
| 26 | o4 Mini | 80.0 |
| 27 | O4 Mini Deep Research | 80.0 |
| 28 | OpenAI: o4 Mini High | 80.0 |
| 29 | o3 | 78.0 |
| 30 | Gemini 2.5 Pro | 73.0 |
| 31 | Google: Gemini 2.5 Pro Preview 05-06 | 73.0 |
| 32 | Claude Opus 4 | 72.0 |
| 33 | Claude Opus 4.6 | 72.0 |
| 34 | Claude Opus 4.1 | 72.0 |
| 35 | Claude Opus 4.5 | 72.0 |
| 36 | DeepSeek R1 Distill Llama 70B | 70.0 |
| 37 | DeepSeek R1 | 70.0 |
| 38 | DeepSeek R1 0528 | 70.0 |
| 39 | Claude Sonnet 4.5 | 70.0 |
| 40 | Deepseek Chat | 70.0 |
| 41 | DeepSeek R1 | 70.0 |
| 42 | Deepseek R1 Distill Llama 70b | 70.0 |
| 43 | Deepseek R1 Distill Qwen 32b | 70.0 |
| 44 | Deepseek R1 Distill Qwen 14b | 70.0 |
| 45 | TNG: DeepSeek R1T2 Chimera | 70.0 |
| 46 | DeepSeek: R1 0528 (free) | 70.0 |
| 47 | TNG: DeepSeek R1T Chimera | 70.0 |
| 48 | DeepSeek: R1 Distill Qwen 32B | 70.0 |
| 49 | DeepSeek: R1 Distill Llama 70B | 70.0 |
| 50 | DeepSeek: R1 | 70.0 |
| 51 | o1 | 68.0 |
| 52 | Claude Sonnet 4 | 67.0 |
| 53 | Gemini 2.5 Flash Lite | 65.0 |
| 54 | Gemini 2.5 Flash | 65.0 |
| 55 | Gemini 2.5 Flash Image (Nano Banana) | 65.0 |
| 56 | Google: Gemini 2.5 Flash Preview 09-2025 | 65.0 |
| 57 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | 65.0 |
| 58 | Grok 3 | 64.0 |
| 59 | xAI: Grok 3 Mini Beta | 64.0 |
| 60 | xAI: Grok 3 Beta | 64.0 |
| 61 | Grok 3 Fast Beta | 64.0 |
| 62 | Grok 3 Mini Fast Beta | 64.0 |
| 63 | o3 Mini | 62.9 |
| 64 | OpenAI: o3 Mini High | 62.9 |
| 65 | DeepSeek V3.1 | 59.0 |
| 66 | DeepSeek V3 | 59.0 |
| 67 | DeepSeek V3 0324 Fast | 59.0 |
| 68 | DeepSeek V3 0324 | 59.0 |
| 69 | Deepseek V3 Turbo | 59.0 |
| 70 | Deepseek V3 0324 | 59.0 |
| 71 | Nex AGI: DeepSeek V3.1 Nex N1 | 59.0 |
| 72 | DeepSeek: DeepSeek V3.2 Speciale | 59.0 |
| 73 | DeepSeek: DeepSeek V3.2 | 59.0 |
| 74 | DeepSeek: DeepSeek V3.2 Exp | 59.0 |
| 75 | DeepSeek: DeepSeek V3.1 Terminus (exacto) | 59.0 |
| 76 | DeepSeek V3 0324 | 59.0 |
| 77 | DeepSeek-V3.1 | 59.0 |
| 78 | DeepSeek V3.2 Thinking | 59.0 |
| 79 | Llama 4 Maverick 17b 128e Instruct Fp8 | 58.0 |
| 80 | Meta: Llama 4 Maverick | 58.0 |
| 81 | GPT-4.1 | 57.0 |
| 82 | GPT-4.1 | 57.0 |
| 83 | OpenAI: GPT-4 Turbo (older v1106) | 57.0 |
| 84 | Claude 3.5 Sonnet | 56.2 |
| 85 | Qwen: QwQ 32B | 56.0 |
| 86 | Grok 3 Mini | 55.0 |
| 87 | GPT-4o | 54.4 |
| 88 | Chatgpt 4o | 54.4 |
| 89 | OpenAI: GPT-4o Audio | 54.4 |
| 90 | OpenAI: GPT-4o-mini Search Preview | 54.4 |
| 91 | OpenAI: GPT-4o Search Preview | 54.4 |
| 92 | OpenAI: GPT-4 Turbo | 54.4 |
| 93 | Gemini 2.0 Flash 001 | 53.0 |
| 94 | Google: Gemini 2.0 Flash Lite | 53.0 |
| 95 | Gemini 2.0 Flash | 53.0 |
| 96 | Gemini 2.0 Flash Lite | 53.0 |
| 97 | Meta: Llama 4 Scout | 52.0 |
| 98 | Meta Llama 3.1 405B Instruct | 51.0 |
| 99 | Nous: Hermes 3 405B Instruct (free) | 51.0 |
| 100 | Meta: Llama 3.1 405B (base) | 51.0 |
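
The composite intelligence scores above fold several sub-benchmarks into one number. The exact weighting used here is not published, so the sketch below is only a hypothetical illustration: the sub-benchmark names (MMLU-PRO, GPQA) come from this page, but the weights and the `composite_score` helper are assumptions.

```python
# Hypothetical composite-score sketch. The leaderboard's real formula is not
# published; the sub-benchmark weights below are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SubScores:
    mmlu_pro: float   # accuracy, 0-100
    gpqa: float       # accuracy, 0-100
    reasoning: float  # general reasoning benchmark, 0-100

WEIGHTS = {"mmlu_pro": 0.40, "gpqa": 0.35, "reasoning": 0.25}  # assumed

def composite_score(s: SubScores) -> float:
    """Weighted average of sub-benchmark accuracies, on a 0-100 scale."""
    return (WEIGHTS["mmlu_pro"] * s.mmlu_pro
            + WEIGHTS["gpqa"] * s.gpqa
            + WEIGHTS["reasoning"] * s.reasoning)

print(round(composite_score(SubScores(mmlu_pro=78.0, gpqa=65.0, reasoning=70.0)), 1))
# prints roughly 71.5
```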

About LLM Benchmarks

LLM benchmarks measure the capabilities of large language models across key dimensions, and our leaderboard aggregates data from 5+ sources to give a broad, cross-source view of model performance. Intelligence benchmarks like MMLU-PRO and GPQA test knowledge and reasoning. Coding benchmarks from Aider and LiveCodeBench measure practical programming ability. Math benchmarks, including MATH-500 and AIME, test mathematical reasoning. Arena ELO ratings reflect real human preferences in blind pairwise comparisons.
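
Because those sources report on very different scales (Arena-style Elo ratings sit in the low thousands, while accuracy benchmarks top out at 100), scores must be normalized before they can be averaged. Here is a minimal sketch assuming simple per-source min-max normalization; the function and sample data are illustrative, not this site's actual pipeline:

```python
# Min-max normalize each source to [0, 1] so Elo-scale and accuracy-scale
# scores become comparable. Illustrative sketch, not the site's pipeline.

def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # guard against all-equal scores
    return {model: (s - lo) / span for model, s in scores.items()}

# Hypothetical per-source scores for three models.
arena_elo = {"model-a": 1310.0, "model-b": 1255.0, "model-c": 1180.0}  # Elo scale
mmlu_pro  = {"model-a": 78.0,   "model-b": 65.0,   "model-c": 70.0}    # 0-100 scale

normalized = [min_max_normalize(src) for src in (arena_elo, mmlu_pro)]

# Average each model's normalized scores across the two sources.
for model in arena_elo:
    avg = sum(src[model] for src in normalized) / len(normalized)
    print(f"{model}: {avg:.3f}")
```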

Speed metrics show real-world API performance: output tokens per second measures generation throughput, while time-to-first-token (TTFT) measures initial response latency, which is critical for interactive applications.
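
Both metrics are easy to measure yourself against any streaming endpoint. In the sketch below, `fake_token_stream` is a stand-in for a real streaming client; note that conventions differ on whether TTFT is included in the throughput window, and this version excludes it.

```python
import time
from typing import Iterable, Iterator

def fake_token_stream() -> Iterator[str]:
    """Stand-in for a real streaming LLM client; replace with your API call."""
    time.sleep(0.30)           # simulated time-to-first-token
    for token in ["Measured", " over", " a", " fake", " stream", "."]:
        time.sleep(0.02)       # simulated inter-token latency
        yield token

def measure_speed(stream: Iterable[str]) -> tuple[float, float]:
    """Return (TTFT in seconds, output tokens/sec over the generation window)."""
    start = time.perf_counter()
    first = end = start
    count = 0
    for _ in stream:
        count += 1
        now = time.perf_counter()
        if count == 1:
            first = now
        end = now
    ttft = first - start
    # Throughput measured after the first token arrives; some tools use the
    # full request window instead, which folds TTFT into the rate.
    tps = (count - 1) / (end - first) if count > 1 else 0.0
    return ttft, tps

ttft, tps = measure_speed(fake_token_stream())
print(f"TTFT: {ttft * 1000:.0f} ms, throughput: {tps:.1f} tok/s")
```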