
Meta · Llama 3.3 · Open Source · Oct 10, 2025

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct, with a 128K context. It is post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages: Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior.

A distillation-driven Neural Architecture Search (“Puzzle”) replaces some attention blocks and varies FFN widths to shrink the memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality.

In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), the model reports strong reasoning/coding results, e.g. MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53.

The model targets practical inference efficiency (high tokens/s, reduced VRAM) with Transformers/vLLM support and explicit “reasoning on/off” modes (chat-first defaults; greedy decoding recommended when reasoning is disabled). It is suited to building agents, assistants, and long-context retrieval systems where balanced accuracy-to-cost and reliable tool use matter.
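As a rough illustration of the deployment notes above, the sketch below sends a chat request through OpenRouter's OpenAI-compatible endpoint using the sampling settings quoted from the evaluation setup (temperature 0.6, top_p 0.95). The model ID is the one listed under Model IDs further down; the OPENROUTER_API_KEY variable and the prompt are placeholders, and the exact switch for the model's reasoning on/off modes is not documented on this page, so consult the upstream model card before relying on it.

```python
# Minimal sketch: querying this model via OpenRouter's OpenAI-compatible API
# with the sampling settings quoted in the evaluation section (temperature 0.6,
# top_p 0.95). Assumes OPENROUTER_API_KEY is set; the prompt is a placeholder.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="nvidia/llama-3.3-nemotron-super-49b-v1.5",  # ID from the Model IDs section
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize the trade-offs of 128K context windows."},
    ],
    temperature=0.6,  # reported evaluation setting
    top_p=0.95,       # reported evaluation setting
    max_tokens=1024,
)
print(response.choices[0].message.content)
```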

Context Window: 131K tokens
Max Output: —
Released: Oct 10, 2025
Arena Rank: —

Capabilities

  • Vision
  • Reasoning
  • Tool Calling
  • Prompt Caching
  • Computer Use
  • Image Generation

Supported Parameters

  • Frequency Penalty: Reduce repetition
  • Include Reasoning: Show reasoning tokens
  • Max Tokens: Output length limit
  • min_p: Minimum-probability sampling cutoff
  • Presence Penalty: Encourage new topics
  • Reasoning: Extended thinking
  • Repetition Penalty: Penalize repeated tokens
  • Response Format: JSON mode / structured output
  • Seed: Deterministic outputs
  • Stop Sequences: Custom stop tokens
  • Temperature: Controls randomness
  • Tool Choice: Control tool usage
  • Tool Calling: Function calling support
  • Top K: Top-K sampling
  • Top P: Nucleus sampling
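The following sketch shows the shape of a request that exercises several of the listed parameters (tool calling, tool choice, seed, stop sequences) against an OpenAI-compatible endpoint such as OpenRouter. The get_weather tool is purely hypothetical, and whether each parameter is honored end-to-end depends on the router and the upstream provider.

```python
# Illustrative request using several of the supported parameters listed above.
# The get_weather tool definition is a made-up example.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

resp = client.chat.completions.create(
    model="nvidia/llama-3.3-nemotron-super-49b-v1.5",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
    tool_choice="auto",   # let the model decide whether to call the tool
    temperature=0.6,
    top_p=0.95,
    seed=42,              # best-effort determinism
    stop=["</answer>"],   # example custom stop sequence
    max_tokens=512,
)

msg = resp.choices[0].message
if msg.tool_calls:
    print("Tool call:", msg.tool_calls[0].function.name, msg.tool_calls[0].function.arguments)
else:
    print(msg.content)
```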

Pricing Comparison

Router       Input / 1M   Output / 1M   Cached Input / 1M
OpenRouter   $0.10        $0.40         —
Martian      $0.10        $0.40         —
DeepInfra    $0.10        $0.40         —
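A quick back-of-the-envelope cost check against the listed rates ($0.10 per 1M input tokens, $0.40 per 1M output tokens, identical across the three routers shown); actual pricing may have changed since this page was captured.

```python
# Estimate per-request cost from the per-1M-token prices in the table above.
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = 0.10, output_per_m: float = 0.40) -> float:
    """Return the USD cost of one request at per-million-token rates."""
    return (input_tokens / 1_000_000) * input_per_m + (output_tokens / 1_000_000) * output_per_m

# Example: a 20K-token prompt with a 2K-token completion
print(f"${request_cost(20_000, 2_000):.4f}")  # $0.0028
```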

Model IDs

OpenRouter: nvidia/llama-3.3-nemotron-super-49b-v1.5
Hugging Face: nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
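For local experimentation, a minimal Hugging Face Transformers load of the repository listed above might look like the sketch below. It assumes enough GPU memory for a 49B checkpoint (the model card targets single H100/H200 deployment) and that any custom modeling code shipped with the repository is acceptable to trust; treat it as a starting point rather than a tuned serving setup.

```python
# Sketch: loading the Hugging Face repository listed above with Transformers
# and generating with the sampling settings quoted on this page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",       # shard/offload across available devices
    trust_remote_code=True,  # assumption: the repo may ship custom modeling code
)

messages = [{"role": "user", "content": "Give three uses for a 128K context window."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```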

Tags

reasoning · tool-calling

Similar Models

Ranked by provider, pricing, capabilities, and arena performance

Model                              Provider   Match   Context   Input / 1M   Relationship
Llama 3.3 70B Instruct             Meta       75%     128K      $0.13        Same family · Similar price
Meta Llama 3 8B Instruct Lite      Meta       67%     8K        $0.10        Same provider · Similar price
NeverSleep: Lumimaid v0.2 8B       Meta       64%     33K       $0.09        Same provider · Similar price
Llama 3.2 11B Vision Instruct      Meta       61%     128K      $0.16        Same provider · Similar price
Meta Llama 3.1 8B Instruct Turbo   Meta       60%     131K      $0.18        Same provider · Similar price
Hermes 2 Pro Llama 3 8b            Meta       60%     8K        $0.14        Same provider · Similar price
