LLM Router

Independent comparison platform for LLM routing infrastructure.

© 2026 LLM Router

Data from public sources. May not reflect real-time pricing.


Meta

↗ Website

Meta AI develops the Llama family of open-source large language models. The Llama models are among the most widely adopted open-source LLMs, powering thousands of applications and fine-tuned variants across the AI ecosystem.

Pricing available from Requesty, OpenRouter, Martian, DeepInfra, Vercel AI.

Total Models
50
Arena Ranked
11
of 50
Open Source
45
of 50
Cheapest Input
$0.02
per 1M tokens

$ Pricing Summary (per 1M tokens)

Metric           Input    Output
Cheapest         $0.02    $0.02
Average          $0.47    $0.61
Most Expensive   $4.00    $4.00
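Per-1M-token rates like these translate into request costs by simple proportion: multiply each token count by its rate over one million and sum the two sides. A minimal sketch, using the cheapest and average figures from the table above:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one request, given per-1M-token rates."""
    return (input_tokens / 1_000_000) * input_per_m \
         + (output_tokens / 1_000_000) * output_per_m

# Cheapest Meta model on this page: $0.02 in / $0.02 out
cheap = request_cost(200_000, 50_000, 0.02, 0.02)  # 0.004 + 0.001 = $0.005
# Average across Meta models: $0.47 in / $0.61 out
avg = request_cost(200_000, 50_000, 0.47, 0.61)    # 0.094 + 0.0305 = $0.1245
```

The 200K/50K token mix is just an illustrative workload; plug in your own traffic profile.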

⚙ Capabilities

👁
Vision
4
of 50 models
🧠
Reasoning
3
of 50 models
🔧
Tool Calling
19
of 50 models
⚡
Prompt Caching
1
of 50 models
🖥
Computer Use
0
of 50 models
🎨
Image Generation
0
of 50 models
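Counts like "4 of 50 models" above come from filtering the model list by capability flags. A minimal sketch with hypothetical records (the flag names and the four sample models below are illustrative, mirroring the badges on this page):

```python
# Hypothetical capability records; field names are assumptions.
MODELS = [
    {"name": "Llama 4 Maverick", "vision": True,  "tools": True,  "reasoning": False},
    {"name": "Llama 4 Scout",    "vision": True,  "tools": True,  "reasoning": False},
    {"name": "Llama 3.3 70B",    "vision": False, "tools": True,  "reasoning": False},
    {"name": "Nemotron Ultra",   "vision": False, "tools": False, "reasoning": True},
]

def with_capability(models: list[dict], cap: str) -> list[str]:
    """Names of models whose given capability flag is set."""
    return [m["name"] for m in models if m.get(cap)]

print(with_capability(MODELS, "vision"))  # ['Llama 4 Maverick', 'Llama 4 Scout']
```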

🤖 All Meta Models (50)

MetaLlama 3.1OSS
#133

Meta: Llama 3.1 405B (base)

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This is the base 405B pre-trained version. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Context
33K
Max Output
33K
Input/1M
$0.40
Pricing (per 1M tokens)
OpenRouter$4.00 / $4.00
Vercel AI$0.40 / $0.40
Martian$4.00 / $4.00
2024-08-02View details →
MetaLlama 4OSS
#142

Meta: Llama 4 Maverick

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction. Maverick features early fusion for native multimodality and a 1 million token context window. It was trained on a curated mixture of public, licensed, and Meta-platform data, covering ~22 trillion tokens, with a knowledge cutoff in August 2024. Released on April 5, 2025 under the Llama 4 Community License, Maverick is suited for research and commercial applications requiring advanced multimodal understanding and high model throughput.

Context
1.0M
Max Output
16K
Input/1M
$0.15
👁 Vision🔧 Tools
Pricing (per 1M tokens)
OpenRouter$0.15 / $0.60
Vercel AI$0.15 / $0.60
Martian$0.15 / $0.60
2025-04-05View details →
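Maverick's vision badge means requests can mix text and image parts. A minimal sketch of such a request body in the OpenAI-compatible chat format most routers expose (the model slug is an assumption; check your router's model list):

```python
def build_vision_request(prompt: str, image_url: str) -> dict:
    """OpenAI-style chat payload mixing a text part and an image part."""
    return {
        "model": "meta-llama/llama-4-maverick",  # illustrative slug
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

req = build_vision_request("Describe this chart.", "https://example.com/chart.png")
```

The same payload shape works for Scout below; text-only models simply take a plain string as `content`.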
MetaLlama 4OSS
#149

Meta: Llama 4 Scout

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens. Built for high efficiency and local or commercial deployment, Llama 4 Scout incorporates early fusion for seamless modality integration. It is instruction-tuned for use in multilingual chat, captioning, and image understanding tasks. Released under the Llama 4 Community License, it was last trained on data up to August 2024 and launched publicly on April 5, 2025.

Context
328K
Max Output
16K
Input/1M
$0.08
👁 Vision🔧 Tools
Pricing (per 1M tokens)
OpenRouter$0.08 / $0.30
Vercel AI$0.08 / $0.30
Martian$0.08 / $0.30
2025-04-05View details →
MetaLlama 3.3OSS
#154

Meta: Llama 3.3 70B Instruct (free)

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. [Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md)

Context
128K
Max Output
128K
Input/1M
Free
🔧 Tools
Pricing (per 1M tokens)
OpenRouterFree / Free
Vercel AI$0.72 / $0.72
Martian$0.10 / $0.32
2024-12-06View details →
MetaLlama 3.1OSS
#178

NVIDIA: Llama 3.1 Nemotron 70B Instruct

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels in automatic alignment benchmarks. This model is tailored for applications requiring high accuracy in helpfulness and response generation, suitable for diverse user queries across multiple domains. Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Context
131K
Max Output
16K
Input/1M
$1.20
🔧 Tools
Pricing (per 1M tokens)
OpenRouter$1.20 / $1.20
Martian$1.20 / $1.20
DeepInfra$1.20 / $1.20
2024-10-15View details →
MetaLlama 3.1OSS
#180

Meta: Llama 3.1 70B Instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Context
131K
Max Output
—
Input/1M
$0.40
🔧 Tools
Pricing (per 1M tokens)
OpenRouter$0.40 / $0.40
Martian$0.40 / $0.40
DeepInfra$0.40 / $0.40
2024-07-23View details →
MetaLlama 3OSS
#195

Meta: Llama 3 70B Instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Context
8K
Max Output
8K
Input/1M
$0.51
Pricing (per 1M tokens)
OpenRouter$0.51 / $0.74
Martian$0.51 / $0.74
2024-04-18View details →
MetaLlama 3OSS
#225

Meta: Llama 3 8B Instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Context
8K
Max Output
16K
Input/1M
$0.03
🔧 Tools
Pricing (per 1M tokens)
OpenRouter$0.03 / $0.04
Martian$0.03 / $0.04
DeepInfra$0.03 / $0.04
2024-04-18View details →
MetaLlama 3.1OSS
#232

Meta: Llama 3.1 8B Instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Context
16K
Max Output
16K
Input/1M
$0.02
🔧 Tools
Pricing (per 1M tokens)
OpenRouter$0.02 / $0.05
Vercel AI$0.03 / $0.05
Martian$0.02 / $0.05
DeepInfra$0.02 / $0.05
2024-07-23View details →
MetaLlama 3OSS
#258

Meta: Llama 3.2 3B Instruct (free)

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages. Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Context
131K
Max Output
—
Input/1M
Free
Pricing (per 1M tokens)
OpenRouterFree / Free
Vercel AI$0.15 / $0.15
Martian$0.02 / $0.02
DeepInfra$0.02 / $0.02
2024-09-25View details →
MetaLlama 3OSS
#287

Meta: Llama 3.2 1B Instruct

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance. Supporting eight core languages and fine-tunable for more, Llama 3.2 1B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Context
60K
Max Output
—
Input/1M
$0.03
Pricing (per 1M tokens)
OpenRouter$0.03 / $0.20
Vercel AI$0.10 / $0.10
Martian$0.03 / $0.20
2024-09-25View details →
MetaLlama 3.3OSS

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages: Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior. A distillation-driven Neural Architecture Search (“Puzzle”) replaces some attention blocks and varies FFN widths to shrink memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality. In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), the model reports strong reasoning/coding results, e.g., MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53. The model targets practical inference efficiency (high tokens/s, reduced VRAM) with Transformers/vLLM support and explicit “reasoning on/off” modes (chat-first defaults, greedy recommended when disabled). Suitable for building agents, assistants, and long-context retrieval systems where balanced accuracy-to-cost and reliable tool use matter.

Context
131K
Max Output
—
Input/1M
$0.10
🧠 Reasoning🔧 Tools
Pricing (per 1M tokens)
OpenRouter$0.10 / $0.40
Martian$0.10 / $0.40
DeepInfra$0.10 / $0.40
2025-10-10View details →
MetaOSS

Meta: Llama Guard 4 12B

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM—generating text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated. Llama Guard 4 was aligned to safeguard against the standardized MLCommons hazards taxonomy and designed to support multimodal Llama 4 capabilities. Specifically, it combines features from previous Llama Guard models, providing content moderation for English and multiple supported languages, along with enhanced capabilities to handle mixed text-and-image prompts, including multiple images. Additionally, Llama Guard 4 is integrated into the Llama Moderations API, extending robust safety classification to text and images.

Context
164K
Max Output
—
Input/1M
$0.18
👁 Vision
Pricing (per 1M tokens)
OpenRouter$0.18 / $0.18
Martian$0.18 / $0.18
DeepInfra$0.18 / $0.18
2025-04-30View details →
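The Guard models above classify rather than chat: the reply is plain text, "safe" or "unsafe" followed by the violated category codes. A minimal parsing sketch, assuming that documented shape (e.g. a first line "unsafe" and a second line "S1,S10"); exact formatting can vary by serving stack, so treat this as an assumption:

```python
def parse_guard_verdict(text: str) -> tuple[str, list[str]]:
    """Split a Llama Guard-style reply into (verdict, category codes)."""
    lines = [l.strip() for l in text.strip().splitlines() if l.strip()]
    verdict = lines[0].lower()
    # Category codes follow on the next line only for unsafe verdicts.
    categories = lines[1].split(",") if verdict == "unsafe" and len(lines) > 1 else []
    return verdict, categories
```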
MetaOSS

AlfredPros: CodeLLaMa 7B Instruct Solidity

A 7-billion-parameter CodeLLaMA-Instruct model fine-tuned to generate Solidity smart contracts, trained with 4-bit QLoRA via the PEFT library.

Context
4K
Max Output
4K
Input/1M
$0.80
Pricing (per 1M tokens)
OpenRouter$0.80 / $1.20
2025-04-14View details →
MetaLlama 3.1OSS

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural Architecture Search (NAS), resulting in enhanced efficiency, reduced memory usage, and improved inference latency. The model supports a context length of up to 128K tokens and can operate efficiently on an 8x NVIDIA H100 node. Note: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more.

Context
131K
Max Output
—
Input/1M
$0.60
🧠 Reasoning
Pricing (per 1M tokens)
OpenRouter$0.60 / $1.80
Martian$0.60 / $1.80
2025-04-08View details →
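Nemotron Ultra's reasoning mode is toggled through the system prompt, per the card above. A minimal request-building sketch (the model slug is illustrative; the "detailed thinking on/off" strings come from NVIDIA's usage notes):

```python
def build_nemotron_request(user_prompt: str, reasoning: bool = True) -> dict:
    """Chat payload with the documented reasoning toggle in the system turn."""
    mode = "detailed thinking on" if reasoning else "detailed thinking off"
    return {
        "model": "nvidia/llama-3.1-nemotron-ultra-253b-v1",  # illustrative slug
        "messages": [
            {"role": "system", "content": mode},
            {"role": "user", "content": user_prompt},
        ],
    }
```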
MetaOSS

Llama Guard 3 8B

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated. Llama Guard 3 was aligned to safeguard against the MLCommons standardized hazards taxonomy and designed to support Llama 3.1 capabilities. Specifically, it provides content moderation in 8 languages, and was optimized to support safety and security for search and code interpreter tool calls.

Context
131K
Max Output
—
Input/1M
$0.02
Pricing (per 1M tokens)
OpenRouter$0.02 / $0.06
Martian$0.02 / $0.06
2025-02-12View details →
MetaLlama 3.1OSS

AionLabs: Aion-RP 1.0 (8B)

Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto, where LLMs evaluate each other’s responses. It is a fine-tuned base model rather than an instruct model, designed to produce more natural and varied writing.

Context
33K
Max Output
33K
Input/1M
$0.80
Pricing (per 1M tokens)
OpenRouter$0.80 / $1.60
Martian$0.80 / $1.60
2025-02-04View details →
MetaDeepSeek R1OSS

DeepSeek: R1 Distill Llama 70B

DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including AIME 2024 pass@1 of 70.0, MATH-500 pass@1 of 94.5, and a CodeForces rating of 1633. The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

Context
131K
Max Output
131K
Input/1M
$0.03
🧠 Reasoning⚡ Cache
Pricing (per 1M tokens)
OpenRouter$0.03 / $0.11
DeepInfra$0.70 / $0.80
2025-01-23View details →
Metao1

Sao10K: Llama 3.1 70B Hanami x1

This is [Sao10K](/sao10k)'s experiment over [Euryale v2.2](/sao10k/l3.1-euryale-70b).

Context
16K
Max Output
—
Input/1M
$3.00
Pricing (per 1M tokens)
OpenRouter$3.00 / $3.00
Martian$3.00 / $3.00
2025-01-08View details →
Metao1

Sao10K: Llama 3.3 Euryale 70B

Euryale L3.3 70B is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.2](/models/sao10k/l3-euryale-70b).

Context
131K
Max Output
16K
Input/1M
$0.65
Pricing (per 1M tokens)
OpenRouter$0.65 / $0.75
Martian$0.65 / $0.75
2024-12-18View details →
MetaLlama 3OSS

Meta: Llama 3.2 11B Vision Instruct

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis. Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Context
131K
Max Output
16K
Input/1M
$0.05
👁 Vision
Pricing (per 1M tokens)
OpenRouter$0.05 / $0.05
Martian$0.05 / $0.05
DeepInfra$0.05 / $0.05
2024-09-25View details →
MetaLlama 3OSS

Llama 3.2 11B Vision Instruct

Instruction-tuned image reasoning generative model (text + images in / text out) optimized for visual recognition, image reasoning, captioning and answering general questions about the image.

Context
128K
Max Output
8K
Input/1M
$0.16
🔧 Tools
Pricing (per 1M tokens)
Vercel AI$0.16 / $0.16
2024-09-25View details →
MetaLlama 3OSS

Llama 3.2 90B Vision Instruct

Instruction-tuned image reasoning generative model (text + images in / text out) optimized for visual recognition, image reasoning, captioning and answering general questions about the image.

Context
128K
Max Output
8K
Input/1M
$0.72
🔧 Tools
Pricing (per 1M tokens)
Vercel AI$0.72 / $0.72
2024-09-25View details →
MetaLlama 3.1OSS

NeverSleep: Lumimaid v0.2 8B

Lumimaid v0.2 8B is a finetune of [Llama 3.1 8B](/models/meta-llama/llama-3.1-8b-instruct) with a "HUGE step up dataset wise" compared to Lumimaid v0.1. Sloppy chats output were purged. Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Context
33K
Max Output
4K
Input/1M
$0.09
Pricing (per 1M tokens)
OpenRouter$0.09 / $0.60
Martian$0.09 / $0.60
2024-09-15View details →
Metao1

Sao10K: Llama 3.1 Euryale 70B v2.2

Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.1](/models/sao10k/l3-euryale-70b).

Context
33K
Max Output
33K
Input/1M
$0.65
🔧 Tools
Pricing (per 1M tokens)
OpenRouter$0.65 / $0.75
Martian$0.65 / $0.75
2024-08-28View details →
MetaLlama 3.1OSS

Nous: Hermes 3 70B Instruct

Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 70B is a competitive, if not superior finetune of the [Llama-3.1 70B foundation model](/models/meta-llama/llama-3.1-70b-instruct), focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.

Context
66K
Max Output
66K
Input/1M
$0.30
Pricing (per 1M tokens)
OpenRouter$0.30 / $0.30
DeepInfra$0.30 / $0.30
2024-08-18View details →
MetaLlama 3.1OSS

Nous: Hermes 3 405B Instruct (free)

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills. Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two.

Context
131K
Max Output
—
Input/1M
Free
Pricing (per 1M tokens)
OpenRouterFree / Free
DeepInfra$1.00 / $1.00
2024-08-16View details →
Metao1

Sao10K: Llama 3 8B Lunaris

Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge. Created by [Sao10k](https://huggingface.co/Sao10k), this model aims to offer an improved experience over Stheno v3.2, with enhanced creativity and logical reasoning. For best results, use with Llama 3 Instruct context template, temperature 1.4, and min_p 0.1.

Context
8K
Max Output
—
Input/1M
$0.04
Pricing (per 1M tokens)
OpenRouter$0.04 / $0.05
Martian$0.04 / $0.05
2024-08-13View details →
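The Lunaris card above recommends specific sampling settings (temperature 1.4, min_p 0.1). A minimal request sketch with those values; note that `min_p` is a sampler parameter some providers do not accept, and the model slug is illustrative:

```python
def lunaris_request(prompt: str) -> dict:
    """Chat payload with the sampling settings the model card recommends."""
    return {
        "model": "sao10k/l3-lunaris-8b",  # illustrative slug
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.4,   # high temperature for creative output
        "min_p": 0.1,         # provider support for min_p varies
    }
```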
Metao1

Sao10k: Llama 3 Euryale 70B v2.1

Euryale 70B v2.1 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). Improvements over previous versions include better prompt adherence, better anatomy and spatial awareness, much better adaptation to unique and custom formatting and reply formats, high creativity with lots of unique swipes, and fewer restrictions during roleplay.

Context
8K
Max Output
8K
Input/1M
$1.48
🔧 Tools
Pricing (per 1M tokens)
OpenRouter$1.48 / $1.48
Martian$1.48 / $1.48
2024-06-18View details →
MetaLlama 3OSS

NousResearch: Hermes 2 Pro - Llama-3 8B

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Context
8K
Max Output
8K
Input/1M
$0.14
Pricing (per 1M tokens)
OpenRouter$0.14 / $0.14
Martian$0.14 / $0.14
2024-05-27View details →
MetaOSS

Meta: LlamaGuard 2 8B

This safeguard model has 8B parameters and is based on the Llama 3 family. Just like its predecessor, [LlamaGuard 1](https://huggingface.co/meta-llama/LlamaGuard-7b), it can do both prompt and response classification. LlamaGuard 2 acts as a normal LLM would, generating text that indicates whether the given input/output is safe/unsafe. If deemed unsafe, it will also share the content categories violated. For best results, please use raw prompt input or the `/completions` endpoint, instead of the chat API. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Context
8K
Max Output
—
Input/1M
$0.20
Pricing (per 1M tokens)
OpenRouter$0.20 / $0.20
2024-05-13View details →
MetaLlama 3.3OSS

Llama 3.3 70B Instruct

The Llama 3.3 70B multilingual instruction-tuned model, optimized for multilingual dialogue use cases and competitive with many open- and closed-source chat models on common industry benchmarks.

Context
128K
Max Output
—
Input/1M
$0.13
🔧 Tools
Pricing (per 1M tokens)
Requesty★$0.13 / $0.40
View details →
MetaDeepSeek R1OSS

Deepseek R1 Distill Llama 70b

DeepSeek R1 Distill Llama 70B, a distilled reasoning model based on Llama 3.3 70B Instruct and fine-tuned on outputs from DeepSeek R1.

Context
32K
Max Output
—
Input/1M
$0.80
Pricing (per 1M tokens)
Requesty★$0.80 / $0.80
View details →
MetaLlama 3OSS

Llama 3.2 3b Instruct

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out)

Context
33K
Max Output
—
Input/1M
$0.03
🔧 Tools
Pricing (per 1M tokens)
Requesty★$0.03 / $0.05
View details →
MetaLlama 3OSS

Llama 3.2 1b Instruct

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out).

Context
131K
Max Output
—
Input/1M
$0.02
Pricing (per 1M tokens)
Requesty★$0.02 / $0.02
View details →
MetaLlama 3OSS

Llama 3 70b Instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong performance compared to leading closed-source models in human evaluations.

Context
8K
Max Output
—
Input/1M
$0.51
Pricing (per 1M tokens)
Requesty★$0.51 / $0.74
View details →
MetaLlama 3OSS

Hermes 2 Pro Llama 3 8b

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Context
8K
Max Output
—
Input/1M
$0.14
Pricing (per 1M tokens)
Requesty★$0.14 / $0.14
View details →
MetaLlama 4OSS

Llama 4 Maverick 17b 128e Instruct Fp8

An FP8-quantized serving of Llama 4 Maverick, Meta's mixture-of-experts multimodal model with 17B active parameters per forward pass (128 experts, 400B total) and a large context window.

Context
1.0M
Max Output
1.0M
Input/1M
$0.20
Pricing (per 1M tokens)
Requesty★$0.20 / $0.85
View details →
MetaLlama 3.1OSS

Llama 3.1 8b Instruct

Meta's latest class of models, Llama 3.1, launched with a variety of sizes and configurations. The 8B instruct-tuned version is particularly fast and efficient. It has demonstrated strong performance in human evaluations, outperforming several leading closed-source models.

Context
16K
Max Output
—
Input/1M
$0.05
Pricing (per 1M tokens)
Requesty★$0.05 / $0.05
View details →
MetaLlama 3OSS

Llama 3 8b Instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong performance compared to leading closed-source models in human evaluations.

Context
8K
Max Output
—
Input/1M
$0.04
Pricing (per 1M tokens)
Requesty★$0.04 / $0.04
View details →
MetaLlama 3.3OSS

Llama 3.3 70B Instruct Turbo

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Context
131K
Max Output
—
Input/1M
$0.88
🔧 Tools
Pricing (per 1M tokens)
Requesty★$0.88 / $0.88
View details →
MetaLlama 3.1OSS

Meta Llama 3.1 8B Instruct Turbo

A speed-optimized serving of Llama 3.1 8B Instruct, Meta's fast and efficient small instruct-tuned model.

Context
131K
Max Output
—
Input/1M
$0.18
🔧 Tools
Pricing (per 1M tokens)
Requesty★$0.18 / $0.18
View details →
MetaLlama 3.1OSS

Meta Llama 3.1 70B Instruct Turbo

A speed-optimized serving of Llama 3.1 70B Instruct, tuned for high-quality dialogue use cases.

Context
131K
Max Output
—
Input/1M
$0.88
🔧 Tools
Pricing (per 1M tokens)
Requesty★$0.88 / $0.88
View details →
MetaOSS

LlamaGuard 2 8b

An 8B safeguard model from the Llama 3 family that classifies both prompts and responses as safe or unsafe, listing any violated content categories.

Context
8K
Max Output
—
Input/1M
$0.20
Pricing (per 1M tokens)
Requesty★$0.20 / $0.20
View details →
MetaLlama 3OSS

Meta Llama 3 8B Instruct Lite

A lightweight, lower-cost serving of Llama 3 8B Instruct for latency- and cost-sensitive dialogue workloads.

Context
8K
Max Output
—
Input/1M
$0.10
Pricing (per 1M tokens)
Requesty★$0.10 / $0.10
View details →
MetaLlama 3OSS

Llama 3.2 3B Instruct Turbo

A speed-optimized serving of Llama 3.2 3B Instruct, Meta's small multilingual instruction-tuned model.

Context
131K
Max Output
—
Input/1M
$0.06
🔧 Tools
Pricing (per 1M tokens)
Requesty★$0.06 / $0.06
View details →
MetaDeepSeek R1OSS

DeepSeek R1 Distill Llama 70B

DeepSeek R1 Distill Llama 70B is a 70-billion-parameter dense model distilled from DeepSeek-R1 onto the Llama 3.3 70B Instruct architecture, transferring advanced reasoning, math, and code capabilities from DeepSeek's larger models while keeping inference costs below those of the full R1 model.

Context
64K
Max Output
8K
Input/1M
$0.23
Pricing (per 1M tokens)
Requesty★$0.23 / $0.69
View details →
MetaLlama 3.1OSS

Meta Llama 3.1 405B Instruct

Meta's flagship Llama 3.1 model: a 405B instruct-tuned version with strong performance compared to leading closed-source models in human evaluations.

Context
131K
Max Output
—
Input/1M
$0.80
Pricing (per 1M tokens)
Requesty★$0.80 / $0.80
View details →
MetaLlama 3OSS

Llama 3.2 90B Vision Instruct

Instruction-tuned image reasoning model (text + images in / text out) optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.

Context
131K
Max Output
4K
Input/1M
$0.35
Pricing (per 1M tokens)
Requesty★$0.35 / $0.40
View details →
MetaLlama 3.1OSS

Meta Llama 3.1 70B Instruct

The 70B instruct-tuned version of Llama 3.1, optimized for high-quality dialogue use cases with strong performance compared to leading closed-source models.

Context
131K
Max Output
—
Input/1M
$0.23
🔧 Tools
Pricing (per 1M tokens)
Requesty★$0.23 / $0.40
View details →