The Llama journey
On April 5th, Meta announced the release of Llama 4, stating that it is beginning a new era of natively multimodal AI innovation. Before we get into Llama 4, let’s quickly go back to the start of the Llama family journey.
Llama, short for Large Language Model Meta AI, began in early 2023 when Meta set out a vision to make research-grade language models open source. Llama 1 offered strong performance at smaller model sizes. Building on that foundation, Meta went bigger, quite literally, announcing 7B, 13B, and 70B versions with Llama 2. It then went a step further with Llama 3, incorporating a better tokenizer design and training on significantly more data.
Now we have Llama 4, Meta’s most advanced evolution yet. Consisting of three models, this new herd outperforms earlier Llama generations across a range of tasks:
- Reasoning: Improved comprehension and complex problem-solving
- Code generation: Advanced generation and debugging capabilities
- Multimodal tasks: Understanding and reasoning over both text and images
Llama 4 Scout
The first of the herd is Llama 4 Scout, a 17 billion active parameter model with 16 experts. While fitting on a single NVIDIA H100 GPU, Llama 4 Scout is, according to Meta, the best multimodal model in its class. It offers an industry-leading context window of 10M tokens and delivers better results than Gemma 3, Gemini 2.0, and Mistral 3.1.
Llama 4 Maverick
The second addition to the family is Llama 4 Maverick, a 17 billion active parameter model with 128 experts. It currently beats GPT-4o and Gemini 2.0 while achieving results comparable to the new DeepSeek v3 on reasoning and coding, at less than half the active parameters.
Llama 4 Behemoth
However, according to Meta, these models wouldn’t be as capable without distillation from Llama 4 Behemoth, a 288 billion active parameter model with 16 experts, one of Meta’s most powerful yet and among the world’s smartest LLMs. It has been shown to outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and Meta will share more information soon.
Designed for enterprises, researchers and developers, the Llama 4 family is helping to build AI solutions across various industries.
Get started with Llama 4 on Nscale Serverless Inference
4 simple steps to get started with Nscale’s Serverless Inference service:
- Sign up here
- Claim your free credit and create your API key
- Browse the available models
- Begin making inference requests via the OpenAI-compatible API
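Once you have an API key, a request follows the standard OpenAI chat-completions shape. The sketch below uses only the Python standard library; the endpoint URL, environment variable name, and model identifier are illustrative assumptions, so check the Nscale documentation for the exact values.

```python
import json
import os
import urllib.request

# Hypothetical endpoint -- confirm the real base URL in the Nscale docs.
API_URL = "https://inference.api.nscale.com/v1/chat/completions"

def build_request(prompt, model="meta-llama/Llama-4-Scout-17B-16E-Instruct"):
    """Build a standard OpenAI-style chat-completions payload.

    The model ID is an example; use one listed in the model browser.
    """
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

payload = build_request("Summarise the Llama 4 release in one sentence.")

# Only send the request if a key is configured (variable name assumed).
api_key = os.environ.get("NSCALE_API_KEY")
if api_key:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

Because the API is OpenAI-compatible, the official `openai` Python client should also work by pointing its `base_url` at the Nscale endpoint.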
The service launches with detailed documentation at https://docs.nscale.com covering everything from basic setup to advanced usage patterns, including implementation guides, API references, and code examples. If you have any questions about the service or need help with onboarding, our team of experts is available at help@nscale.com.