The Llama journey
On April 5th, Meta announced the release of Llama 4, stating that it is beginning a new era of natively multimodal AI innovation. Before we get into Llama 4, let’s quickly go back to the start of the Llama family journey.
Llama, short for Large Language Model Meta AI, began in early 2023 when Meta set out a vision to make research-grade language models open source. Llama 1 offered strong performance at smaller model sizes. Building on that foundation, Meta went bigger, quite literally, announcing 7B, 13B, and 70B versions with Llama 2. It then went a step further with Llama 3, incorporating a better tokenizer design and training on significantly more data.
Now we have Llama 4, Meta’s most advanced evolution yet. Consisting of three models, this new herd outperforms earlier Llama generations across a range of tasks:
- Reasoning: Improved comprehension and complex problem-solving
- Code generation: Advanced generation and debugging capabilities
- Multimodal tasks: Understanding and reasoning over both text and images
Llama 4 Scout
The first of the herd is Llama 4 Scout, a 17 billion active parameter model with 16 experts. While fitting on a single NVIDIA H100 GPU, Llama 4 Scout is, according to Meta, the best multimodal model in its class. It offers an industry-leading context window of 10M tokens and delivers better results than Gemma 3, Gemini 2.0, and Mistral 3.1.
Llama 4 Maverick
The second addition to the family is Llama 4 Maverick, a 17 billion active parameter model with 128 experts. It currently beats GPT-4o and Gemini 2.0 while achieving results comparable to the new DeepSeek v3 on reasoning and coding, at less than half the active parameters.
Llama 4 Behemoth
However, according to Meta, these models wouldn’t be as capable without distillation from Llama 4 Behemoth, a 288 billion active parameter model with 16 experts, one of Meta’s most powerful yet and among the world’s smartest LLMs. It has been shown to outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and Meta will share more information soon.
Designed for enterprises, researchers and developers, the Llama 4 family is helping to build AI solutions across various industries.
Get started with Llama 4 on Nscale Serverless Inference
4 simple steps to get started with Nscale’s Serverless Inference service:
- Sign up here
- Claim your free credit and create your API key
- Browse the available models
- Begin making inference requests via the OpenAI-compatible API
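Once you have an API key, a request follows the standard OpenAI chat-completions shape. The sketch below uses only the Python standard library; the endpoint URL, environment variable name, and model identifier are illustrative assumptions, so check the Nscale documentation for the exact values.

```python
import json
import os
import urllib.request

# Hypothetical endpoint -- confirm the real base URL in the Nscale docs.
API_URL = "https://inference.api.nscale.com/v1/chat/completions"

def build_request(prompt, model="meta-llama/Llama-4-Scout-17B-16E-Instruct"):
    """Build a standard OpenAI-style chat-completions payload.

    The model ID is an example; use one listed in the model browser.
    """
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

payload = build_request("Summarise the Llama 4 release in one sentence.")

# Only send the request if a key is configured (variable name assumed).
api_key = os.environ.get("NSCALE_API_KEY")
if api_key:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

Because the API is OpenAI-compatible, the official `openai` Python client should also work by pointing its `base_url` at the Nscale endpoint.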
The service launches with detailed documentation at https://docs.nscale.com covering everything from basic setup to advanced usage patterns, including implementation guides, API references, and code examples. If you have any questions about the service or need help with onboarding, our team of experts is available at help@nscale.com.