Nscale is now an official Hugging Face Inference provider

We are excited to announce that Nscale is now listed as an official Inference Provider on Hugging Face.

As AI demands more and more compute, teams face growing challenges in training, deploying, and scaling models efficiently.

At Nscale, we have engineered our infrastructure specifically for modern AI demands. By integrating directly with Hugging Face, we're making it even easier to access, so developers and enterprises can run their favourite models on high-performance GPU clusters.

If you're looking to use the latest models such as Llama 4 or Qwen3, Nscale and Hugging Face give you the performance, scalability and flexibility to go from prototype to production - before your stand-up has even ended.

What does this mean?

Through the Hugging Face Inference Providers directory, users can route their models to run on Nscale's infrastructure, unlocking:

  • Best-in-class GPUs, including NVIDIA H100s and AMD MI300X
  • Low-cost compute with pay-per-token pricing
  • Lightning-fast inference with low latency and high throughput
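Here's a minimal sketch of what that routing looks like in code, using the huggingface_hub client. The model ID is illustrative; check the directory for the models Nscale currently serves.

```python
import os

from huggingface_hub import InferenceClient

# Route requests to Nscale by naming it as the provider.
# Assumes a Hugging Face token with inference permissions in HF_TOKEN.
client = InferenceClient(
    provider="nscale",
    api_key=os.environ["HF_TOKEN"],
)

# Illustrative model ID - pick any model the directory lists for Nscale.
completion = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Say hello from the Arctic Circle."}],
    max_tokens=64,
)

print(completion.choices[0].message.content)
```

No separate SDK is needed: the same client can target any provider in the directory simply by changing the provider argument.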

Why Nscale and Hugging Face?

Model deployment with ease

With Nscale's Hugging Face integration, you can accelerate your AI workloads in a few simple steps. From running Transformers models to hosting inference endpoints, we provide access to high-performance GPUs such as the NVIDIA H100 and AMD MI300X.

What can you do?

  • Deploy large models such as Falcon, Mistral and Mixtral
  • Auto-scale your endpoints for peak traffic
  • Run latency-sensitive inference workloads on dedicated GPU clusters (see the streaming sketch below)
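As a sketch of that last point: streaming keeps time-to-first-token low for latency-sensitive workloads. Setup mirrors the earlier example, and the Mixtral model ID is again illustrative.

```python
import os

from huggingface_hub import InferenceClient

client = InferenceClient(provider="nscale", api_key=os.environ["HF_TOKEN"])

# Stream tokens as they are generated instead of waiting for the full reply.
stream = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # illustrative model ID
    messages=[{"role": "user", "content": "Summarise vLLM in one sentence."}],
    max_tokens=128,
    stream=True,
)

for chunk in stream:
    # Each chunk carries the newly generated text in its delta.
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```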

Scale with support

Support is an important part of scaling your AI initiatives. Nscale supports distributed inference out of the box:

  • Multi-node, multi-GPU clusters
  • Environments preconfigured for Hugging Face models
  • Serving stacks optimised for inference (PyTorch, vLLM, SGLang, etc.) - see the sketch below
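To give a flavour of what such a serving stack involves (a generic vLLM sketch, not Nscale's actual configuration), tensor parallelism shards a model that won't fit on one GPU across several:

```python
from vllm import LLM, SamplingParams

# Shard the model across 8 GPUs on a node via tensor parallelism.
# Illustrative settings - not Nscale's actual serving configuration.
llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    tensor_parallel_size=8,
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Why host data centres near the Arctic?"], params)
print(outputs[0].outputs[0].text)
```

As a managed provider, Nscale runs this layer for you; the sketch only shows what "optimised for inference" means in practice.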

Our clusters are hosted in Northern Europe near the Arctic Circle and powered by 100% renewable energy, so you can scale your inference without scaling your carbon footprint.

Get started in minutes

To get started, simply head to the Hugging Face Inference Providers directory and select Nscale as your compute backend. You’ll be able to deploy your models on high-performance infrastructure in just a few clicks - no complex setup required.
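If your codebase already speaks the OpenAI API, you can also reach Nscale programmatically through Hugging Face's OpenAI-compatible router; a minimal sketch (model ID illustrative):

```python
import os

from openai import OpenAI

# The Hugging Face router exposes an OpenAI-compatible endpoint.
# Authenticate with a Hugging Face token, not an OpenAI key.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    # The ":nscale" suffix pins the request to Nscale as the provider.
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct:nscale",
    messages=[{"role": "user", "content": "Hello, Nscale!"}],
)

print(completion.choices[0].message.content)
```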

The hyperscaler engineered for AI. Now powering Hugging Face models.

Nisha Arya
Head of Content

Data scientist turned marketer with 6+ years of experience in the AI and machine learning industry.
