Nscale is now an official Hugging Face Inference provider

We are excited to announce that Nscale is now listed as an official Inference Provider on Hugging Face.

As AI demands more and more compute, teams face growing challenges in training, deploying, and scaling models efficiently.

At Nscale, we have engineered our infrastructure specifically for modern AI demands. By integrating directly with Hugging Face, we're making it even easier to access, so developers and enterprises can run their favourite models on high-performance GPU clusters.

If you're looking to use the latest models such as Llama 4 or Qwen3, Nscale and Hugging Face give you the performance, scalability and flexibility to go from prototype to production - before your stand-up has even ended.

What does this mean?

Through the Hugging Face Inference Providers directory, users can route their models to run on Nscale's infrastructure, unlocking:

  • Best-in-class GPUs, including NVIDIA H100s and AMD MI300X
  • Low-cost compute with pay-per-token pricing
  • Lightning-fast inference with low latency and high throughput
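Here's a minimal sketch of what that routing looks like in code, using the huggingface_hub client. The model ID is illustrative; check the directory for the models Nscale currently serves.

```python
import os

from huggingface_hub import InferenceClient

# Route requests to Nscale by naming it as the provider.
# Assumes a Hugging Face token with inference permissions in HF_TOKEN.
client = InferenceClient(
    provider="nscale",
    api_key=os.environ["HF_TOKEN"],
)

# Illustrative model ID - pick any model the directory lists for Nscale.
completion = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Say hello from the Arctic Circle."}],
    max_tokens=64,
)

print(completion.choices[0].message.content)
```

No separate SDK is needed: the same client can target any provider in the directory simply by changing the provider argument.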

Why Nscale and Hugging Face?

Model deployment with ease

With Nscale's Hugging Face integration, you can accelerate your AI workloads in a few simple steps. From running Transformers models to hosting inference endpoints, we provide access to high-performance GPUs such as the NVIDIA H100 and AMD MI300X.

What can you do?

  • Deploy large models such as Falcon, Mistral and Mixtral
  • Auto-scale your endpoints for peak traffic
  • Run latency-sensitive inference workloads on dedicated GPU clusters (see the streaming sketch below)
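As a sketch of that last point: streaming keeps time-to-first-token low for latency-sensitive workloads. Setup mirrors the earlier example, and the Mixtral model ID is again illustrative.

```python
import os

from huggingface_hub import InferenceClient

client = InferenceClient(provider="nscale", api_key=os.environ["HF_TOKEN"])

# Stream tokens as they are generated instead of waiting for the full reply.
stream = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # illustrative model ID
    messages=[{"role": "user", "content": "Summarise vLLM in one sentence."}],
    max_tokens=128,
    stream=True,
)

for chunk in stream:
    # Each chunk carries the newly generated text in its delta.
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```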

Scale with support

Support is an important part of scaling your AI initiatives. Nscale supports distributed inference out of the box:

  • Multi-node, multi-GPU clusters
  • Environments preconfigured for Hugging Face models
  • Serving stacks optimised for inference (PyTorch, vLLM, SGLang, etc.) - see the sketch below
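To give a flavour of what such a serving stack involves (a generic vLLM sketch, not Nscale's actual configuration), tensor parallelism shards a model that won't fit on one GPU across several:

```python
from vllm import LLM, SamplingParams

# Shard the model across 8 GPUs on a node via tensor parallelism.
# Illustrative settings - not Nscale's actual serving configuration.
llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    tensor_parallel_size=8,
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Why host data centres near the Arctic?"], params)
print(outputs[0].outputs[0].text)
```

As a managed provider, Nscale runs this layer for you; the sketch only shows what "optimised for inference" means in practice.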

Our clusters are hosted in Northern Europe near the Arctic Circle and powered by 100% renewable energy, so you can scale your inference without scaling your carbon footprint.

Get started in minutes

To get started, simply head to the Hugging Face Inference Providers directory and select Nscale as your compute backend. You’ll be able to deploy your models on high-performance infrastructure in just a few clicks - no complex setup required.
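If your codebase already speaks the OpenAI API, you can also reach Nscale programmatically through Hugging Face's OpenAI-compatible router; a minimal sketch (model ID illustrative):

```python
import os

from openai import OpenAI

# The Hugging Face router exposes an OpenAI-compatible endpoint.
# Authenticate with a Hugging Face token, not an OpenAI key.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    # The ":nscale" suffix pins the request to Nscale as the provider.
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct:nscale",
    messages=[{"role": "user", "content": "Hello, Nscale!"}],
)

print(completion.choices[0].message.content)
```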

The hyperscaler engineered for AI. Now powering Hugging Face models.

Nisha Arya
Head of Content

Data scientist turned marketer with 6+ years of experience in the AI and machine learning industry.
