Last week, we announced the launch of Nscale serverless inference, a platform that provides instant access to popular Generative AI models with zero set-up, instant deployment, and best of all - you only pay for what you use.
This new public cloud service allows developers and enterprises to quickly deploy leading AI models at scale without managing the underlying infrastructure, complementing Nscale’s established private cloud solutions tailored for large-scale enterprise AI workloads.
Models available
Nscale’s serverless inference platform offers a curated selection of popular Generative AI models, enabling a wide array of use cases, from natural language processing to multimodal applications. You can choose from our pre-trained models or simply upload and deploy your own custom model.
Here is a list of our models at launch; we’re always expanding it as new state-of-the-art models and modalities emerge.
Pay-as-you-go pricing
Scalability and cost-efficiency are critical components for developers and organisations. This is why Nscale’s serverless inference pricing has been specifically designed to be transparent and align with your usage. Our pay-as-you-go pricing model allows you to visualise spending in real time and scale with your workload, eliminating surprise bills and the headaches of managing infrastructure and unused resources.
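As a sketch of what pay-as-you-go tracking looks like in practice, the snippet below estimates per-request spend from token usage. The per-million-token rates here are hypothetical placeholders for illustration, not Nscale’s published pricing:

```python
# Estimate per-request spend from token usage.
# NOTE: these rates are hypothetical placeholders, not Nscale's pricing.
INPUT_RATE_PER_M = 0.20    # assumed $ per 1M prompt tokens
OUTPUT_RATE_PER_M = 0.60   # assumed $ per 1M completion tokens

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the dollar cost of a single request from its token usage."""
    return (prompt_tokens * INPUT_RATE_PER_M
            + completion_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# e.g. a request that consumed 98,357 prompt and 1,199 completion tokens
print(f"${estimate_cost(98_357, 1_199):.4f}")  # prints $0.0204
```

Because billing is usage-based, summing these estimates across requests gives a running view of spend as your workload scales.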
Production inference before standup ends
With a choice of SDK, CLI, and API calls, there’s never been a better time to integrate generative AI into your product with Nscale. Let’s explore an example use case - text summarisation - by processing an entire novel for less than three cents.
Our inference API is compatible with OpenAI and their SDK, making trying our models a breeze: simply update your `api_key` and `base_url`.
import os

import requests
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("NSCALE_API_KEY"),
    base_url="https://inference.api.nscale.com/v1",
)

book_text = requests.get("https://www.gutenberg.org/files/84/84-0.txt").text

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    messages=[
        {"role": "system", "content": "You are a literary analysis expert."},
        {"role": "user", "content": "Summarise this book in under 100 words\n\n" + book_text},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
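One caveat: this example sends the full novel in a single prompt, which works here because the model accepted a roughly 98K-token prompt. For models with smaller context windows, a common workaround is map-reduce summarisation: chunk the text, summarise each piece, then summarise the summaries. A minimal sketch (the 4-characters-per-token ratio is a rough rule of thumb, not an exact tokeniser, and `summarise` is a hypothetical helper wrapping the API call above):

```python
# Split a long text into pieces that should each fit within a model's
# context window. 4 characters per token is a rough heuristic only.
def chunk_text(text: str, max_tokens: int = 8000, chars_per_token: int = 4) -> list[str]:
    """Split text into chunks of at most max_tokens (estimated) each."""
    max_chars = max_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# Map-reduce: summarise each chunk, then summarise the partial summaries.
# partials = [summarise(chunk) for chunk in chunk_text(book_text)]
# final = summarise("\n".join(partials))
```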
The same can be achieved using the Nscale CLI:
nscale chat -m deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
--message "system:You are a literary analysis expert" \
--message "user:Summarise this book in under 100 words\n\n$(curl -sL https://www.gutenberg.org/files/84/84-0.txt)"
Or with a curl command on your terminal:
curl -s "https://inference.api.nscale.com/v1/chat/completions" \
-H "Authorization: Bearer $NSCALE_API_KEY" \
-H "Content-Type: application/json" \
-d "$(curl -sL https://www.gutenberg.org/files/84/84-0.txt | jq -sR '{
model: "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
messages: [
{role: "system", content: "You are a literary analysis expert"},
{role: "user", content: ("Summarise this book in under 100 words\n\n" + .)}
],
temperature: 0.7
}')" | jq .
Output:
"Frankenstein" by Mary Shelley is a complex and layered novel that explores themes of ambition, isolation, revenge, and the consequences of playing god. Here's a breakdown of the key elements:
1. **Consequences of Ambition**: Victor Frankenstein's unchecked ambition leads him to create a monster, highlighting the dangers of hubris and the importance of considering moral implications.
2. **Isolation**: Both Frankenstein and the monster experience deep isolation. Frankenstein's self-imposed isolation drives his downfall, while the monster's isolation leads to his violent acts.
3. **Revenge and Justice**: The story delves into the themes of revenge, with the monster seeking justice for his abandonment and Frankenstein pursuing revenge against his creation.
4. **Nature vs. Nurture**: The novel questions what makes a monster, suggesting that circumstances and abandonment play a significant role.
usage: { "completion_tokens": 1199, "prompt_tokens": 98357, "total_tokens": 99556 }
That’s a total of 75,060 words summarised at a cost of only $0.029! Our smaller models can lower the price even further.
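As a quick back-of-the-envelope from the usage figures above, that cost works out to a blended rate of roughly $0.29 per million tokens:

```python
# Back-of-the-envelope from the figures above: $0.029 for 99,556 tokens.
total_tokens = 99_556  # from the usage field in the response
cost_usd = 0.029

rate_per_million = cost_usd / total_tokens * 1_000_000
print(f"${rate_per_million:.2f} per 1M tokens")  # prints $0.29 per 1M tokens
```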
Get started with Nscale Serverless Inference
Four simple steps to get started with Nscale’s Serverless Inference service:
- Sign up here
- Claim your free credit and create your API key
- Browse the available models
- Begin making inference requests via the OpenAI-compatible API
The service launches with detailed documentation (https://docs.nscale.com) covering everything from basic setup to advanced usage patterns, including implementation guides, API references, and code examples. Our team of experts is available to help with onboarding; if you have any questions, please contact the Nscale team at help@nscale.com.
What’s next?
Nscale will continue to expand its product offering to support new models and modalities, dedicated endpoints for those models, and the ability to fine-tune and deploy custom models. All of this will be built on our vertically integrated technology stack, delivering the fastest and most cost-effective way to develop, train, fine-tune, and deploy AI models at any scale.