Nscale's Svartisen Cluster has made the 24th official Top500 list!
Since 1993, the Top500 project has ranked the 500 most powerful supercomputing systems worldwide twice a year. The ranking is based on the High-Performance Linpack (HPL) benchmark, which measures how quickly a system can solve a dense system of linear equations. Although Linpack has been around for a long time, it remains highly relevant to HPC and AI because it relies heavily on matrix multiplication.
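To make the benchmark more concrete, here is a minimal single-machine sketch of what an HPL-style measurement does: solve a random dense system in FP64, convert the elapsed time into a floating-point rate, and check a scaled residual. It uses NumPy rather than the real distributed HPL code, and the function name `hpl_style_run` and the problem size are illustrative assumptions, not part of Nscale's submission.

```python
import time

import numpy as np


def hpl_style_run(n=512, seed=0):
    """Solve a random dense n x n system in FP64 and report an HPL-style rate.

    Illustrative only: real HPL runs a distributed, blocked LU factorisation
    across many nodes. Here NumPy's LAPACK-backed solver stands in for it.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal(n)

    start = time.perf_counter()
    x = np.linalg.solve(a, b)  # LU factorisation with partial pivoting
    elapsed = time.perf_counter() - start

    # Operation count that HPL conventionally credits for an n x n solve.
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    gflops = flops / elapsed / 1e9

    # Scaled residual in the style of HPL's correctness check
    # (a run is normally accepted when this value is below 16).
    eps = np.finfo(np.float64).eps
    norm_a = np.linalg.norm(a, np.inf)
    norm_x = np.linalg.norm(x, np.inf)
    norm_b = np.linalg.norm(b, np.inf)
    resid = np.linalg.norm(a @ x - b, np.inf) / (
        eps * (norm_a * norm_x + norm_b) * n
    )
    return gflops, resid


if __name__ == "__main__":
    rate, residual = hpl_style_run()
    print(f"{rate:.2f} GFLOP/s, scaled residual {residual:.4f}")
```

The rate this prints on a laptop is not comparable to a Top500 entry. The sketch only shows how a solve time becomes the FLOP/s figure that the list ranks.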
Nscale’s Glomfjord DC
At Nscale, we firmly believe supercomputing methods and practices are critical for building infrastructure that performs at scale. HPC is at the heart of everything we do for customers running AI workloads. This result demonstrates the team's ability to build and tune infrastructure designed to run single large-scale applications efficiently.
Delivering exceptional infrastructure is more than just deploying GPUs. Behind the scenes, it consists of tuning data centres, optimising cooling systems, configuring hardware, calibrating GPUs, refining network fabric and perfecting the software stack.
At the heart of Nscale’s recent Top500 achievement is an optimised GPU cluster based on AMD’s MI250X accelerators. Each node consists of 4x AMD Instinct MI250X OAMs (8x GPUs), as well as 2x AMD EPYC 7713 CPUs. The nodes are interconnected with a finely tuned Ethernet fabric that provides a high-performance, lossless RDMA (RoCE) network, powered by Broadcom and built on open standards. The network in particular plays a key role in scaling collectives efficiently, a foundational requirement for AI workloads and HPC tasks.
Every component in the stack is optimised for peak performance, showcasing Nscale’s capability as a service provider to deliver high-performance infrastructure for the most demanding applications, across both the AI and HPC space.
Our Findings
For the Top500 submission, we ran HPL in FP64 mode. We also ran the mixed-precision benchmark (Linpack MXP), which better reflects the performance of AI applications.
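The idea behind a mixed-precision solve can be sketched in a few lines: do the expensive solve in low precision, then recover FP64 accuracy by refining the answer against FP64 residuals. The example below uses classic iterative refinement with FP32 as the low precision. This is a simplified stand-in for illustration and not the benchmark's actual algorithm or precision choices. The function name `mixed_precision_solve` and the diagonally dominant test matrix are assumptions made for this sketch.

```python
import numpy as np


def mixed_precision_solve(a, b, max_iters=10, tol=1e-12):
    """Solve a @ x = b using FP32 solves plus FP64 iterative refinement.

    Illustrative sketch: a production code would factorise the FP32 matrix
    once and reuse the factors, instead of calling solve() repeatedly.
    """
    a32 = a.astype(np.float32)

    # Initial low-precision solve, promoted to FP64 for refinement.
    x = np.linalg.solve(a32, b.astype(np.float32)).astype(np.float64)

    refinements = 0
    while refinements < max_iters:
        r = b - a @ x  # residual computed in full FP64
        if np.linalg.norm(r, np.inf) <= tol * np.linalg.norm(b, np.inf):
            break
        # The correction comes from another cheap low-precision solve.
        d = np.linalg.solve(a32, r.astype(np.float32))
        x = x + d.astype(np.float64)
        refinements += 1
    return x, refinements


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    # Diagonally dominant matrix, so the low-precision solve is well behaved.
    a = rng.uniform(-0.5, 0.5, (n, n)) + n * np.eye(n)
    b = rng.uniform(-0.5, 0.5, n)
    x, steps = mixed_precision_solve(a, b)
    print("refinement steps:", steps)
    print("max abs difference vs FP64 solve:",
          np.max(np.abs(x - np.linalg.solve(a, b))))
```

On hardware with fast low-precision matrix units, most of the arithmetic runs at the lower precision while the final answer still meets FP64 accuracy, which is why mixed-precision results sit closer to what AI workloads experience.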
Nscale’s cloud software stack provisions and configures the cluster with all of the system and fabric optimisations built in – delivering supercomputing performance effortlessly. Whether you require SLURM, Kubernetes or a combination of both, Nscale’s automation enables users to spin up clusters in minutes.
We are currently standing up our AMD MI300X platform and expect to return for the next Top500 list with a much higher ranking!