Mockup for review. Tech-stack demonstration. Not affiliated with Nebius and not the live Builders Network.
AI Cloud

Build on GPUs your way.

VMs, Managed Kubernetes, and Slurm for AI training and inference on high-performance GPU infrastructure. Launch your first workload in minutes.

Resources & guides

Deep dives and reference architectures.

Tutorials, best practices, and production patterns for GPU training, inference, and orchestration on Nebius AI Cloud.
Guide

Bare-metal-class performance for AI inference in MLPerf Inference v5.1

Nebius demonstrates near-bare-metal inference performance across three AI systems in MLPerf Inference v5.1, showing how its virtualized GPU stack serves production LLM inference without a meaningful performance tax.
Guide

Clusters vs single nodes: which to use for training and inference

When to fine-tune on one machine and when to scale across many nodes, and how PyTorch DDP, DeepSpeed, and Horovod handle data partitioning and gradient synchronization under Kubernetes or Slurm scheduling, illustrated on Nebius GPU infrastructure.
Guide

Data preparation techniques

Best practices and practical pipelines for preparing large datasets for LLM training and distributed AI workloads on GPU clusters.
Guide

Fault-tolerant training: how we build reliable clusters for distributed AI

Nebius's multi-layered approach to reliable large-scale training: liveness probes, automatic checkpoint-restart on hardware failure, graceful node termination, and the MTBF/MTTR metrics behind a dependable GPU cluster.
Guide

Introducing Managed Soperator: your quick access to Slurm training

Nebius's fully managed Slurm-on-Kubernetes solution is now self-service: spin up a training-ready GPU cluster with pre-installed drivers and libraries in minutes. The guide explains the Soperator operator, the shared root filesystem, and automatic node recovery.
Guide

MLPerf Training v5.1: leading results on NVIDIA Blackwell and Blackwell Ultra

Nebius posted seven first-place finishes in MLPerf Training v5.1 across NVIDIA Blackwell and Blackwell Ultra systems. A look at the hardware-software co-optimization behind training performance on Nebius AI Cloud.
Guide

Orchestrating LLM fine-tuning on K8s with SkyPilot and MLflow

Manage distributed fine-tuning workloads with experiment tracking on Kubernetes GPU clusters.
Guide

Running Boltz-2 inference at scale in Nebius AI Cloud

Scale life-sciences inference workloads on GPU clusters optimized for performance and throughput.
Guide

Running NVIDIA NIM and NVIDIA Blueprint in Nebius AI Cloud

Deploy validated healthcare AI models using NVIDIA NIM on high-performance GPU infrastructure.
Guide

The role of compute cluster networking for AI training and inference

Why high-speed interconnects (InfiniBand, NVIDIA Quantum-2) make or break large-scale AI workloads, and how Nebius AI Cloud builds fast, reliable GPU cluster networking for distributed training and inference.
Guide

Using SkyPilot and Kubernetes for multi-node fine-tuning of Llama 3.1

Step-by-step guide to running distributed LLM fine-tuning on Nebius AI Cloud GPU infrastructure.
Guide

What are GPU clusters and how to choose yours?

A practical primer on GPU compute clusters for training, fine-tuning, and inference. Covers hardware, orchestration (Kubernetes vs Slurm), InfiniBand networking, and shared storage, and explains how to pick the right configuration on Nebius AI Cloud.
GitHub

Reference implementations.

Drop-in recipes and solutions library repos for K8s, Soperator, SkyPilot, vLLM, and more on Nebius AI Cloud.
GitHub

Kvax: fast Flash Attention for JAX with context parallelism

Nebius's open-source Flash Attention 2 implementation for JAX, built on Triton kernels with efficient document-mask computation and context parallelism for FSDP/HSDP-sharded long-sequence training on GPU clusters.
GitHub

ML Cookbook: pre-training DeepSeek-V3 with MXFP8 on a B200 cluster

A Nebius ml-cookbook recipe with Slurm job scripts for multi-node pre-training of DeepSeek-V3 (16B and 671B) on a 256-GPU NVIDIA B200 cluster, showing up to 41% faster throughput with MXFP8 mixed precision and DeepEP.
GitHub

Nebius Kubernetes Applications: Helm charts for GPU & AI workloads

Official Nebius repo of 50+ ready-to-deploy Helm charts and manifests for Managed Kubernetes, including vLLM inference, Ray cluster/serve, NVIDIA GPU Operator, Stable Diffusion WebUI, MLflow, and JupyterHub. A practical reference for deploying AI apps on Nebius mk8s.
GitHub

Nebius Solutions Library

Reference architectures and ready-to-deploy recipes for GPU training, inference, K8s, Slurm, and SkyPilot on Nebius AI Cloud.
GitHub

SkyPilot — finetuning & orchestrating

Orchestrate multi-node LLM fine-tuning and distributed training across GPU VMs and Kubernetes clusters using SkyPilot.
GitHub

vLLM examples (Nebius Professional Services)

Reference vLLM deployment examples in the Nebius Professional Services repo. Drop-in configs for production inference workloads.
GitHub

End-to-end Slurm training + vLLM inference demo on Nebius

A community reference project that provisions a 2-node, 16x H100 Soperator (Slurm-on-Kubernetes) cluster with Terraform, runs SFT and LoRA fine-tuning via sbatch, then serves the model with single- and multi-node vLLM, with reported accuracy gains from 2% to 88%.
