AI Cloud

Build on GPUs your way.

VMs, Managed Kubernetes, and Slurm for AI training and inference on high-performance GPU infrastructure. Launch your first workload in minutes.

Open Console ↗

Docs ↗

Quickstarts

Get hands-on with AI Cloud.

Videos & Workshops

Watch and learn.

Video

How to run Boltz-2 at scale on Kubernetes

Deploy a Kubernetes cluster, configure shared storage, and run reproducible, multi-node Boltz-2 inference for protein folding.

Watch on YouTube ↗

Video

Build a RAG Chatbot to Chat with Your Code with LlamaIndex, DeepSeek v3 & Nebius

Step-by-step build of a Retrieval-Augmented Generation chatbot that indexes and queries your codebase using LlamaIndex and the DeepSeek v3 model served on Nebius.

Watch on YouTube ↗

Resources & guides

Deep dives and reference architectures.

Tutorials, best practices, and production patterns for GPU training, inference, and orchestration on Nebius AI Cloud.

Guide

Bare-metal-class performance for AI inference in MLPerf Inference v5.1

Nebius demonstrates near-bare-metal inference performance across three AI systems in MLPerf Inference v5.1, showing how its virtualized GPU stack serves production LLM inference without a meaningful performance tax.

Read on Nebius ↗

Guide

Clusters vs single nodes: which to use for training and inference

When to fine-tune on one machine versus scaling across many nodes: how PyTorch DDP, DeepSpeed and Horovod handle data partitioning and gradient sync, with Kubernetes/Slurm scheduling, illustrated on Nebius GPU infrastructure.

Read on Nebius ↗
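As a rough illustration of the data partitioning and gradient sync mentioned above, here is a minimal pure-Python sketch of data-parallel training on a toy one-parameter model. It is not how PyTorch DDP, DeepSpeed or Horovod are implemented; the function names (`shard`, `local_gradient`, `all_reduce_mean`) and the dataset are hypothetical and exist only for this example. It shows that, with equally sized shards, averaging per-worker gradients gives the same result as computing the gradient over the full batch on one machine.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def shard(data: List[Sample], world_size: int) -> List[List[Sample]]:
    """Round-robin partition: rank r takes items r, r + world_size, ..."""
    return [data[r::world_size] for r in range(world_size)]


def local_gradient(w: float, batch: List[Sample]) -> float:
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def all_reduce_mean(values: List[float]) -> float:
    """Stand-in for an all-reduce: every rank ends up with the mean."""
    return sum(values) / len(values)


# Hypothetical toy dataset: 8 samples generated with a true weight of 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w = 0.0

shards = shard(data, world_size=4)                 # data partitioning
grads = [local_gradient(w, s) for s in shards]     # one gradient per worker
synced = all_reduce_mean(grads)                    # gradient sync
full = local_gradient(w, data)                     # single-node reference

print(synced, full)
```

The equality only holds exactly when every worker sees the same number of samples; with uneven shards a weighted average would be needed.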

Guide

Data preparation techniques

Best practices and practical pipelines for preparing large datasets for LLM training and distributed AI workloads on GPU clusters.

Read on Nebius ↗

Guide

Fault-tolerant training: how we build reliable clusters for distributed AI

Nebius's multi-layered approach to reliable large-scale training: liveness probes, automatic checkpoint-restart on hardware failure, graceful node termination, and the MTBF/MTTR metrics behind a dependable GPU cluster.

Read on Nebius ↗
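For readers unfamiliar with the two metrics, the sketch below shows the standard way MTBF (mean time between failures) and MTTR (mean time to repair) combine into a steady-state availability figure. The input numbers are made up for illustration and are not Nebius measurements; the function names are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is usable."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_failures(mtbf_hours: float, run_hours: float) -> float:
    """Average number of failures expected during a run of the given length."""
    return run_hours / mtbf_hours


# Hypothetical cluster: fails every 99 hours on average, 1 hour to recover.
a = availability(mtbf_hours=99.0, mttr_hours=1.0)
n = expected_failures(mtbf_hours=99.0, run_hours=198.0)
print(f"availability={a:.2%}, expected failures over 198 h={n:.1f}")
```

Raising MTBF (fewer failures) and lowering MTTR (faster recovery, for example through automatic checkpoint-restart) both push availability toward 1.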

Guide

Introducing Managed Soperator: your quick access to Slurm training

Nebius's fully managed Slurm-on-Kubernetes solution is now self-service: spin up a training-ready GPU cluster with pre-installed drivers and libraries in minutes. Explains the Soperator operator, shared root filesystem, and automatic node recovery.

Read on Nebius ↗

Guide

MLPerf Training v5.1: leading results on NVIDIA Blackwell and Blackwell Ultra

Nebius posted seven first-place finishes in MLPerf Training v5.1 across NVIDIA Blackwell and Blackwell Ultra systems. A look at the hardware-software co-optimization behind training performance on Nebius AI Cloud.

Read on Nebius ↗

Guide

Orchestrating LLM fine-tuning on K8s with SkyPilot and MLflow

Manage distributed fine-tuning workloads with experiment tracking on Kubernetes GPU clusters.

Read on Nebius ↗

Guide

Running Boltz-2 inference at scale in Nebius AI Cloud

Scale life-sciences inference workloads on GPU clusters optimized for performance and throughput.

Read on Nebius ↗

Guide

Running NVIDIA NIM and NVIDIA Blueprint in Nebius AI Cloud

Deploy validated healthcare AI models using NVIDIA NIM on high-performance GPU infrastructure.

Read on Nebius ↗

Guide

The role of compute cluster networking for AI training and inference

Why high-speed interconnects (InfiniBand, NVIDIA Quantum-2) make or break large-scale AI workloads, and how Nebius AI Cloud builds fast, reliable GPU cluster networking for distributed training and inference.

Read on Nebius ↗

Guide

Using SkyPilot and Kubernetes for multi-node fine-tuning of Llama 3.1

Step-by-step guide to running distributed LLM fine-tuning on Nebius AI Cloud GPU infrastructure.

Read on Nebius ↗

Guide

What are GPU clusters and how to choose yours?

A practical primer on GPU compute clusters for training, fine-tuning and inference, covering hardware, orchestration (Kubernetes vs Slurm), InfiniBand networking and shared storage, and how to pick the right configuration on Nebius AI Cloud.

Read on Nebius ↗

GitHub

Reference implementations.

Drop-in recipes and solutions library repos for K8s, Soperator, SkyPilot, vLLM, and more on Nebius AI Cloud.

GitHub

Kvax: fast Flash Attention for JAX with context parallelism

Nebius's open-source Flash Attention 2 implementation for JAX, built on Triton kernels with efficient document-mask computation and context parallelism for FSDP/HSDP-sharded long-sequence training on GPU clusters.

View on GitHub ↗
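To make the term "document mask" concrete: when several documents are packed into one training sequence, each token should attend only to earlier tokens of its own document. The sketch below builds that mask naively in plain Python. It is a conceptual illustration only, not Kvax's API or its Triton kernels, which avoid materializing the full mask; the function name and the segment IDs are hypothetical.

```python
from typing import List


def document_causal_mask(doc_ids: List[int]) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k:
    same document, and k is not in the future relative to q."""
    n = len(doc_ids)
    return [
        [doc_ids[q] == doc_ids[k] and k <= q for k in range(n)]
        for q in range(n)
    ]


# Hypothetical packed sequence: two tokens of document 0, three of document 1.
mask = document_causal_mask([0, 0, 1, 1, 1])

for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

The printed pattern is block-diagonal and lower-triangular within each block, which is the structure an efficient kernel can exploit to skip whole tiles.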

GitHub

ML Cookbook: pre-training DeepSeek-V3 with MXFP8 on a B200 cluster

A Nebius ml-cookbook recipe with Slurm job scripts for multi-node pre-training of DeepSeek-V3 (16B and 671B) on a 256-GPU NVIDIA B200 cluster, showing up to 41% faster throughput with MXFP8 mixed precision and DeepEP.

View on GitHub ↗

GitHub

Nebius Kubernetes Applications: Helm charts for GPU & AI workloads

Official Nebius repo of 50+ ready-to-deploy Helm charts and manifests for Managed Kubernetes, including vLLM inference, Ray cluster/serve, NVIDIA GPU Operator, Stable Diffusion WebUI, MLflow and JupyterHub. A practical reference for deploying AI apps on Nebius mk8s.

View on GitHub ↗

GitHub

Nebius Solutions Library

Reference architectures and ready-to-deploy recipes for GPU training, inference, K8s, Slurm, and SkyPilot on Nebius AI Cloud.

View on GitHub ↗

GitHub

SkyPilot — finetuning & orchestrating

Orchestrate multi-node LLM fine-tuning and distributed training across GPU VMs and Kubernetes clusters using SkyPilot.

View on GitHub ↗

GitHub

vLLM examples (Nebius PS services)

Reference vLLM deployment examples in the Nebius Professional Services repo. Drop-in configs for production inference workloads.

View on GitHub ↗

GitHub

End-to-end Slurm training + vLLM inference demo on Nebius

A community reference project that provisions a 2-node, 16x H100 Soperator (Slurm-on-Kubernetes) cluster with Terraform, runs SFT and LoRA fine-tuning via sbatch, then serves the model with single- and multi-node vLLM. The project reports accuracy improving from 2% to 88%.

View on GitHub ↗

Documentation

Docs and reference.

Docs

Managed Kubernetes docs

Reference documentation for Nebius Managed Kubernetes: cluster lifecycle, node groups, autoscaling, and GPU configuration.

Read on Nebius docs ↗

Docs

SkyPilot on Nebius docs

Official docs for the SkyPilot integration on Nebius AI Cloud. Provisioning, multi-node training, and autoscaling patterns.

Read on Nebius docs ↗

Docs

Soperator docs (Managed Slurm)

Operator-style API for running Slurm clusters on Kubernetes. Launch and manage distributed training jobs without hand-rolling the scheduler stack.

Read on Nebius docs ↗

Ready to launch a GPU workload?

Open Console ↗

Get hands-on with AI Cloud.

Deploying a Knowledge-Based Chatbot with RAG in Production

Inference guide with vLLM

K8s on Nebius

Nebius AI Cloud Console overview
