Quickstarts
Get hands-on with AI Cloud.
Workshop
Deploying a Knowledge-Based Chatbot with RAG in Production
Nebius webinar with CSA Boris Popov on shipping a production RAG chatbot on Nebius AI Cloud using NVIDIA H100 GPUs, Kubernetes, Triton, TensorRT, Milvus, and PyTorch.
Watch on YouTube ↗
Guide
Inference guide with vLLM
Deploy high-throughput LLM inference on Nebius GPU cloud using vLLM and Kubernetes. Includes configuration patterns for scalable model serving.
Read on Nebius ↗
GitHub
K8s on Nebius
Deploy GPU-enabled Kubernetes clusters with pre-installed NVIDIA drivers and optimized networking for AI training and inference workloads.
View on GitHub ↗
Video
Nebius AI Cloud Console overview
Learn how to provision GPU infrastructure, manage Kubernetes clusters, launch Slurm jobs, and monitor workloads from the console.
Watch on YouTube ↗
Videos & Workshops
Watch and learn.
Video
How to run Boltz-2 at scale on Kubernetes
Deploy a Kubernetes cluster, configure shared storage, and run reproducible, multi-node Boltz-2 inference for protein folding.
Watch on YouTube ↗
Video
Build a RAG Chatbot to Chat with Your Code with LlamaIndex, DeepSeek v3 & Nebius
Step-by-step build of a Retrieval-Augmented Generation chatbot that indexes and queries your codebase using LlamaIndex and the DeepSeek v3 model served on Nebius.
Watch on YouTube ↗
Resources & guides
Deep dives and reference architectures.
Tutorials, best practices, and production patterns for GPU training, inference, and orchestration on Nebius AI Cloud.
Guide
Bare-metal-class performance for AI inference in MLPerf Inference v5.1
Nebius demonstrates near-bare-metal inference performance across three AI systems in MLPerf Inference v5.1, showing how its virtualized GPU stack serves production LLM inference without a meaningful performance tax.
Read on Nebius ↗
Guide
Clusters vs single nodes: which to use for training and inference
When to fine-tune on one machine versus scale across many nodes: how PyTorch DDP, DeepSpeed, and Horovod handle data partitioning and gradient synchronization, with Kubernetes/Slurm scheduling, illustrated on Nebius GPU infrastructure.
Read on Nebius ↗
Guide
Data preparation techniques
Best practices and practical pipelines for preparing large datasets for LLM training and distributed AI workloads on GPU clusters.
Read on Nebius ↗
Guide
Fault-tolerant training: how we build reliable clusters for distributed AI
Nebius's multi-layered approach to reliable large-scale training: liveness probes, automatic checkpoint-restart on hardware failure, graceful node termination, and the MTBF/MTTR metrics behind a dependable GPU cluster.
Read on Nebius ↗
Guide
Introducing Managed Soperator: your quick access to Slurm training
Nebius's fully managed Slurm-on-Kubernetes solution is now self-service: spin up a training-ready GPU cluster with pre-installed drivers and libraries in minutes. Explains how the Soperator Kubernetes operator works, its shared root filesystem, and automatic node recovery.
Read on Nebius ↗
Guide
MLPerf Training v5.1: leading results on NVIDIA Blackwell and Blackwell Ultra
Nebius posted seven first-place finishes in MLPerf Training v5.1 across NVIDIA Blackwell and Blackwell Ultra systems. A look at the hardware-software co-optimization behind training performance on Nebius AI Cloud.
Read on Nebius ↗
Guide
Orchestrating LLM fine-tuning on K8s with SkyPilot and MLflow
Manage distributed fine-tuning workloads with experiment tracking on Kubernetes GPU clusters.
Read on Nebius ↗
Guide
Running Boltz-2 inference at scale in Nebius AI Cloud
Scale life-sciences inference workloads on GPU clusters optimized for performance and throughput.
Read on Nebius ↗
Guide
Running NVIDIA NIM and NVIDIA Blueprint in Nebius AI Cloud
Deploy validated healthcare AI models using NVIDIA NIM on high-performance GPU infrastructure.
Read on Nebius ↗
Guide
The role of compute cluster networking for AI training and inference
Why high-speed interconnects (InfiniBand, NVIDIA Quantum-2) make or break large-scale AI workloads, and how Nebius AI Cloud builds fast, reliable GPU cluster networking for distributed training and inference.
Read on Nebius ↗
Guide
Using SkyPilot and Kubernetes for multi-node fine-tuning of Llama 3.1
Step-by-step guide to running distributed LLM fine-tuning on Nebius AI Cloud GPU infrastructure.
Read on Nebius ↗
Guide
What are GPU clusters and how to choose yours?
A practical primer on GPU compute clusters for training, fine-tuning, and inference, covering hardware, orchestration (Kubernetes vs Slurm), InfiniBand networking, and shared storage, and how to pick the right configuration on Nebius AI Cloud.
Read on Nebius ↗
GitHub
Reference implementations.
Drop-in recipes and solutions library repos for K8s, Soperator, SkyPilot, vLLM, and more on Nebius AI Cloud.
GitHub
Kvax: fast Flash Attention for JAX with context parallelism
Nebius's open-source Flash Attention 2 implementation for JAX, built on Triton kernels with efficient document-mask computation and context parallelism for FSDP/HSDP-sharded long-sequence training on GPU clusters.
View on GitHub ↗
GitHub
ML Cookbook: pre-training DeepSeek-V3 with MXFP8 on a B200 cluster
A Nebius ml-cookbook recipe with Slurm job scripts for multi-node pre-training of DeepSeek-V3 (16B and 671B) on a 256-GPU NVIDIA B200 cluster, showing up to 41% higher throughput with MXFP8 mixed precision and DeepEP.
View on GitHub ↗
GitHub
Nebius Kubernetes Applications: Helm charts for GPU & AI workloads
Official Nebius repo of 50+ ready-to-deploy Helm charts and manifests for Managed Kubernetes, including vLLM inference, Ray cluster/serve, NVIDIA GPU Operator, Stable Diffusion WebUI, MLflow, and JupyterHub. A practical reference for deploying AI apps on Nebius mk8s.
View on GitHub ↗
GitHub
Nebius Solutions Library
Reference architectures and ready-to-deploy recipes for GPU training, inference, K8s, Slurm, and SkyPilot on Nebius AI Cloud.
View on GitHub ↗
GitHub
SkyPilot: fine-tuning & orchestrating
Orchestrate multi-node LLM fine-tuning and distributed training across GPU VMs and Kubernetes clusters using SkyPilot.
View on GitHub ↗
GitHub
vLLM examples (Nebius Professional Services)
Reference vLLM deployment examples in the Nebius Professional Services repo. Drop-in configs for production inference workloads.
View on GitHub ↗
GitHub
End-to-end Slurm training + vLLM inference demo on Nebius
A community reference project that provisions a 2-node, 16x H100 Soperator (Slurm-on-Kubernetes) cluster with Terraform, runs SFT and LoRA fine-tuning via sbatch, then serves the model with single- and multi-node vLLM, with reported accuracy gains from 2% to 88%.
View on GitHub ↗
Documentation
Docs and reference.
Docs
Managed Kubernetes docs
Reference documentation for Nebius Managed Kubernetes: cluster lifecycle, node groups, autoscaling, and GPU configuration.
Read on Nebius docs ↗
Docs
SkyPilot on Nebius docs
Official docs for the SkyPilot integration on Nebius AI Cloud. Provisioning, multi-node training, and autoscaling patterns.
Read on Nebius docs ↗
Docs