Soperator · AI Cloud

Nebius Slurm ML Training and Inference Demo

Distributed LLM fine-tuning and vLLM serving on a Soperator Slurm-on-Kubernetes cluster

About this project

An end-to-end demo that fine-tunes Qwen3-8B and Qwen3-32B across 16 NVIDIA H100 GPUs on a Nebius Cloud cluster, then serves the models with vLLM. The cluster is deployed with Terraform using the Soperator (Slurm-on-Kubernetes) template from the Nebius Solutions Library, and its GPUs communicate over an InfiniBand interconnect with GPUDirect RDMA.
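Multi-node fine-tuning on a Slurm cluster usually starts one launcher process per node and lets the nodes meet at a rendezvous point. Below is a minimal sketch of deriving PyTorch `torchrun` flags from Slurm's job environment. It assumes the 16 GPUs are two nodes of eight H100s each, that training is launched with `torchrun` using c10d rendezvous, and that the head node's hostname is passed in by the caller. None of these details are stated in the project description. The function name `torchrun_args`, port 29500, and the hostname `worker-0` are hypothetical.

```python
"""Sketch: build torchrun flags for a multi-node Slurm job.

Assumptions (not taken from the project): 8 GPUs per node, c10d rendezvous
on port 29500, and a head-node hostname supplied by the caller, e.g. the
first entry of `scontrol show hostnames "$SLURM_JOB_NODELIST"`.
"""
from typing import Mapping

GPUS_PER_NODE = 8  # assumed H100 count per node (hypothetical)
RDZV_PORT = 29500  # assumed rendezvous port (hypothetical)


def torchrun_args(env: Mapping[str, str], head_node: str) -> list[str]:
    """Return torchrun flags derived from Slurm's job environment."""
    nnodes = int(env["SLURM_NNODES"])  # node count Slurm sets for the allocation
    job_id = env["SLURM_JOB_ID"]       # reused as a unique rendezvous id
    if nnodes * GPUS_PER_NODE <= 0:
        raise ValueError("allocation has no GPUs")
    return [
        f"--nnodes={nnodes}",
        f"--nproc-per-node={GPUS_PER_NODE}",
        f"--rdzv-id={job_id}",
        "--rdzv-backend=c10d",
        f"--rdzv-endpoint={head_node}:{RDZV_PORT}",
    ]


if __name__ == "__main__":
    # Simulated environment for a 2-node allocation (16 GPUs in total).
    fake_env = {"SLURM_NNODES": "2", "SLURM_JOB_ID": "1234"}
    print(torchrun_args(fake_env, head_node="worker-0"))
```

Reusing `SLURM_JOB_ID` as the rendezvous id keeps two concurrent jobs from accidentally joining each other's rendezvous.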


Technologies

fine-tuning

infra