Quickstarts
Get to know Serverless.
Understand how Jobs and Endpoints work — then run one.Docs
Deploy your first Serverless Endpoint
Stand up a long-running inference endpoint with a public URL. Token Factory-style API, custom container.Read on Nebius docs ↗Docs
Launch your first Serverless Job
Run a containerized AI workload to completion on Nebius GPUs without provisioning a cluster.Read on Nebius docs ↗GitHub
Serverless AI cookbook
Top-level repo for every Serverless AI recipe — jobs, endpoints, RAG, agents, life sciences, voice, and more.View on GitHub ↗Jobs
Execute workloads that run and stop.
Training, fine-tuning, batch inference, simulations, and data processing pipelines — packaged as containers, scheduled on GPU.Docs
Build a RAG chatbot on Serverless AI
End-to-end RAG: JupyterLab for prototyping, Serverless AI for inference, and Managed PostgreSQL for the vector store.Read on Nebius docs ↗Docs
Deploy a model as a Serverless Endpoint
Step-by-step tutorial for packaging a model, building the container, and exposing it via a Serverless Endpoint.Read on Nebius docs ↗Docs
Fine-tuning on Serverless Jobs
Use Serverless Jobs to run fine-tuning workloads — data in, model artifact out — without managing the cluster.Read on Nebius docs ↗Docs
Serverless AI overview
High-level overview of the two Serverless AI execution models — Jobs (run to completion) and Endpoints (real-time APIs).Read on Nebius docs ↗Docs
Voice and media — TTS pipelines on Serverless
Build pipelines that combine batch processing and inference endpoints for voice synthesis and media workflows.Read on Nebius docs ↗Core patterns
Reference implementations.
LLM inference, training & fine-tuning, RAG pipelines, agentic workflows (OpenClaw), and life-science workloads — all in the Serverless AI cookbook.GitHub
Agentic workflows with OpenClaw on Serverless
Run agent-style pipelines with tool use, retrieval, and multi-step reasoning using OpenClaw on Serverless GPU compute.View on GitHub ↗GitHub
Life sciences workloads on Serverless (OpenMM)
Run molecular simulations and generate datasets on GPU. Recipes for OpenMM and friends, packaged for Serverless Jobs.View on GitHub ↗GitHub
LLM inference — vLLM endpoint
Serve large language models as real-time APIs using vLLM containers on Serverless GPU endpoints.View on GitHub ↗GitHub
Serverless Endpoints — first-endpoint examples
Ready-to-deploy example endpoints from the Serverless cookbook. Build, push, and serve.View on GitHub ↗GitHub
Serverless Jobs — first-job examples
Ready-to-run example workloads for the Serverless Jobs quickstart. Clone, edit, and submit.View on GitHub ↗GitHub
Training & fine-tuning — Serverless Jobs
Run GPU workloads that produce model artifacts. Recipes for distributed training and fine-tuning via Serverless Jobs.View on GitHub ↗Video walkthroughs
See it in action.
Video
Serverless AI — full walkthrough
Deep-dive video walkthrough of the Serverless AI model — when to use Jobs vs Endpoints, with real demos.Watch on YouTube ↗Video
Serverless Endpoints — quickstart video
Five-minute walkthrough of deploying a Serverless Endpoint and hitting it from your app.Watch on YouTube ↗Video