Serverless AI

GPU workloads without managing infrastructure.

Two execution models: Jobs run workloads to completion, and Endpoints serve real-time requests. In both cases, compute is provisioned on demand, used for execution, and released automatically.

Open Console | Docs ↗
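The provision, execute, release lifecycle behind the two execution models can be pictured with a small toy model. This is only an illustrative sketch in plain Python: `ToyCompute`, `run_job`, and `serve_endpoint` are hypothetical names invented for this example and are not part of any Nebius API or SDK. The real service schedules containers on GPU, which this sketch does not attempt to reproduce.

```python
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator, List


class ToyCompute:
    """Hypothetical stand-in for on-demand compute (not a real Nebius API)."""

    def __init__(self) -> None:
        self.active = False
        self.events: List[str] = []

    @contextmanager
    def provisioned(self) -> Iterator["ToyCompute"]:
        # Provision on entry, and always release on exit, even on errors.
        self.active = True
        self.events.append("provision")
        try:
            yield self
        finally:
            self.active = False
            self.events.append("release")


def run_job(compute: ToyCompute, workload: Callable[[], str]) -> str:
    """Job model: provision, run one workload to completion, release."""
    with compute.provisioned():
        compute.events.append("execute")
        return workload()


def serve_endpoint(
    compute: ToyCompute,
    handler: Callable[[str], str],
    requests: Iterable[str],
) -> List[str]:
    """Endpoint model: stay provisioned while answering each request."""
    responses: List[str] = []
    with compute.provisioned():
        for request in requests:
            compute.events.append("execute")
            responses.append(handler(request))
    return responses


# A job produces a result once, then its compute is released.
job_compute = ToyCompute()
artifact = run_job(job_compute, lambda: "model-artifact")

# An endpoint handles a stream of requests before its compute is released.
endpoint_compute = ToyCompute()
replies = serve_endpoint(endpoint_compute, str.upper, ["hello", "world"])
```

The context manager is the design choice worth noting: release happens in a `finally` clause, so the toy compute is freed whether the workload succeeds or raises, which mirrors the "release automatically" promise in spirit.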

Quickstarts

Get to know Serverless.

Understand how Jobs and Endpoints work — then run one.

Docs

Jobs

Execute workloads that run and stop.

Training, fine-tuning, batch inference, simulations, and data processing pipelines — packaged as containers, scheduled on GPU.

Docs

Build a RAG chatbot on Serverless AI

End-to-end RAG: JupyterLab for prototyping, Serverless AI for inference, and Managed PostgreSQL for the vector store. Read on Nebius docs ↗

Docs

Deploy a model as a Serverless Endpoint

Step-by-step tutorial for packaging a model, building the container, and exposing it via a Serverless Endpoint. Read on Nebius docs ↗

Docs

Fine-tuning on Serverless Jobs

Use Serverless Jobs to run fine-tuning workloads (data in, model artifact out) without managing the cluster. Read on Nebius docs ↗

Docs

Serverless AI overview

High-level overview of the two Serverless AI execution models: Jobs (run to completion) and Endpoints (real-time APIs). Read on Nebius docs ↗

Docs

Voice and media — TTS pipelines on Serverless

Build pipelines that combine batch processing and inference endpoints for voice synthesis and media workflows. Read on Nebius docs ↗

Core patterns

Reference implementations.

LLM inference, training & fine-tuning, RAG pipelines, agentic workflows (OpenClaw), and life-science workloads — all in the Serverless AI cookbook.

GitHub

Agentic workflows with OpenClaw on Serverless

Run agent-style pipelines with tool use, retrieval, and multi-step reasoning using OpenClaw on Serverless GPU compute. View on GitHub ↗

GitHub

Life sciences workloads on Serverless (OpenMM)

Run molecular simulations and generate datasets on GPU. Recipes for OpenMM and friends, packaged for Serverless Jobs. View on GitHub ↗

GitHub

LLM inference — vLLM endpoint

Serve large language models as real-time APIs using vLLM containers on Serverless GPU endpoints. View on GitHub ↗

GitHub

Serverless Endpoints — first-endpoint examples

Ready-to-deploy example endpoints from the Serverless cookbook. Build, push, and serve. View on GitHub ↗

GitHub

Serverless Jobs — first-job examples

Ready-to-run example workloads for the Serverless Jobs quickstart. Clone, edit, and submit. View on GitHub ↗

GitHub

Training & fine-tuning — Serverless Jobs

Run GPU workloads that produce model artifacts. Recipes for distributed training and fine-tuning via Serverless Jobs. View on GitHub ↗

Video walkthroughs

See it in action.

Video

Serverless AI — full walkthrough

Deep-dive video walkthrough of the Serverless AI model: when to use Jobs vs. Endpoints, with real demos. Watch on YouTube ↗

Video

Serverless Endpoints — quickstart video

Five-minute walkthrough of deploying a Serverless Endpoint and hitting it from your app. Watch on YouTube ↗

Video

Serverless Jobs — quickstart video

Five-minute walkthrough of launching your first Serverless Job. Container in, GPU compute out. Watch on YouTube ↗

Launch your first Job or Endpoint.

Open Console ↗

Get to know Serverless.

Deploy your first Serverless Endpoint

Launch your first Serverless Job

Serverless AI cookbook
