Quickstarts
Build, customize, and deploy.
Playlist
Official Token Factory playlist on YouTube
Learn about Nebius Token Factory from the makers — architecture, deep dives, and live demos curated by the team.Watch on YouTube ↗Docs
Post-training guide
Fine-tune open models on Token Factory: data prep, training runs, evaluation, and deployment back to the inference API.Read on Token Factory docs ↗GitHub
Token Factory cookbook
Reference recipes for Token Factory — inference patterns, fine-tuning runs, evals, and production deployment recipes.View on GitHub ↗Docs
Token Factory quickstart
OpenAI-compatible API to start fast. Set up an API key, hit the inference endpoint, and ship your first request in under five minutes.Read on Token Factory docs ↗Workshop
Workshop: Build an Agentic Slack Bot
Deploy a web-connected AI agent in under 15 minutes. Build a Slack Pricing Assistant that searches competitor pricing with Tavily, runs inference through Nebius Token Factory, and returns structured recommendations. With Colin Lowenberg (Nebius) + Lakshya Agarwal (Tavily).Watch on Nebius.com ↗YouTube
Video walkthroughs.
Official Token Factory playlist on YouTube — architecture, deep dives, and live demos from the team — plus a curated Build-with-Token-Factory series.Videos & Workshops
Learn from live builds.
Video
Exploring Nebius Token Factory | Open LLMs, AI Agents, Batch Inference & Fine-Tuning
A tour of Nebius Token Factory covering its OpenAI-compatible API, batch inference, fine-tuning, and how open models like Qwen Coder plug into agent frameworks and tools like Hugging Face and OpenRouter.Watch on YouTube ↗Video
How to Fine-Tune GPT-OSS 20B using Nebius Token Factory
Hands-on tutorial fine-tuning the GPT-OSS 20B model on Nebius Token Factory: environment setup, JSONL dataset upload, creating a fine-tuning job, monitoring training, and downloading artifacts.Watch on YouTube ↗Video
How to Fine-Tune Open Source LLMs with Nebius Token Factory | Full Tutorial
End-to-end walkthrough of fine-tuning an open-source LLM with LoRA on Nebius Token Factory and deploying it as a production-ready API endpoint, including dataset prep and job configuration.Watch on YouTube ↗Docs & reference
Integrate with your stack.
API reference, framework integrations (LangChain, LlamaIndex), agent frameworks (Agno, CrewAI, Pydantic AI), and post-training guides.Docs
Agno integration
Lightweight multi-modal agent framework with Token Factory as a first-class provider.Read on Token Factory docs ↗Docs
aisuite integration
Use Token Factory as a model backend in aisuite, the model-agnostic Python SDK from Andrew Ng.Read on Token Factory docs ↗Docs
Autoscaling and cache-aware routing
How Token Factory scales inference workloads and routes requests based on cache locality. Rate limits, burst behavior, and tuning guidance.Read on Token Factory docs ↗Docs
Batch inference
Submit large request batches asynchronously for offline workloads. Cheaper per-token than real-time inference; ideal for evals and back-fills.Read on Token Factory docs ↗Docs
CrewAI integration
Build multi-agent crews with Token Factory models. Coordination, role-playing, and task pipelines all wired through Nebius inference.Read on Token Factory docs ↗Docs
Dedicated endpoints
Pin a Token Factory model to dedicated GPU capacity for predictable latency at scale. Includes autoscaling rules and routing patterns.Read on Token Factory docs ↗Docs
Deploy custom models
Take a model you trained anywhere and deploy it behind a Token Factory endpoint. Custom weight loading, scaling, and monitoring.Read on Token Factory docs ↗Docs
Function calling and tools with Token Factory
Official Token Factory guide on defining tools, letting models pick functions from context (or forcing a specific call), with Python and JavaScript examples for building tool-using agents.Read on Token Factory docs ↗Docs
LangChain integration
Use Token Factory chat models, embeddings, and retrievers inside LangChain via the langchain-nebius package.Read on Token Factory docs ↗Docs
LiteLLM integration
Route LiteLLM through Token Factory as an OpenAI-compatible provider. Drop-in for projects already using the LiteLLM proxy.Read on Token Factory docs ↗Docs
LlamaIndex integration
Wire Token Factory in as the inference layer for LlamaIndex RAG pipelines.Read on Token Factory docs ↗Docs
Pydantic AI integration
Type-safe agent framework with Pydantic validation. Token Factory backs the inference layer.Read on Token Factory docs ↗Docs
Structured output and JSON mode with Token Factory
Official Token Factory docs showing how to force JSON responses via json_object mode or a strict JSON schema (e.g. a Pydantic BaseModel), with Python, cURL and JavaScript samples.Read on Token Factory docs ↗Docs
Switch to Token Factory
Migrate from OpenAI / other inference providers to Token Factory. Drop-in compatibility plus the cost and rate-limit upgrades that come with it.Read on Token Factory docs ↗Docs
Token Factory API reference
Full reference for the Token Factory inference API — chat completions, embeddings, fine-tuning, and batch endpoints.Read on Token Factory docs ↗Docs
Token Factory playground
Interactive playground for trying open models without writing code. Live-edit prompts, swap models, and copy the request as curl/JS/Python.Read on Nebius ↗In-depth technical resources
Production inference is more than serving a model.
Architecture breakdowns for routing, MoE latency, speculative decoding, chat app design, and more.Guide
Building an AI-Powered Finance Planner with Full-Stack Next.js and Nebius
Step-by-step build of Money-Guard, a Next.js dashboard that analyzes spending and answers questions about transactions using Meta Llama 3.1 70B served by Nebius AI Studio.Read on Nebius ↗Guide
Create Your Own AI-Powered Code Generator and Reviewer
Build a full-stack Next.js code assistant that generates snippets across languages and returns automated reviews, powered by DeepSeek Coder on Nebius Token Factory.Read on Nebius ↗Guide
How to Run Meta Llama 3.1 405B with the Nebius AI Studio API
A hands-on how-to for calling Meta Llama 3.1 405B through the Nebius OpenAI-compatible API, with working Python, JavaScript, and cURL examples.Read on Nebius ↗Guide
Routing in LLM inference is the difference between scaling and stalling
Why request routing is the single biggest lever in production LLM inference, and how Token Factory routes intelligently.Read on Nebius ↗Guide
The invisible architecture behind great chat apps
What separates a usable chat product from a janky one — caching, routing, streaming, and the production patterns that hide the seams.Read on Nebius ↗Guide
Why large MoE models break latency budgets — and what speculative decoding changes
Production analysis of how speculative decoding alters latency for large mixture-of-experts inference workloads.Read on Nebius ↗Guide
Adding Nebius Token Factory to a Rust Agent Without a Custom Provider
Tutorial showing how to use Rig's OpenAI-compatible base_url override to wire Nebius Token Factory into a Rust LLM agent — no custom provider code needed.Read on Rup12 ↗Guide
Build a Job-Finding Agent with Google ADK, Nebius AI, Mistral OCR & Linkup
A multi-agent pipeline that reads resume PDFs with Mistral OCR, searches live job boards via Linkup, and uses Qwen3-14B on Nebius AI Studio to generate and filter matches, orchestrated with Google ADK.Read on DEV ↗Guide
Building a Multi-Agent RAG System with Couchbase, CrewAI, and Nebius AI Studio
Build a semantic search engine that pairs Couchbase as the vector store with CrewAI multi-agent RAG, using Nebius AI Studio for both the Llama LLM and the e5-mistral embeddings.Read on DEV ↗Guide
Fine-Tune Your LLM in Minutes with Nebius
A practical guide to fine-tuning open-source LLMs on Nebius three ways: the no-code Web Console, the Python SDK, and raw cURL API requests, with .jsonl dataset prep.Read on DEV ↗Guide
How I Built an Agentic RAG App to Brainstorm Conference Talk Ideas
Combine Tavily live web research, Couchbase vector search over past KubeCon talks, and Nebius AI Studio (e5-mistral embeddings + Qwen3) to synthesize unique conference talk abstracts.Read on DEV ↗Guide
I Built a Team of 5 Agents Using Google ADK, Meta Llama and Nemotron-Ultra-253B
Build an AI Trend Analyzer with five sequential ADK agents (Exa, Tavily, Firecrawl + summary/analysis) running Meta Llama 3.1 and Nemotron-Ultra-253B served through Nebius AI Studio.Read on DEV ↗Guide
I Used Agent Skills to Fine-Tune an Open-Source LLM on Nebius Token Factory
A teacher-student distillation walkthrough for an insurance-claims chatbot, using Token Factory Data Lab batch inference, LoRA fine-tuning, serverless adapter deployment, and a Gradio comparison app. Ships with a companion Jupyter notebook.Read on Medium ↗Guide
Text-to-SQL: Creating Embeddings with Nebius AI Studio (Part 1)
Part 1 of a text-to-SQL RAG series: turn SQL schema into annotated markdown and generate vector embeddings with Nebius AI Studio (BAAI/bge-en-icl), stored in Postgres with pgvector.Read on DEV ↗Guide
Text-to-SQL: Generating SQL with Nebius AI Studio (Part 2)
Part 2 of the text-to-SQL RAG series: use the embeddings from Part 1 to retrieve relevant schema and generate correct SQL queries with Nebius AI Studio models.Read on DEV ↗Guide
Text-to-SQL: Querying Databases with Nebius AI Studio and Agents (Part 3)
Part 3 of the text-to-SQL RAG series: wrap the pipeline in an agent that queries a live database end to end, powered by Nebius AI Studio models.Read on DEV ↗Guide
Use DeepSeek R1 & V3 with Bolt.DIY & Cursor in 3 Steps
Get free Nebius AI Studio API keys, route DeepSeek R1/V3 through OpenRouter, and plug the EU-hosted models into Bolt.DIY and Cursor for coding.Read on DEV ↗Guide