← Library
Batch inference
Submit large request batches asynchronously for offline workloads. Cheaper per-token than real-time inference; ideal for evals and back-fills.tokenfactory
The full write-up lives on the original source — use the link above to read it.