Token Factory

LLM Chess Tournament

Benchmark LLMs at chess using Nebius AI Studio models

About this project

A benchmarking harness that runs a chess tournament between large language models and the Stockfish engine, ranking each model's tactical play with Elo ratings. It is built with DSPy and requires models from Nebius AI Studio.
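For readers unfamiliar with Elo ratings, the standard update rule can be sketched as below. This is a minimal illustration of the general Elo formula, not the project's actual code: the function names `expected_score` and `update_elo` are hypothetical, and the K-factor of 32 is an assumed, commonly used default that this harness may or may not share.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k is the K-factor; 32 is an assumed default, not a value taken from the project.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Hypothetical example: two equally rated players, A wins.
    print(update_elo(1500.0, 1500.0, 1.0))
```

With equal starting ratings the expected score is 0.5, so a win moves each player by half the K-factor in opposite directions. A harness that pits models against an engine might hold the engine's rating fixed instead of updating both sides, which is a design choice this sketch does not make.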

Technologies

app

benchmark

★ 9 stars on GitHub