---
library_name: transformers
pipeline_tag: text-generation
tags:
- chess
- llm.c
- world-model
- multi-task
- strategic-reasoning
license: mit
language:
- en
datasets:
- jrahn/rookworld_7m
metrics:
- accuracy
widget:
- text: "P: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 "
- text: "A: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1+e2e4+e2e4+"
---

# RookWorld-LM-124M

A unified 124M parameter model combining chess policy (ROOK) and environment simulation (Arbiter) in a single transformer, enabling closed-loop self-play without external engines.

## Model Details

### Model Description

RookWorld-LM is a breakthrough in unified modeling: a single transformer that can both play chess (policy) and simulate the chess environment (world model) through different prompt prefixes.

- **Developed by:** Jonathan Rahn, Jenia Jitsev (LAION/JSC), Qi Sun (Tokyo Tech/Sakana AI)
- **Model type:** GPT-2 (multi-task autoregressive)
- **Language(s):** Chess notation, game states, rewards
- **License:** MIT
- **Repository:** [GitHub](https://github.com/jorahn/RookWorld)
- **Paper:** [LAION Research Note](https://laion.ai/notes/rook/)
- **Demo:** [HuggingFace Space](https://huggingface.co/spaces/jrahn/rookworld)

### Model Architecture

- **Parameters:** 124M
- **Architecture:** GPT-2
- **Context Length:** up to 2048 tokens
- **Multi-task:** Policy + Environment in one model
- **Training Framework:** llm.c

## Uses

### Direct Use

- **Self-play Chess:** Complete games without an external engine
- **Position Analysis:** Chess move prediction with reasoning
- **Environment Simulation:** Next state prediction from moves
- **Research:** Unified agent-environment modeling

### Unique Capabilities

- **Closed-loop Self-improvement:** Can generate training data for itself
- **World Modeling:** Predicts game outcomes and state transitions
- **Multi-task Performance:** Excels at both policy and simulation tasks

## Training Details

### Training Data

- **Datasets:**
  - [rookworld_7m](https://huggingface.co/datasets/jrahn/rookworld_7m): interleaved ROOK policy and Arbiter environment samples
  - rookworld_46m: scaled-up interleaved dataset (per LAION note)

Training interleaves both tasks in mixed batches.
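As a quick look at this mixed-task format, the sketch below streams a few training samples and sorts them by task prefix. It assumes the dataset exposes a single `text` column per record; that column name is an assumption, not confirmed by this card.

```python
# Minimal sketch: inspect interleaved ROOK ("P: ") and Arbiter ("A: ") samples.
# The "text" column name is an assumption about the rookworld_7m schema.
from itertools import islice
from datasets import load_dataset

ds = load_dataset("jrahn/rookworld_7m", split="train", streaming=True)

for sample in islice(ds, 8):
    text = sample["text"]  # assumed column name
    task = "policy (ROOK)" if text.startswith("P:") else "environment (Arbiter)"
    print(f"{task:<24} {text[:70]}")
```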
### Training Procedure

- **Hardware:** 2x NVIDIA RTX 4090
- **Framework:** llm.c (karpathy/llm.c)
- **Epochs:** trained for multiple epochs; rookworld_7m results reported at 3 epochs, rookworld_46m at 5 epochs (LAION note)

## Evaluation

### Policy Performance (ROOK Mode)

- **Action accuracy (rookworld_46m, 5 epochs):** 26.2% (LAION note)
- **BIG-bench Checkmate-in-One:** 32.1% (LAION note)

### Environment Performance (Arbiter Mode)

Per repository evaluation scripts (RookWorld/README):

- **Next State Accuracy:** 99.61% (baseline 92.3%)
- **State NLS:** 99.99% (baseline 99.76%)
- **Reward Accuracy:** 99.11% (baseline 98.93%)
- **Terminated Accuracy:** 99.13% (baseline 99.04%)

## Technical Details

### Prompt-based Task Selection

RookWorld-LM uses task prefixes to switch between policy and environment modes:

#### **Policy Mode (ROOK) - Playing Chess**

Format: `P: <FEN state> ` (note the trailing space)

**Example:**

```
# Input (prompt)
P: r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3 

# Output (model generation)
M: d2d4 b1c3 f1c4 f1b5 d2d3 E: 0.6 0.5 0.4 0.3 0.2 B: d2d4
```

The model generates:

- **M:** Candidate moves in UCI notation
- **E:** Evaluation scores for each candidate
- **B:** Best move selection

#### **Environment Mode (Arbiter) - Simulating Chess**

Format: `A: <FEN state>+<move>+<recent move history>+`

**Example:**

```
# Input (prompt with chess state and action)
A: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1+e2e4+e2e4+

# Output (model generation - next state, reward, terminated, truncated)
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1+0+False+False
```

The model generates:

- **Next state:** Updated FEN position after the move
- **Reward:** -1 (loss), 0 (ongoing), or 1 (win)
- **Terminated:** Whether the game ended
- **Truncated:** Whether the maximum number of moves was reached

### Unified Tokenization

Shared vocabulary across tasks:

- FEN notation tokens
- UCI move notation
- State transition tokens
- Reward tokens (-1, 0, 1)
- Special task prefixes

### Self-play Implementation

The sketch below assumes a Hugging Face `transformers` model and tokenizer; the generation budget and ply cap are illustrative defaults.

```python
def self_play_game(model, tokenizer, max_plies=200):
    """Play a complete chess game using RookWorld-LM as both policy and environment."""
    state = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    moves = []
    history = []

    def generate(prompt, max_new_tokens=192):
        # Generate a completion and return only the newly generated text.
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                pad_token_id=tokenizer.eos_token_id)
        return tokenizer.decode(output[0, inputs["input_ids"].shape[1]:],
                                skip_special_tokens=True)

    for _ in range(max_plies):
        # Step 1: Get a move from policy mode ("P: <FEN state> ")
        policy_text = generate(f"P: {state} ")

        # Extract the best move after the "B: " marker
        if "B: " not in policy_text:
            break  # Invalid generation
        move = policy_text.split("B: ")[1].split()[0]
        moves.append(move)
        history.append(move)

        # Step 2: Update the state using environment mode
        # ("A: <state>+<move>+<recent move history>+")
        env_prompt = f"A: {state}+{move}+{' '.join(history[-10:])}+"
        env_text = generate(env_prompt)

        # Parse the environment response: next_state+reward+terminated+truncated
        parts = env_text.split("+")
        if len(parts) < 3:
            break  # Invalid environment response
        next_state = parts[0].strip()
        reward = int(parts[1].strip())
        terminated = parts[2].strip() == "True"

        state = next_state
        if terminated:
            return moves, reward

    return moves, 0  # Game incomplete
```

## Evolution Capability

RookWorld-LM supports self-improvement through:

1. Self-play game generation
2. Filtering winning trajectories
3. Continued training on successful games
4. Iterative performance improvement

See [RookWorld Evol](https://github.com/jorahn/RookWorld#rookworld-evol) for details.
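For illustration, here is a minimal sketch of one such iteration built on the `self_play_game` helper above; the filtering rule (keep games that end in a win) and the hand-off to continued training are simplified placeholders for the actual Evol scripts in the repository.

```python
# Sketch of one self-improvement iteration (simplified; see RookWorld Evol for
# the actual pipeline). Games are generated by the model itself and only
# trajectories ending in a win (reward == 1) are kept as new training material;
# the real filtering criterion may differ.
def evolution_iteration(model, tokenizer, num_games=100):
    kept_games = []
    for _ in range(num_games):
        moves, reward = self_play_game(model, tokenizer)
        if moves and reward == 1:
            kept_games.append({"moves": moves, "reward": reward})
    # Continued training on kept_games (re-serializing them into "P: ..." and
    # "A: ..." samples and fine-tuning) is handled by the Evol scripts.
    return kept_games
```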
## Limitations

- **Compute Constraints:** No deep search or Monte Carlo methods
- **Context Window:** Limited to 2048 tokens
- **Training Data:** Performance bounded by the initial training distribution
- **Evaluation Depth:** Single-step lookahead only

## Significance

RookWorld-LM demonstrates:

- **Unified Architectures:** Single model for multiple strategic tasks
- **Emergent Capabilities:** World modeling from language model training
- **Self-sufficiency:** Complete game playing without external tools
- **Scalability:** Performance improves with model size

## Related Models

- **[ROOK-CLF-9M](https://huggingface.co/jrahn/ROOK-CLF-9m):** Classification approach
- **[ROOK-LM-124M](https://huggingface.co/jrahn/ROOK-LM-124M):** Policy-only model
- **[Arbiter-2M](https://huggingface.co/datasets/jrahn/arbiter_2m):** Environment-only dataset

## Citation

```bibtex
@article{rookworld2024,
  title={RookWorld: Unified Agent and Environment Modeling for Chess},
  author={Rahn, Jonathan and Jitsev, Jenia and Sun, Qi},
  journal={LAION Research Notes},
  year={2024},
  url={https://laion.ai/notes/rook/}
}
```

## Model Card Contact

Jonathan Rahn - [GitHub](https://github.com/jorahn) | [Research Page](https://jorahn.github.io/research/)

## Metrics Source

LAION research note: https://laion.ai/notes/rook/