RookWorld-LM-124M
A unified 124M parameter model combining chess policy (ROOK) and environment simulation (Arbiter) in a single transformer, enabling closed-loop self-play without external engines.
Model Details
Model Description
RookWorld-LM unifies agent and environment in one model: a single transformer that both plays chess (policy, ROOK) and simulates the chess environment (world model, Arbiter), with the task selected by a prompt prefix.
- Developed by: Jonathan Rahn, Jenia Jitsev (LAION/JSC), Qi Sun (Tokyo Tech/Sakana AI)
- Model type: GPT-2 (multi-task autoregressive)
- Language(s): Chess notation, game states, rewards
- License: MIT
- Repository: GitHub
- Paper: LAION Research Note (https://laion.ai/notes/rook/)
- Demo: HuggingFace Space
Model Architecture
- Parameters: 124M
- Architecture: GPT-2
- Context Length: up to 2048 tokens
- Multi-task: Policy + Environment in one model
- Training Framework: llm.c
Uses
Direct Use
- Self-play Chess: Complete games without external engine
- Position Analysis: Chess move prediction with reasoning
- Environment Simulation: Next state prediction from moves
- Research: Unified agent-environment modeling
Unique Capabilities
- Closed-loop Self-improvement: Can generate training data for itself
- World Modeling: Predicts game outcomes and state transitions
- Multi-task Performance: Excels at both policy and simulation tasks
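A minimal usage sketch with Hugging Face transformers follows; the hub id jrahn/RookWorld-LM-124M is an assumption and may differ from the actual repository path.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed hub id; replace with the actual model repository if it differs.
model_id = "jrahn/RookWorld-LM-124M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Policy mode: prompt with "P: <FEN>" and let the model complete the M/E/B fields.
prompt = "P: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 "
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))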
Training Details
Training Data
- Datasets:
- rookworld_7m: interleaved ROOK policy and Arbiter environment samples
- rookworld_46m: scaled-up interleaved dataset (per LAION note)
Training interleaves both tasks in mixed batches.
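A minimal sketch for inspecting the interleaved data stream, assuming the dataset is published on the Hugging Face Hub as jrahn/rookworld_7m with a text column (both are assumptions):
from datasets import load_dataset

# Assumed hub id and column name; adjust to the actual dataset repository.
ds = load_dataset("jrahn/rookworld_7m", split="train", streaming=True)
for sample in ds.take(6):
    # Policy samples start with "P: ", Arbiter samples start with "A: ".
    print(sample["text"][:60])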
Training Procedure
- Hardware: 2x NVIDIA RTX 4090
- Framework: llm.c (karpathy/llm.c)
- Training: multiple epochs with llm.c; results reported after 3 epochs on rookworld_7m and after 5 epochs on rookworld_46m (LAION note)
Evaluation
Policy Performance (ROOK Mode)
- Action accuracy (rookworld_46m, 5 epochs): 26.2% (LAION note)
- BIG-bench Checkmate-in-One: 32.1% (LAION note)
Environment Performance (Arbiter Mode)
Per repository evaluation scripts (RookWorld/README):
- Next State Accuracy: 99.61% (baseline 92.3%)
- State NLS: 99.99% (baseline 99.76%)
- Reward Accuracy: 99.11% (baseline 98.93%)
- Terminated Accuracy: 99.13% (baseline 99.04%)
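Next-state accuracy compares the predicted FEN with the position obtained by applying the move to the input state; here is a minimal sketch using python-chess as ground truth (illustrative only, not the repository's evaluation script):
import chess

def next_state_correct(fen: str, uci_move: str, predicted_fen: str) -> bool:
    """Check whether a predicted FEN matches the true successor state."""
    board = chess.Board(fen)
    board.push(chess.Move.from_uci(uci_move))
    # en_passant="fen" keeps the en passant square in the output, matching the
    # convention used in the Arbiter examples below.
    return board.fen(en_passant="fen") == predicted_fen

print(next_state_correct(
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    "e2e4",
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1",
))  # True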
Technical Details
Prompt-based Task Selection
RookWorld-LM uses task prefixes to switch between policy and environment modes:
Policy Mode (ROOK) - Playing Chess
Format: P: <FEN position>
Example:
# Input (prompt)
P: r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3
# Output (model generation)
M: d2d4 b1c3 f1c4 f1b5 d2d3 E: 0.6 0.5 0.4 0.3 0.2 B: d2d4
The model generates:
- M: Candidate moves in UCI notation
- E: Evaluation scores for each candidate
- B: Best move selection
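The completion can be parsed into candidate moves, evaluation scores, and the selected move; a minimal sketch with a hypothetical helper (not part of the released code):
import re

def parse_policy_output(text: str) -> dict:
    """Split a policy-mode completion into candidates, evals, and best move."""
    moves = re.search(r"M:\s*([a-h1-8qrbn ]+?)\s*E:", text)
    evals = re.search(r"E:\s*([-\d. ]+?)\s*B:", text)
    best = re.search(r"B:\s*(\S+)", text)
    return {
        "candidates": moves.group(1).split() if moves else [],
        "evals": [float(x) for x in evals.group(1).split()] if evals else [],
        "best_move": best.group(1) if best else None,
    }

print(parse_policy_output("M: d2d4 b1c3 f1c4 f1b5 d2d3 E: 0.6 0.5 0.4 0.3 0.2 B: d2d4"))
# {'candidates': ['d2d4', ...], 'evals': [0.6, ...], 'best_move': 'd2d4'}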
Environment Mode (Arbiter) - Simulating Chess
Format: A: <current_state>+<action>+<move_history>+
Example:
# Input (prompt with chess state and action)
A: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1+e2e4+e2e4+
# Output (model generation - next state, reward, terminated, truncated)
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1+0+False+False
The model generates:
- Next state: Updated FEN position after the move
- Reward: -1 (loss), 0 (ongoing), or 1 (win)
- Terminated: Whether the game ended
- Truncated: Whether max moves reached
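A minimal sketch for splitting this completion into its four fields (hypothetical helper, not part of the released code):
def parse_env_output(text: str) -> dict:
    """Split an Arbiter completion into next state, reward, terminated, truncated."""
    next_state, reward, terminated, truncated = [p.strip() for p in text.split("+")[:4]]
    return {
        "next_state": next_state,
        "reward": int(reward),  # -1, 0, or 1
        "terminated": terminated == "True",
        "truncated": truncated == "True",
    }

print(parse_env_output(
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1+0+False+False"
))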
Unified Tokenization
Shared vocabulary across tasks:
- FEN notation tokens
- UCI move notation
- State transition tokens
- Reward tokens (-1, 0, 1)
- Special task prefixes
Self-play Implementation
def self_play_game(model, tokenizer, max_new_tokens=128):
    """Complete chess game using RookWorld-LM for both policy and environment."""
    state = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    moves = []

    while True:
        # Step 1: Get move from policy mode
        policy_prompt = f"P: {state} "
        policy_ids = tokenizer(policy_prompt, return_tensors="pt").input_ids
        policy_output = model.generate(policy_ids, max_new_tokens=max_new_tokens)
        policy_text = tokenizer.decode(policy_output[0], skip_special_tokens=True)

        # Extract best move from "B: <move>" in the completion
        if "B: " not in policy_text:
            break  # Invalid generation
        move = policy_text.split("B: ")[1].split()[0]
        moves.append(move)

        # Step 2: Update state using environment mode
        env_prompt = f"A: {state}+{move}+{' '.join(moves[-10:])}+"
        env_ids = tokenizer(env_prompt, return_tensors="pt").input_ids
        env_output = model.generate(env_ids, max_new_tokens=max_new_tokens)
        env_text = tokenizer.decode(env_output[0], skip_special_tokens=True)

        # Parse the completion (everything after the prompt):
        # <next_state>+<reward>+<terminated>+<truncated>
        completion = env_text[len(env_prompt):]
        parts = [p.strip() for p in completion.split("+")]
        if len(parts) < 3:
            break  # Invalid environment response

        state = parts[0]
        reward = int(parts[1])
        terminated = parts[2] == "True"

        if terminated:
            return moves, reward

    return moves, 0  # Game incomplete
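Example usage, assuming model and tokenizer were loaded as in the earlier sketch:
moves, reward = self_play_game(model, tokenizer)
print(f"Game finished after {len(moves)} plies with reward {reward}")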
Evolution Capability
RookWorld-LM supports self-improvement through:
- Self-play game generation
- Filtering winning trajectories
- Continued training on successful games
- Iterative performance improvement
See RookWorld Evol for details.
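A minimal sketch of the filtering step, reusing self_play_game above to keep only winning trajectories (hypothetical helper; see the RookWorld Evol code for the actual pipeline):
def collect_winning_games(model, tokenizer, n_games=100):
    """Play self-play games and keep only those that ended in a win."""
    winning = []
    for _ in range(n_games):
        moves, reward = self_play_game(model, tokenizer)
        if reward == 1:
            winning.append(moves)
    return winning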
Limitations
- Compute Constraints: No deep search or Monte Carlo methods
- Context Window: Limited to 2048 tokens
- Training Data: Performance bounded by initial training distribution
- Evaluation Depth: Single-step lookahead only
Significance
RookWorld-LM demonstrates:
- Unified Architectures: Single model for multiple strategic tasks
- Emergent Capabilities: World modeling from language model training
- Self-sufficiency: Complete game playing without external tools
- Scalability: Performance improves with model size
Related Models
- ROOK-CLF-9M: Classification approach
- ROOK-LM-124M: Policy-only model
- Arbiter-2M: Environment-only dataset
Citation
@article{rookworld2024,
  title={RookWorld: Unified Agent and Environment Modeling for Chess},
  author={Rahn, Jonathan and Jitsev, Jenia and Sun, Qi},
  journal={LAION Research Notes},
  year={2024},
  url={https://laion.ai/notes/rook/}
}
Model Card Contact
Jonathan Rahn - GitHub | Research Page
Metrics Source
LAION research note: https://laion.ai/notes/rook/