RookWorld-LM-124M
A unified 124M parameter model combining chess policy (ROOK) and environment simulation (Arbiter) in a single transformer, enabling closed-loop self-play without external engines.
Model Details
Model Description
RookWorld-LM unifies agent and environment in one model: a single transformer that both plays chess (policy, ROOK) and simulates the chess environment (world model, Arbiter), with the task selected by a prompt prefix.
- Developed by: Jonathan Rahn, Jenia Jitsev (LAION/JSC), Qi Sun (Tokyo Tech/Sakana AI)
- Model type: GPT-2 (multi-task autoregressive)
- Language(s): Chess notation, game states, rewards
- License: MIT
- Repository: GitHub
- Paper: LAION Research Note (https://laion.ai/notes/rook/)
- Demo: HuggingFace Space
Model Architecture
- Parameters: 124M
- Architecture: GPT-2
- Context Length: up to 2048 tokens
- Multi-task: Policy + Environment in one model
- Training Framework: llm.c
Uses
Direct Use
- Self-play Chess: Complete games without external engine
- Position Analysis: Chess move prediction with reasoning
- Environment Simulation: Next state prediction from moves
- Research: Unified agent-environment modeling
Unique Capabilities
- Closed-loop Self-improvement: Can generate training data for itself
- World Modeling: Predicts game outcomes and state transitions
- Multi-task Performance: Excels at both policy and simulation tasks
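A minimal usage sketch with Hugging Face transformers follows; the hub id jrahn/RookWorld-LM-124M is an assumption and may differ from the actual repository path.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed hub id; replace with the actual model repository if it differs.
model_id = "jrahn/RookWorld-LM-124M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Policy mode: prompt with "P: <FEN>" and let the model complete the M/E/B fields.
prompt = "P: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 "
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))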
Training Details
Training Data
- Datasets:
- rookworld_7m: interleaved ROOK policy and Arbiter environment samples
- rookworld_46m: scaled-up interleaved dataset (per LAION note)
Training interleaves both tasks in mixed batches.
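A minimal sketch for inspecting the interleaved data stream, assuming the dataset is published on the Hugging Face Hub as jrahn/rookworld_7m with a text column (both are assumptions):
from datasets import load_dataset

# Assumed hub id and column name; adjust to the actual dataset repository.
ds = load_dataset("jrahn/rookworld_7m", split="train", streaming=True)
for sample in ds.take(6):
    # Policy samples start with "P: ", Arbiter samples start with "A: ".
    print(sample["text"][:60])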
Training Procedure
- Hardware: 2x NVIDIA RTX 4090
- Framework: llm.c (karpathy/llm.c)
- Training: multiple epochs with llm.c; results reported after 3 epochs on rookworld_7m and after 5 epochs on rookworld_46m (LAION note)
Evaluation
Policy Performance (ROOK Mode)
- Action accuracy (rookworld_46m, 5 epochs): 26.2% (LAION note)
- BIG-bench Checkmate-in-One: 32.1% (LAION note)
Environment Performance (Arbiter Mode)
Per repository evaluation scripts (RookWorld/README):
- Next State Accuracy: 99.61% (baseline 92.3%)
- State NLS: 99.99% (baseline 99.76%)
- Reward Accuracy: 99.11% (baseline 98.93%)
- Terminated Accuracy: 99.13% (baseline 99.04%)
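Next-state accuracy compares the predicted FEN with the position obtained by applying the move to the input state; here is a minimal sketch using python-chess as ground truth (illustrative only, not the repository's evaluation script):
import chess

def next_state_correct(fen: str, uci_move: str, predicted_fen: str) -> bool:
    """Check whether a predicted FEN matches the true successor state."""
    board = chess.Board(fen)
    board.push(chess.Move.from_uci(uci_move))
    # en_passant="fen" keeps the en passant square in the output, matching the
    # convention used in the Arbiter examples below.
    return board.fen(en_passant="fen") == predicted_fen

print(next_state_correct(
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    "e2e4",
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1",
))  # True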
Technical Details
Prompt-based Task Selection
RookWorld-LM uses task prefixes to switch between policy and environment modes:
Policy Mode (ROOK) - Playing Chess
Format: P: <FEN position>
Example:
# Input (prompt)
P: r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3
# Output (model generation)
M: d2d4 b1c3 f1c4 f1b5 d2d3 E: 0.6 0.5 0.4 0.3 0.2 B: d2d4
The model generates:
- M: Candidate moves in UCI notation
- E: Evaluation scores for each candidate
- B: Best move selection
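The completion can be parsed into candidate moves, evaluation scores, and the selected move; a minimal sketch with a hypothetical helper (not part of the released code):
import re

def parse_policy_output(text: str) -> dict:
    """Split a policy-mode completion into candidates, evals, and best move."""
    moves = re.search(r"M:\s*([a-h1-8qrbn ]+?)\s*E:", text)
    evals = re.search(r"E:\s*([-\d. ]+?)\s*B:", text)
    best = re.search(r"B:\s*(\S+)", text)
    return {
        "candidates": moves.group(1).split() if moves else [],
        "evals": [float(x) for x in evals.group(1).split()] if evals else [],
        "best_move": best.group(1) if best else None,
    }

print(parse_policy_output("M: d2d4 b1c3 f1c4 f1b5 d2d3 E: 0.6 0.5 0.4 0.3 0.2 B: d2d4"))
# {'candidates': ['d2d4', ...], 'evals': [0.6, ...], 'best_move': 'd2d4'}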
Environment Mode (Arbiter) - Simulating Chess
Format: A: <current_state>+<action>+<move_history>+
Example:
# Input (prompt with chess state and action)
A: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1+e2e4+e2e4+
# Output (model generation - next state, reward, terminated, truncated)
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1+0+False+False
The model generates:
- Next state: Updated FEN position after the move
- Reward: -1 (loss), 0 (ongoing), or 1 (win)
- Terminated: Whether the game ended
- Truncated: Whether max moves reached
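A minimal sketch for splitting this completion into its four fields (hypothetical helper, not part of the released code):
def parse_env_output(text: str) -> dict:
    """Split an Arbiter completion into next state, reward, terminated, truncated."""
    next_state, reward, terminated, truncated = [p.strip() for p in text.split("+")[:4]]
    return {
        "next_state": next_state,
        "reward": int(reward),  # -1, 0, or 1
        "terminated": terminated == "True",
        "truncated": truncated == "True",
    }

print(parse_env_output(
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1+0+False+False"
))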
Unified Tokenization
Shared vocabulary across tasks:
- FEN notation tokens
- UCI move notation
- State transition tokens
- Reward tokens (-1, 0, 1)
- Special task prefixes
Self-play Implementation
def self_play_game(model, tokenizer, max_new_tokens=128):
    """Complete chess game using RookWorld-LM for both policy and environment."""
    state = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    moves = []

    while True:
        # Step 1: Get move from policy mode
        policy_prompt = f"P: {state} "
        policy_ids = tokenizer(policy_prompt, return_tensors="pt").input_ids
        policy_output = model.generate(policy_ids, max_new_tokens=max_new_tokens)
        policy_text = tokenizer.decode(policy_output[0], skip_special_tokens=True)

        # Extract best move from "B: <move>" in the completion
        if "B: " not in policy_text:
            break  # Invalid generation
        move = policy_text.split("B: ")[1].split()[0]
        moves.append(move)

        # Step 2: Update state using environment mode
        env_prompt = f"A: {state}+{move}+{' '.join(moves[-10:])}+"
        env_ids = tokenizer(env_prompt, return_tensors="pt").input_ids
        env_output = model.generate(env_ids, max_new_tokens=max_new_tokens)
        env_text = tokenizer.decode(env_output[0], skip_special_tokens=True)

        # Parse the completion (everything after the prompt):
        # <next_state>+<reward>+<terminated>+<truncated>
        completion = env_text[len(env_prompt):]
        parts = [p.strip() for p in completion.split("+")]
        if len(parts) < 3:
            break  # Invalid environment response

        state = parts[0]
        reward = int(parts[1])
        terminated = parts[2] == "True"

        if terminated:
            return moves, reward

    return moves, 0  # Game incomplete
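Example usage, assuming model and tokenizer were loaded as in the earlier sketch:
moves, reward = self_play_game(model, tokenizer)
print(f"Game finished after {len(moves)} plies with reward {reward}")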
Evolution Capability
RookWorld-LM supports self-improvement through:
- Self-play game generation
- Filtering winning trajectories
- Continued training on successful games
- Iterative performance improvement
See RookWorld Evol for details.
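A minimal sketch of the filtering step, reusing self_play_game above to keep only winning trajectories (hypothetical helper; see the RookWorld Evol code for the actual pipeline):
def collect_winning_games(model, tokenizer, n_games=100):
    """Play self-play games and keep only those that ended in a win."""
    winning = []
    for _ in range(n_games):
        moves, reward = self_play_game(model, tokenizer)
        if reward == 1:
            winning.append(moves)
    return winning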
Limitations
- Compute Constraints: No deep search or Monte Carlo methods
- Context Window: Limited to 2048 tokens
- Training Data: Performance bounded by initial training distribution
- Evaluation Depth: Single-step lookahead only
Significance
RookWorld-LM demonstrates:
- Unified Architectures: Single model for multiple strategic tasks
- Emergent Capabilities: World modeling from language model training
- Self-sufficiency: Complete game playing without external tools
- Scalability: Performance improves with model size
Related Models
- ROOK-CLF-9M: Classification approach
- ROOK-LM-124M: Policy-only model
- Arbiter-2M: Environment-only dataset
Citation
@article{rookworld2024,
  title={RookWorld: Unified Agent and Environment Modeling for Chess},
  author={Rahn, Jonathan and Jitsev, Jenia and Sun, Qi},
  journal={LAION Research Notes},
  year={2024},
  url={https://laion.ai/notes/rook/}
}
Model Card Contact
Jonathan Rahn - GitHub | Research Page
Metrics Source
LAION research note: https://laion.ai/notes/rook/