RookWorld-LM-124M

A unified 124M-parameter model combining a chess policy (ROOK) and a chess environment simulator (Arbiter) in a single transformer, enabling closed-loop self-play without an external engine.

Model Details

Model Description

RookWorld-LM unifies agent and environment modeling in a single transformer: the same model both plays chess (policy) and simulates the chess environment (world model), with the task selected by a prompt prefix.

  • Developed by: Jonathan Rahn, Jenia Jitsev (LAION/JSC), Qi Sun (Tokyo Tech/Sakana AI)
  • Model type: GPT-2 (multi-task autoregressive)
  • Language(s): Chess notation, game states, rewards
  • License: MIT
  • Repository: GitHub
  • Paper: LAION Research Note
  • Demo: HuggingFace Space

Model Architecture

  • Parameters: 124M
  • Architecture: GPT-2
  • Context Length: up to 2048 tokens
  • Multi-task: Policy + Environment in one model
  • Training Framework: llm.c

Uses

Direct Use

  • Self-play Chess: Complete games without external engine
  • Position Analysis: Chess move prediction with reasoning
  • Environment Simulation: Next state prediction from moves
  • Research: Unified agent-environment modeling

Unique Capabilities

  • Closed-loop Self-improvement: Can generate training data for itself
  • World Modeling: Predicts game outcomes and state transitions
  • Multi-task Performance: Excels at both policy and simulation tasks

Training Details

Training Data

  • Datasets:
    • rookworld_7m: interleaved ROOK policy and Arbiter environment samples
    • rookworld_46m: scaled-up interleaved dataset (per LAION note)

Training interleaves both tasks in mixed batches.
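
A minimal sketch of what such interleaving looks like (illustrative only; the actual llm.c pipeline consumes pretokenized shards, and the function and field names here are assumptions):

import random

def mixed_batches(policy_samples, arbiter_samples, batch_size=16, seed=0):
    """Shuffle 'P:' policy and 'A:' arbiter samples into mixed training batches."""
    pool = list(policy_samples) + list(arbiter_samples)
    random.Random(seed).shuffle(pool)
    for i in range(0, len(pool), batch_size):
        yield pool[i:i + batch_size]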

Training Procedure

  • Hardware: 2x NVIDIA RTX 4090
  • Framework: llm.c (karpathy/llm.c)
  • Epochs: trained for multiple epochs with llm.c; results reported at 3 epochs for rookworld_7m and at 5 epochs for rookworld_46m (LAION note)

Evaluation

Policy Performance (ROOK Mode)

  • Action accuracy (rookworld_46m, 5 epochs): 26.2% (LAION note)
  • BIG-bench Checkmate-in-One: 32.1% (LAION note)

Environment Performance (Arbiter Mode)

As reported by the repository's evaluation scripts (RookWorld README):

  • Next State Accuracy: 99.61% (baseline 92.3%)
  • State NLS (normalized Levenshtein similarity): 99.99% (baseline 99.76%)
  • Reward Accuracy: 99.11% (baseline 98.93%)
  • Terminated Accuracy: 99.13% (baseline 99.04%)

Technical Details

Prompt-based Task Selection

RookWorld-LM uses task prefixes to switch between policy and environment modes:

Policy Mode (ROOK) - Playing Chess

Format: P: <FEN position>

Example:

# Input (prompt)
P: r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3

# Output (model generation)
M: d2d4 b1c3 f1c4 f1b5 d2d3 E: 0.6 0.5 0.4 0.3 0.2 B: d2d4

The model generates:

  • M: Candidate moves in UCI notation
  • E: Evaluation scores for each candidate
  • B: Best move selection
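
As a concrete illustration, a minimal policy-mode query via the Hugging Face transformers API (generation settings such as max_new_tokens and greedy decoding are illustrative choices, not prescribed by the model card):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jrahn/RookWorld-LM-124M")
model = AutoModelForCausalLM.from_pretrained("jrahn/RookWorld-LM-124M")

fen = "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3"
input_ids = tokenizer(f"P: {fen} ", return_tensors="pt")["input_ids"]
output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
text = tokenizer.decode(output[0], skip_special_tokens=True)

# The completion has the form "M: <moves> E: <evals> B: <best move>"
best_move = text.split("B: ")[1].split()[0] if "B: " in text else None
print(best_move)  # e.g. d2d4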

Environment Mode (Arbiter) - Simulating Chess

Format: A: <current_state>+<action>+<move_history>+

Example:

# Input (prompt with chess state and action)
A: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1+e2e4+e2e4+

# Output (model generation - next state, reward, terminated, truncated)
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1+0+False+False

The model generates:

  • Next state: Updated FEN position after the move
  • Reward: -1 (loss), 0 (ongoing), or 1 (win)
  • Terminated: Whether the game ended
  • Truncated: Whether max moves reached
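
This parsing can be wrapped into a gym-style step function; a minimal sketch (generation settings are illustrative, and it assumes the tokenizer round-trips the prompt exactly so it can be stripped from the decoded text):

def arbiter_step(model, tokenizer, state, move, history):
    """One environment step: returns (next_state, reward, terminated, truncated)."""
    prompt = f"A: {state}+{move}+{' '.join(history)}+"
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output = model.generate(input_ids, max_new_tokens=96, do_sample=False)
    text = tokenizer.decode(output[0], skip_special_tokens=True)

    # Keep only the generated continuation and split on the "+" delimiter
    next_state, reward, terminated, truncated = text[len(prompt):].split("+")[:4]
    return (next_state.strip(), int(float(reward)),
            terminated.strip() == "True", truncated.strip() == "True")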

Unified Tokenization

Shared vocabulary across tasks:

  • FEN notation tokens
  • UCI move notation
  • State transition tokens
  • Reward tokens (-1, 0, 1)
  • Special task prefixes
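
For illustration, both prompt formats pass through the same tokenizer (a quick check, using the model ID as published on the Hub):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("jrahn/RookWorld-LM-124M")
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
policy_ids = tok(f"P: {start} ")["input_ids"]
arbiter_ids = tok(f"A: {start}+e2e4+e2e4+")["input_ids"]
print(len(policy_ids), len(arbiter_ids))  # one shared vocabulary for both tasks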

Self-play Implementation

def self_play_game(model, tokenizer, max_plies=200):
    """Complete chess game using RookWorld-LM for both policy and environment."""
    state = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    moves = []
    history = []

    for _ in range(max_plies):  # cap game length instead of looping forever
        # Step 1: Get move from policy mode
        policy_prompt = f"P: {state} "
        input_ids = tokenizer(policy_prompt, return_tensors="pt")["input_ids"]
        policy_output = model.generate(input_ids, max_new_tokens=64)
        policy_text = tokenizer.decode(policy_output[0], skip_special_tokens=True)

        # Extract best move from "B: <move>" in the generation
        if "B: " in policy_text:
            move = policy_text.split("B: ")[1].split()[0]
            moves.append(move)
            history.append(move)
        else:
            break  # Invalid generation

        # Step 2: Update state using environment mode
        env_prompt = f"A: {state}+{move}+{' '.join(history[-10:])}+"
        input_ids = tokenizer(env_prompt, return_tensors="pt")["input_ids"]
        env_output = model.generate(input_ids, max_new_tokens=96)
        env_text = tokenizer.decode(env_output[0], skip_special_tokens=True)

        # Strip the prompt so only the generated continuation is parsed:
        # "next_state+reward+terminated+truncated"
        parts = env_text[len(env_prompt):].split("+")
        if len(parts) >= 3:
            next_state = parts[0].strip()
            reward = int(float(parts[1]))
            terminated = parts[2].strip() == "True"

            state = next_state
            if terminated:
                return moves, reward
        else:
            break  # Invalid environment response

    return moves, 0  # Game incomplete
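
Assuming model and tokenizer are loaded via transformers as in the policy-mode sketch above, a game can then be run as:

moves, reward = self_play_game(model, tokenizer)
print(f"{len(moves)} plies, final reward {reward}")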

Evolution Capability

RookWorld-LM supports self-improvement through:

  1. Self-play game generation
  2. Filtering winning trajectories
  3. Continued training on successful games
  4. Iterative performance improvement

See RookWorld Evol for details.
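
A schematic of one such iteration, reusing self_play_game from above (the win filter and the fine-tuning step are placeholders; see the Evol repository for the actual procedure):

def evol_iteration(model, tokenizer, n_games=100):
    """Generate self-play games and keep trajectories that ended in a win."""
    kept = []
    for _ in range(n_games):
        moves, reward = self_play_game(model, tokenizer)
        if reward == 1:  # placeholder filter: keep decisively won games
            kept.append(moves)
    # Continued training would convert `kept` back into P:/A: samples here
    return kept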

Limitations

  • Compute Constraints: no tree search or Monte Carlo rollouts; moves come from a single generation pass
  • Context Window: Limited to 2048 tokens
  • Training Data: Performance bounded by initial training distribution
  • Evaluation Depth: Single-step lookahead only

Significance

RookWorld-LM demonstrates:

  • Unified Architectures: Single model for multiple strategic tasks
  • Emergent Capabilities: World modeling from language model training
  • Self-sufficiency: Complete game playing without external tools
  • Scalability: Performance improves with model size

Citation

@article{rookworld2024,
  title={RookWorld: Unified Agent and Environment Modeling for Chess},
  author={Rahn, Jonathan and Jitsev, Jenia and Sun, Qi},
  journal={LAION Research Notes},
  year={2024},
  url={https://laion.ai/notes/rook/}
}

Model Card Contact

Jonathan Rahn - GitHub | Research Page

Metrics Source

LAION research note: https://laion.ai/notes/rook/
