---
library_name: transformers
pipeline_tag: text-generation
tags:
- chess
- llm.c
- world-model
- multi-task
- strategic-reasoning
license: mit
language:
- en
datasets:
- jrahn/rookworld_7m
metrics:
- accuracy
widget:
- text: "P: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 "
- text: "A: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1+e2e4+e2e4+"
---

# RookWorld-LM-124M

A unified 124M parameter model combining chess policy (ROOK) and environment simulation (Arbiter) in a single transformer, enabling closed-loop self-play without external engines.

## Model Details

### Model Description

RookWorld-LM is a breakthrough in unified modeling: a single transformer that can both play chess (policy) and simulate the chess environment (world model) through different prompt prefixes.

- **Developed by:** Jonathan Rahn, Jenia Jitsev (LAION/JSC), Qi Sun (Tokyo Tech/Sakana AI)
- **Model type:** GPT-2 (multi-task autoregressive)
- **Language(s):** Chess notation, game states, rewards
- **License:** MIT
- **Repository:** [GitHub](https://github.com/jorahn/RookWorld)
- **Paper:** [LAION Research Note](https://laion.ai/notes/rook/)
- **Demo:** [HuggingFace Space](https://huggingface.co/spaces/jrahn/rookworld)

### Model Architecture

- **Parameters:** 124M
- **Architecture:** GPT-2
- **Context Length:** up to 2048 tokens
- **Multi-task:** Policy + Environment in one model
- **Training Framework:** llm.c

## Uses

### Direct Use

- **Self-play Chess:** Complete games without an external engine
- **Position Analysis:** Chess move prediction with reasoning
- **Environment Simulation:** Next state prediction from moves
- **Research:** Unified agent-environment modeling

### Unique Capabilities

- **Closed-loop Self-improvement:** Can generate training data for itself
- **World Modeling:** Predicts game outcomes and state transitions
- **Multi-task Performance:** Excels at both policy and simulation tasks

## Training Details

### Training Data

- **Datasets:**
  - [rookworld_7m](https://huggingface.co/datasets/jrahn/rookworld_7m): interleaved ROOK policy and Arbiter environment samples
  - rookworld_46m: scaled-up interleaved dataset (per LAION note)

Training interleaves both tasks in mixed batches.
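As a quick look at this mixed-task format, the sketch below streams a few training samples and sorts them by task prefix. It assumes the dataset exposes a single `text` column per record; that column name is an assumption, not confirmed by this card.

```python
# Minimal sketch: inspect interleaved ROOK ("P: ") and Arbiter ("A: ") samples.
# The "text" column name is an assumption about the rookworld_7m schema.
from itertools import islice
from datasets import load_dataset

ds = load_dataset("jrahn/rookworld_7m", split="train", streaming=True)

for sample in islice(ds, 8):
    text = sample["text"]  # assumed column name
    task = "policy (ROOK)" if text.startswith("P:") else "environment (Arbiter)"
    print(f"{task:<24} {text[:70]}")
```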
### Training Procedure

- **Hardware:** 2x NVIDIA RTX 4090
- **Framework:** llm.c (karpathy/llm.c)
- **Epochs:** trained for multiple epochs; rookworld_7m results reported at 3 epochs, rookworld_46m at 5 epochs (LAION note)

## Evaluation

### Policy Performance (ROOK Mode)

- **Action accuracy (rookworld_46m, 5 epochs):** 26.2% (LAION note)
- **BIG-bench Checkmate-in-One:** 32.1% (LAION note)

### Environment Performance (Arbiter Mode)

Per repository evaluation scripts (RookWorld/README):

- **Next State Accuracy:** 99.61% (baseline 92.3%)
- **State NLS:** 99.99% (baseline 99.76%)
- **Reward Accuracy:** 99.11% (baseline 98.93%)
- **Terminated Accuracy:** 99.13% (baseline 99.04%)

## Technical Details

### Prompt-based Task Selection

RookWorld-LM uses task prefixes to switch between policy and environment modes:

#### **Policy Mode (ROOK) - Playing Chess**

Format: `P: <FEN state> ` (note the trailing space)

**Example:**

```
# Input (prompt)
P: r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3 

# Output (model generation)
M: d2d4 b1c3 f1c4 f1b5 d2d3 E: 0.6 0.5 0.4 0.3 0.2 B: d2d4
```

The model generates:

- **M:** Candidate moves in UCI notation
- **E:** Evaluation scores for each candidate
- **B:** Best move selection

#### **Environment Mode (Arbiter) - Simulating Chess**

Format: `A: <FEN state>+<move>+<recent move history>+`

**Example:**

```
# Input (prompt with chess state and action)
A: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1+e2e4+e2e4+

# Output (model generation - next state, reward, terminated, truncated)
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1+0+False+False
```

The model generates:

- **Next state:** Updated FEN position after the move
- **Reward:** -1 (loss), 0 (ongoing), or 1 (win)
- **Terminated:** Whether the game ended
- **Truncated:** Whether the maximum number of moves was reached

### Unified Tokenization

Shared vocabulary across tasks:

- FEN notation tokens
- UCI move notation
- State transition tokens
- Reward tokens (-1, 0, 1)
- Special task prefixes

### Self-play Implementation

The sketch below assumes a Hugging Face `transformers` model and tokenizer; the generation budget and ply cap are illustrative defaults.

```python
def self_play_game(model, tokenizer, max_plies=200):
    """Play a complete chess game using RookWorld-LM as both policy and environment."""
    state = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    moves = []
    history = []

    def generate(prompt, max_new_tokens=192):
        # Generate a completion and return only the newly generated text.
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                pad_token_id=tokenizer.eos_token_id)
        return tokenizer.decode(output[0, inputs["input_ids"].shape[1]:],
                                skip_special_tokens=True)

    for _ in range(max_plies):
        # Step 1: Get a move from policy mode ("P: <FEN state> ")
        policy_text = generate(f"P: {state} ")

        # Extract the best move after the "B: " marker
        if "B: " not in policy_text:
            break  # Invalid generation
        move = policy_text.split("B: ")[1].split()[0]
        moves.append(move)
        history.append(move)

        # Step 2: Update the state using environment mode
        # ("A: <state>+<move>+<recent move history>+")
        env_prompt = f"A: {state}+{move}+{' '.join(history[-10:])}+"
        env_text = generate(env_prompt)

        # Parse the environment response: next_state+reward+terminated+truncated
        parts = env_text.split("+")
        if len(parts) < 3:
            break  # Invalid environment response
        next_state = parts[0].strip()
        reward = int(parts[1].strip())
        terminated = parts[2].strip() == "True"

        state = next_state
        if terminated:
            return moves, reward

    return moves, 0  # Game incomplete
```

## Evolution Capability

RookWorld-LM supports self-improvement through:

1. Self-play game generation
2. Filtering winning trajectories
3. Continued training on successful games
4. Iterative performance improvement

See [RookWorld Evol](https://github.com/jorahn/RookWorld#rookworld-evol) for details.
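For illustration, here is a minimal sketch of one such iteration built on the `self_play_game` helper above; the filtering rule (keep games that end in a win) and the hand-off to continued training are simplified placeholders for the actual Evol scripts in the repository.

```python
# Sketch of one self-improvement iteration (simplified; see RookWorld Evol for
# the actual pipeline). Games are generated by the model itself and only
# trajectories ending in a win (reward == 1) are kept as new training material;
# the real filtering criterion may differ.
def evolution_iteration(model, tokenizer, num_games=100):
    kept_games = []
    for _ in range(num_games):
        moves, reward = self_play_game(model, tokenizer)
        if moves and reward == 1:
            kept_games.append({"moves": moves, "reward": reward})
    # Continued training on kept_games (re-serializing them into "P: ..." and
    # "A: ..." samples and fine-tuning) is handled by the Evol scripts.
    return kept_games
```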
## Limitations

- **Compute Constraints:** No deep search or Monte Carlo methods
- **Context Window:** Limited to 2048 tokens
- **Training Data:** Performance bounded by the initial training distribution
- **Evaluation Depth:** Single-step lookahead only

## Significance

RookWorld-LM demonstrates:

- **Unified Architectures:** Single model for multiple strategic tasks
- **Emergent Capabilities:** World modeling from language model training
- **Self-sufficiency:** Complete game playing without external tools
- **Scalability:** Performance improves with model size

## Related Models

- **[ROOK-CLF-9M](https://huggingface.co/jrahn/ROOK-CLF-9m):** Classification approach
- **[ROOK-LM-124M](https://huggingface.co/jrahn/ROOK-LM-124M):** Policy-only model
- **[Arbiter-2M](https://huggingface.co/datasets/jrahn/arbiter_2m):** Environment-only dataset

## Citation

```bibtex
@article{rookworld2024,
  title={RookWorld: Unified Agent and Environment Modeling for Chess},
  author={Rahn, Jonathan and Jitsev, Jenia and Sun, Qi},
  journal={LAION Research Notes},
  year={2024},
  url={https://laion.ai/notes/rook/}
}
```

## Model Card Contact

Jonathan Rahn - [GitHub](https://github.com/jorahn) | [Research Page](https://jorahn.github.io/research/)

## Metrics Source

LAION research note: https://laion.ai/notes/rook/