TeeUnit Agent

Trained RL agents for the TeeUnit Environment - an OpenEnv-compatible Teeworlds arena for LLM-based reinforcement learning.

Available Models

Q-Learning Agent (Latest)

  • File: teeunit_qlearning_agent.json / teeunit_qlearning_agent.pkl
  • Algorithm: Tabular Q-Learning
  • Training: 20 episodes, 938 steps
  • Total Reward: 39.38
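Tabular Q-learning updates one cell of the table per environment step. A minimal sketch of the update rule, mirroring the JSON layout of the model file (the alpha and gamma values below are assumptions, not the hyperparameters used in training):

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.99  # discount factor (assumed value)
N_ACTIONS = 7

# State key -> {action index (as string): Q-value}, matching the JSON layout
q_table = defaultdict(lambda: {str(a): 0.0 for a in range(N_ACTIONS)})

def q_update(state, action, reward, next_state):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[next_state].values())
    td_target = reward + GAMMA * best_next
    q_table[state][str(action)] += ALPHA * (td_target - q_table[state][str(action)])

q_update("('pos_1_2',)", 3, 1.0, "('pos_1_3',)")  # one Bellman backup
```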

Actions

The agent can perform 7 actions:

Action         Description
move left      Move character left
move right     Move character right
move none      Stop moving
jump           Jump
shoot pistol   Fire pistol (weapon 1)
shoot shotgun  Fire shotgun (weapon 2)
hook           Use grappling hook
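Each discrete action maps onto one of the environment's MCP tools. The exact `args` stored in the model file aren't reproduced here, so the table below is an illustrative sketch of that mapping:

```python
# Hypothetical action table: index -> MCP tool call (args are illustrative)
ACTIONS = [
    {'tool': 'move',  'args': {'direction': 'left'}},   # 0: move left
    {'tool': 'move',  'args': {'direction': 'right'}},  # 1: move right
    {'tool': 'move',  'args': {'direction': 'none'}},   # 2: stop moving
    {'tool': 'jump',  'args': {}},                      # 3: jump
    {'tool': 'shoot', 'args': {'weapon': 1}},           # 4: fire pistol
    {'tool': 'shoot', 'args': {'weapon': 2}},           # 5: fire shotgun
    {'tool': 'hook',  'args': {}},                      # 6: grappling hook
]
```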

Usage

Load and Use the Agent

import json
import random

# Load model
with open('teeunit_qlearning_agent.json') as f:
    model = json.load(f)

q_table = model['q_table']
actions = model['actions']

def get_state_key(status_text):
    """Extract state from game status text."""
    lines = status_text.split('\n')
    state = []
    for line in lines:
        if 'Position:' in line:
            try:
                pos = line.split('(')[1].split(')')[0]
                x, y = map(float, pos.split(','))
                state.append(f'pos_{int(x//100)}_{int(y//100)}')
            except (IndexError, ValueError):
                state.append('pos_unknown')
        if 'Health:' in line:
            try:
                health = int(line.split(':')[1].split('/')[0].strip())
                state.append(f'hp_{health//3}')
            except (IndexError, ValueError):
                pass
        if 'units away' in line:
            try:
                dist = float(line.split(',')[-1].replace('units away', '').strip())
                state.append(f'enemy_{"close" if dist < 100 else "mid" if dist < 200 else "far"}')
            except (IndexError, ValueError):
                pass
    return str(tuple(sorted(state))) if state else "('default',)"

def choose_action(state_key):
    """Choose best action for given state."""
    if state_key in q_table:
        q_values = q_table[state_key]
        best_action = max(q_values.keys(), key=lambda a: q_values[a])
        return int(best_action)
    return random.randint(0, len(actions) - 1)

# Example usage: status_text is the text returned by the environment's get_status tool
state_key = get_state_key(status_text)
action_idx = choose_action(state_key)
action = actions[action_idx]
print(f"Action: {action['tool']} with args {action['args']}")
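choose_action above is purely greedy. If you want the agent to keep exploring at inference time, a small epsilon-greedy wrapper is a common variant; the epsilon value here is an assumption, not something taken from the training run:

```python
import random

EPSILON = 0.05  # assumed exploration rate

def choose_action_eps(state_key, q_table, n_actions, eps=EPSILON):
    """Epsilon-greedy: random action with probability eps, otherwise greedy."""
    if random.random() < eps or state_key not in q_table:
        return random.randrange(n_actions)
    q_values = q_table[state_key]
    return int(max(q_values, key=q_values.get))
```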

Connect to Environment

import asyncio
import websockets
import json

async def play():
    uri = 'wss://ziadbc-teeunit-env.hf.space/ws'
    
    async with websockets.connect(uri) as ws:
        # Reset environment
        await ws.send(json.dumps({'type': 'reset', 'data': {}}))
        await ws.recv()
        
        # Get status
        await ws.send(json.dumps({
            'type': 'step',
            'data': {'type': 'call_tool', 'tool_name': 'get_status', 'arguments': {}}
        }))
        resp = json.loads(await ws.recv())
        status = resp['data']['observation']['result']['data']
        
        # Choose and execute action
        state_key = get_state_key(status)
        action = actions[choose_action(state_key)]
        
        await ws.send(json.dumps({
            'type': 'step',
            'data': {'type': 'call_tool', 'tool_name': action['tool'], 'arguments': action['args']}
        }))
        resp = json.loads(await ws.recv())
        reward = resp['data']['reward']
        print(f"Reward: {reward}")

asyncio.run(play())

Training Your Own Agent

See the Colab notebook for training examples using:

  • Q-Learning (tabular)
  • Stable Baselines3 (PPO, A2C)
  • Unsloth/TRL (LLM fine-tuning)
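The notebook is the reference for training; as orientation, a tabular Q-learning loop has the shape below. A toy 1-D chain stands in for TeeUnit so the sketch stays self-contained; in the real setup the environment step replaces the chain logic:

```python
import random

def train_q_learning(n_episodes=200, alpha=0.1, gamma=0.99, eps=0.2):
    """Sketch of a tabular Q-learning loop on a toy chain environment.

    Positions 0..4; reaching position 4 ends the episode with reward 1."""
    q = {s: [0.0, 0.0] for s in range(5)}  # actions: 0 = left, 1 = right
    for _ in range(n_episodes):
        state = 0
        while state != 4:
            # Epsilon-greedy action selection
            if random.random() < eps:
                action = random.randrange(2)
            else:
                action = max((0, 1), key=lambda a: q[state][a])
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == 4 else 0.0
            # Bellman backup on the visited (state, action) cell
            q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
            state = next_state
    return q

q = train_q_learning()
```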

Environment API

The TeeUnit environment exposes these MCP tools:

Tool        Arguments                          Description
move        direction: "left"|"right"|"none"   Move horizontally
jump        -                                  Jump (can double-jump)
aim         x: int, y: int                     Aim at coordinates
shoot       weapon: 0-5                        Fire weapon
hook        -                                  Toggle grappling hook
get_status  -                                  Get game state as text
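Every tool call travels in the same websocket envelope shown in the usage example above. A small helper makes that envelope explicit (field names are taken from the example, not from a published spec):

```python
import json

def step_message(tool_name, arguments):
    """Build the 'step' envelope used by the TeeUnit websocket protocol."""
    return json.dumps({
        'type': 'step',
        'data': {'type': 'call_tool', 'tool_name': tool_name, 'arguments': arguments},
    })

msg = step_message('move', {'direction': 'left'})
```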

License

MIT License - See GitHub repo for details.

Evaluation results

  • Total Reward (20 episodes) on TeeUnit Environment: 39.38 (self-reported)