TeeUnit Agent

Trained RL agents for the TeeUnit Environment - an OpenEnv-compatible Teeworlds arena for LLM-based reinforcement learning.

Available Models

Q-Learning Agent (Latest)

  • File: teeunit_qlearning_agent.json / teeunit_qlearning_agent.pkl
  • Algorithm: Tabular Q-Learning
  • Training: 20 episodes, 938 steps
  • Total Reward: 39.38
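Tabular Q-learning updates one cell of the table per environment step. A minimal sketch of the update rule, mirroring the JSON layout of the model file (the alpha and gamma values below are assumptions, not the hyperparameters used in training):

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.99  # discount factor (assumed value)
N_ACTIONS = 7

# State key -> {action index (as string): Q-value}, matching the JSON layout
q_table = defaultdict(lambda: {str(a): 0.0 for a in range(N_ACTIONS)})

def q_update(state, action, reward, next_state):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[next_state].values())
    td_target = reward + GAMMA * best_next
    q_table[state][str(action)] += ALPHA * (td_target - q_table[state][str(action)])

q_update("('pos_1_2',)", 3, 1.0, "('pos_1_3',)")  # one Bellman backup
```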

Actions

The agent can perform 7 actions:

Action         Description
move left      Move character left
move right     Move character right
move none      Stop moving
jump           Jump
shoot pistol   Fire pistol (weapon 1)
shoot shotgun  Fire shotgun (weapon 2)
hook           Use grappling hook
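Each discrete action maps onto one of the environment's MCP tools. The exact `args` stored in the model file aren't reproduced here, so the table below is an illustrative sketch of that mapping:

```python
# Hypothetical action table: index -> MCP tool call (args are illustrative)
ACTIONS = [
    {'tool': 'move',  'args': {'direction': 'left'}},   # 0: move left
    {'tool': 'move',  'args': {'direction': 'right'}},  # 1: move right
    {'tool': 'move',  'args': {'direction': 'none'}},   # 2: stop moving
    {'tool': 'jump',  'args': {}},                      # 3: jump
    {'tool': 'shoot', 'args': {'weapon': 1}},           # 4: fire pistol
    {'tool': 'shoot', 'args': {'weapon': 2}},           # 5: fire shotgun
    {'tool': 'hook',  'args': {}},                      # 6: grappling hook
]
```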

Usage

Load and Use the Agent

import json
import random

# Load model
with open('teeunit_qlearning_agent.json') as f:
    model = json.load(f)

q_table = model['q_table']
actions = model['actions']

def get_state_key(status_text):
    """Extract state from game status text."""
    lines = status_text.split('\n')
    state = []
    for line in lines:
        if 'Position:' in line:
            try:
                pos = line.split('(')[1].split(')')[0]
                x, y = map(float, pos.split(','))
                state.append(f'pos_{int(x//100)}_{int(y//100)}')
            except (IndexError, ValueError):
                state.append('pos_unknown')
        if 'Health:' in line:
            try:
                health = int(line.split(':')[1].split('/')[0].strip())
                state.append(f'hp_{health//3}')
            except (IndexError, ValueError):
                pass
        if 'units away' in line:
            try:
                dist = float(line.split(',')[-1].replace('units away', '').strip())
                state.append(f'enemy_{"close" if dist < 100 else "mid" if dist < 200 else "far"}')
            except (IndexError, ValueError):
                pass
    return str(tuple(sorted(state))) if state else "('default',)"

def choose_action(state_key):
    """Choose best action for given state."""
    if state_key in q_table:
        q_values = q_table[state_key]
        best_action = max(q_values.keys(), key=lambda a: q_values[a])
        return int(best_action)
    return random.randint(0, len(actions) - 1)

# Example usage: status_text is the text returned by the environment's get_status tool
state_key = get_state_key(status_text)
action_idx = choose_action(state_key)
action = actions[action_idx]
print(f"Action: {action['tool']} with args {action['args']}")
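choose_action above is purely greedy. If you want the agent to keep exploring at inference time, a small epsilon-greedy wrapper is a common variant; the epsilon value here is an assumption, not something taken from the training run:

```python
import random

EPSILON = 0.05  # assumed exploration rate

def choose_action_eps(state_key, q_table, n_actions, eps=EPSILON):
    """Epsilon-greedy: random action with probability eps, otherwise greedy."""
    if random.random() < eps or state_key not in q_table:
        return random.randrange(n_actions)
    q_values = q_table[state_key]
    return int(max(q_values, key=q_values.get))
```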

Connect to Environment

import asyncio
import websockets
import json

async def play():
    uri = 'wss://ziadbc-teeunit-env.hf.space/ws'
    
    async with websockets.connect(uri) as ws:
        # Reset environment
        await ws.send(json.dumps({'type': 'reset', 'data': {}}))
        await ws.recv()
        
        # Get status
        await ws.send(json.dumps({
            'type': 'step',
            'data': {'type': 'call_tool', 'tool_name': 'get_status', 'arguments': {}}
        }))
        resp = json.loads(await ws.recv())
        status = resp['data']['observation']['result']['data']
        
        # Choose and execute action
        state_key = get_state_key(status)
        action = actions[choose_action(state_key)]
        
        await ws.send(json.dumps({
            'type': 'step',
            'data': {'type': 'call_tool', 'tool_name': action['tool'], 'arguments': action['args']}
        }))
        resp = json.loads(await ws.recv())
        reward = resp['data']['reward']
        print(f"Reward: {reward}")

asyncio.run(play())

Training Your Own Agent

See the Colab notebook for training examples using:

  • Q-Learning (tabular)
  • Stable Baselines3 (PPO, A2C)
  • Unsloth/TRL (LLM fine-tuning)
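The notebook is the reference for training; as orientation, a tabular Q-learning loop has the shape below. A toy 1-D chain stands in for TeeUnit so the sketch stays self-contained; in the real setup the environment step replaces the chain logic:

```python
import random

def train_q_learning(n_episodes=200, alpha=0.1, gamma=0.99, eps=0.2):
    """Sketch of a tabular Q-learning loop on a toy chain environment.

    Positions 0..4; reaching position 4 ends the episode with reward 1."""
    q = {s: [0.0, 0.0] for s in range(5)}  # actions: 0 = left, 1 = right
    for _ in range(n_episodes):
        state = 0
        while state != 4:
            # Epsilon-greedy action selection
            if random.random() < eps:
                action = random.randrange(2)
            else:
                action = max((0, 1), key=lambda a: q[state][a])
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == 4 else 0.0
            # Bellman backup on the visited (state, action) cell
            q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
            state = next_state
    return q

q = train_q_learning()
```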

Environment API

The TeeUnit environment exposes these MCP tools:

Tool        Arguments                          Description
move        direction: "left"|"right"|"none"   Move horizontally
jump        -                                  Jump (can double-jump)
aim         x: int, y: int                     Aim at coordinates
shoot       weapon: 0-5                        Fire weapon
hook        -                                  Toggle grappling hook
get_status  -                                  Get game state as text
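Every tool call travels in the same websocket envelope shown in the usage example above. A small helper makes that envelope explicit (field names are taken from the example, not from a published spec):

```python
import json

def step_message(tool_name, arguments):
    """Build the 'step' envelope used by the TeeUnit websocket protocol."""
    return json.dumps({
        'type': 'step',
        'data': {'type': 'call_tool', 'tool_name': tool_name, 'arguments': arguments},
    })

msg = step_message('move', {'direction': 'left'})
```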

License

MIT License - See GitHub repo for details.

Evaluation results

  • Total Reward (20 episodes) on TeeUnit Environment: 39.38 (self-reported)