# TeeUnit Agent

Trained RL agents for the TeeUnit Environment, an OpenEnv-compatible Teeworlds arena for LLM-based reinforcement learning.
## Environment

- Space: ziadbc/teeunit-env
- GitHub: ziadgit/teeunit
- Game: Teeworlds 0.7.5 arena (simulation mode)
## Available Models

### Q-Learning Agent (Latest)

- Files: `teeunit_qlearning_agent.json` / `teeunit_qlearning_agent.pkl`
- Algorithm: tabular Q-learning
- Training: 20 episodes, 938 steps
- Total reward: 39.38
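The two files presumably hold the same policy in JSON and pickle form. A minimal sketch of the assumed schema, inferred from the field names (`q_table`, `actions`) used in the usage code below, round-tripped through both serializers:

```python
import json
import pickle

# Hypothetical minimal model in the assumed schema: 'q_table' maps state
# keys to {action index (as a string): Q-value}; 'actions' is the ordered
# list of tool calls that those indices refer to.
model = {
    "q_table": {"('default',)": {"0": 0.1, "1": 0.5}},
    "actions": [
        {"tool": "move", "args": {"direction": "left"}},
        {"tool": "move", "args": {"direction": "right"}},
    ],
}

# Both distributed files are assumed to serialize this same dict.
as_json = json.dumps(model)
as_pkl = pickle.dumps(model)
assert json.loads(as_json) == pickle.loads(as_pkl) == model
```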
## Actions

The agent can perform 7 actions:

| Action | Description |
|---|---|
| `move left` | Move character left |
| `move right` | Move character right |
| `move none` | Stop moving |
| `jump` | Jump |
| `shoot pistol` | Fire pistol (weapon 1) |
| `shoot shotgun` | Fire shotgun (weapon 2) |
| `hook` | Use grappling hook |
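These seven actions presumably map onto the environment's MCP tools (listed under Environment API). One plausible encoding as an index-ordered list — the exact argument values here are assumptions, not confirmed by the model files:

```python
# Assumed mapping from the 7 agent action indices to environment tool calls,
# consistent with the weapon numbers given in the actions table above.
ACTIONS = [
    {"tool": "move",  "args": {"direction": "left"}},   # 0: move left
    {"tool": "move",  "args": {"direction": "right"}},  # 1: move right
    {"tool": "move",  "args": {"direction": "none"}},   # 2: stop moving
    {"tool": "jump",  "args": {}},                      # 3: jump
    {"tool": "shoot", "args": {"weapon": 1}},           # 4: fire pistol
    {"tool": "shoot", "args": {"weapon": 2}},           # 5: fire shotgun
    {"tool": "hook",  "args": {}},                      # 6: grappling hook
]

assert len(ACTIONS) == 7
```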
## Usage

### Load and Use the Agent
```python
import json
import random

# Load the trained model
with open('teeunit_qlearning_agent.json') as f:
    model = json.load(f)

q_table = model['q_table']
actions = model['actions']

def get_state_key(status_text):
    """Extract a discretized state key from the game status text."""
    state = []
    for line in status_text.split('\n'):
        if 'Position:' in line:
            try:
                pos = line.split('(')[1].split(')')[0]
                x, y = map(float, pos.split(','))
                state.append(f'pos_{int(x // 100)}_{int(y // 100)}')
            except (IndexError, ValueError):
                state.append('pos_unknown')
        if 'Health:' in line:
            try:
                health = int(line.split(':')[1].split('/')[0].strip())
                state.append(f'hp_{health // 3}')
            except (IndexError, ValueError):
                pass
        if 'units away' in line:
            try:
                dist = float(line.split(',')[-1].replace('units away', '').strip())
                state.append(f'enemy_{"close" if dist < 100 else "mid" if dist < 200 else "far"}')
            except (IndexError, ValueError):
                pass
    return str(tuple(sorted(state))) if state else "('default',)"

def choose_action(state_key):
    """Pick the highest-valued action for the state, or a random one if unseen."""
    if state_key in q_table:
        q_values = q_table[state_key]
        best_action = max(q_values.keys(), key=lambda a: q_values[a])
        return int(best_action)
    return random.randint(0, len(actions) - 1)

# Example usage (status_text comes from the environment's get_status tool)
state_key = get_state_key(status_text)
action_idx = choose_action(state_key)
action = actions[action_idx]
print(f"Action: {action['tool']} with args {action['args']}")
```
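If you continue training the agent rather than just running it greedily, `choose_action` is usually wrapped with ε-greedy exploration. A minimal self-contained sketch, using the same Q-table shape as the loading code above:

```python
import random

def choose_action_epsilon(q_table, state_key, n_actions, epsilon=0.1):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon or state_key not in q_table:
        return random.randint(0, n_actions - 1)
    q_values = q_table[state_key]
    return int(max(q_values, key=lambda a: q_values[a]))

# Toy Q-table: state key -> {action index (str): Q-value}
q_table = {"('default',)": {"0": 0.2, "1": 0.9, "2": -0.1}}
print(choose_action_epsilon(q_table, "('default',)", 3, epsilon=0.0))  # greedy -> 1
```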
### Connect to Environment
```python
import asyncio
import json

import websockets

async def play():
    uri = 'wss://ziadbc-teeunit-env.hf.space/ws'
    async with websockets.connect(uri) as ws:
        # Reset environment
        await ws.send(json.dumps({'type': 'reset', 'data': {}}))
        await ws.recv()

        # Get status
        await ws.send(json.dumps({
            'type': 'step',
            'data': {'type': 'call_tool', 'tool_name': 'get_status', 'arguments': {}}
        }))
        resp = json.loads(await ws.recv())
        status = resp['data']['observation']['result']['data']

        # Choose and execute an action
        state_key = get_state_key(status)
        action = actions[choose_action(state_key)]
        await ws.send(json.dumps({
            'type': 'step',
            'data': {'type': 'call_tool', 'tool_name': action['tool'], 'arguments': action['args']}
        }))
        resp = json.loads(await ws.recv())
        reward = resp['data']['reward']
        print(f"Reward: {reward}")

asyncio.run(play())
```
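The single step above extends naturally to a full episode loop. A sketch that factors the protocol into a transport-agnostic helper so it can run over any `send`/`recv` pair — note the `done` flag and `max_steps` cap are assumptions, not confirmed parts of the environment's response schema:

```python
import json

def run_episode(send, recv, get_state_key, choose_action, actions, max_steps=100):
    """Drive one episode through caller-supplied send/recv callables.

    Assumes each step response carries 'reward' (and possibly 'done')
    under its 'data' key, mirroring the single-step example above.
    """
    send(json.dumps({'type': 'reset', 'data': {}}))
    recv()  # discard the reset acknowledgement
    total = 0.0
    for _ in range(max_steps):
        # Query the current game state
        send(json.dumps({
            'type': 'step',
            'data': {'type': 'call_tool', 'tool_name': 'get_status', 'arguments': {}}
        }))
        status = json.loads(recv())['data']['observation']['result']['data']
        # Act greedily on the observed state
        action = actions[choose_action(get_state_key(status))]
        send(json.dumps({
            'type': 'step',
            'data': {'type': 'call_tool', 'tool_name': action['tool'],
                     'arguments': action['args']}
        }))
        resp = json.loads(recv())['data']
        total += resp.get('reward', 0.0)
        if resp.get('done'):
            break
    return total
```

With a live connection, `send`/`recv` would wrap the websocket's `send` and `recv` coroutines; the same function can be exercised offline with canned responses.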
## Training Your Own Agent

See the Colab notebook for training examples using:

- Q-Learning (tabular)
- Stable Baselines3 (PPO, A2C)
- Unsloth/TRL (LLM fine-tuning)
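For the tabular option, the core of training is the standard Q-learning update, Q(s,a) += α·(r + γ·max Q(s',·) − Q(s,a)). A self-contained sketch — the α and γ values here are illustrative, not necessarily those used to train this agent:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95  # illustrative learning rate and discount factor

# Q-table: state -> {action index: Q-value}, zero-initialized on first access
q_table = defaultdict(lambda: defaultdict(float))

def q_update(state, action, reward, next_state):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[next_state].values(), default=0.0)
    q_table[state][action] += ALPHA * (reward + GAMMA * best_next - q_table[state][action])

q_update("s0", 1, 1.0, "s1")
print(q_table["s0"][1])  # 0.1 after one update from a zero-initialized table
```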
## Environment API

The TeeUnit environment exposes these MCP tools:

| Tool | Arguments | Description |
|---|---|---|
| `move` | `direction`: `"left"` \| `"right"` \| `"none"` | Move horizontally |
| `jump` | - | Jump (can double-jump) |
| `aim` | `x`: int, `y`: int | Aim at coordinates |
| `shoot` | `weapon`: 0-5 | Fire weapon |
| `hook` | - | Toggle grappling hook |
| `get_status` | - | Get game state as text |
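Each tool call travels inside the websocket step envelope shown in the usage section. A small helper that builds that envelope for any of the tools above (the field names mirror the Connect example; nothing here is beyond what that example already sends):

```python
import json

def tool_call(tool_name, **arguments):
    """Serialize a step message invoking one of the environment's MCP tools."""
    return json.dumps({
        "type": "step",
        "data": {"type": "call_tool", "tool_name": tool_name, "arguments": arguments},
    })

# Example envelopes for a few of the tools in the table above
tool_call("aim", x=120, y=-40)
tool_call("shoot", weapon=2)
tool_call("get_status")
```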
## License

MIT License - see the GitHub repo for details.
## Evaluation results

- Total Reward (20 episodes) on TeeUnit Environment: 39.38 (self-reported)