Game of 21 AI - SFT Model

Model Description

This is the Supervised Fine-Tuned (SFT) model for the Game of 21 AI project. It was trained with LoRA (Low-Rank Adaptation) on examples of optimal Game of 21 play, using distilgpt2 as the base model.

Model Type: SFT (Supervised Fine-Tuning)

This model was trained to play the strategic Game of 21, where:

  • Two players start at 0
  • On each turn, a player adds 1, 2, 3, or 4 to the running total
  • The player who reaches 21 or more loses
  • The optimal strategy is to land on multiples of 5 (5, 10, 15, 20), as sketched below
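
The winning line can be computed directly. Below is a minimal sketch of the strategy the training data encodes (the function name optimal_move is illustrative, not part of the released code): always move to the next multiple of 5, and make an arbitrary legal move when already standing on one.

def optimal_move(total: int) -> int:
    """Return the move (1-4) that lands on the next multiple of 5."""
    move = (5 - total % 5) % 5
    # On a multiple of 5 there is no winning move; any legal move will do.
    return move if move != 0 else 1

For example, from a total of 7 this returns 3 (landing on 10), and from 18 it returns 2 (landing on 20), forcing the opponent to 21 or beyond.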

Training Process

SFT Training

This model was trained using supervised fine-tuning:

  • Training: Supervised Fine-Tuning with LoRA (see the peft sketch after this list)
  • Dataset: Training examples generated from optimal game play (debug mode uses a subset)
  • Strategy: Learned basic game strategy from examples
  • Performance: ~40-60% win rate against an optimal opponent
  • Training Time: ~2-3 hours on a Kaggle T4 GPU
  • Checkpoints: Saved every 50 steps with auto-resume capability
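
As a rough illustration, the LoRA setup described under "Model Architecture" below can be reproduced with peft along these lines (a sketch using the hyperparameters listed in this card; dataset preparation and the training loop are omitted):

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("distilgpt2")
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # LoRA rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj"],  # distilgpt2 attention modules
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()        # only the adapter weights are trainable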

Performance Metrics

  • Win Rate: ~40-60% against an optimal opponent
  • Average Game Length: ~4.2 turns
  • Invalid Moves: Fewer invalid moves than the baseline model
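
For context, a win rate against an optimal opponent can be estimated with a simple simulation harness like the sketch below (hypothetical helper names; model_move stands for any function that queries the model for its move):

import random

def optimal_move(total: int) -> int:
    return (5 - total % 5) % 5 or random.randint(1, 4)

def play_game(model_move, model_starts: bool) -> bool:
    """Play one game; return True if the model wins."""
    total, models_turn = 0, model_starts
    while True:
        total += model_move(total) if models_turn else optimal_move(total)
        if total >= 21:
            return not models_turn  # whoever just moved loses
        models_turn = not models_turn

# Example: pit the optimal strategy against itself, alternating who starts
wins = sum(play_game(optimal_move, model_starts=bool(i % 2)) for i in range(100))
print(f"win rate: {wins}%")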

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("distilgpt2")
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")

# Load SFT adapter
model = PeftModel.from_pretrained(base_model, "araviiiman/game21-sft")

# distilgpt2 has no pad token; reuse EOS for padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Game prompt
prompt = """
Game Rules: Two players start at 0. On each turn, add 1, 2, 3, or 4 to the running total. 
The player who reaches 21 or more loses.

Current total: 5
What number (1, 2, 3, or 4) do you add?
I will add: """

# Generate a move (direct tokenization; distilgpt2 has no chat template)
input_text = f"System: You are a strategic game player.\nUser: {prompt}"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=5,
    do_sample=False,                     # greedy decoding; temperature only applies when sampling
    pad_token_id=tokenizer.pad_token_id,
)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
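
Because a small causal LM may emit extra tokens, it is safer to extract the move from the raw completion. A minimal parsing sketch (the fallback move is an assumption, not part of the model's contract):

import re

match = re.search(r"[1-4]", response)
move = int(match.group()) if match else 1  # fall back to a legal move
print(f"Parsed move: {move}")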

Model Architecture

  • Base Model: distilgpt2
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • LoRA Parameters: r=8, alpha=16, dropout=0.05
  • Target Modules: c_attn, c_proj (distilgpt2 specific)
  • Parameters: ~82M base model + adapter weights
  • Quantization: 4-bit quantization for efficient inference (loading sketch below)
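
For 4-bit inference as mentioned above, the base model can be loaded with bitsandbytes before attaching the adapter. A sketch (assumes the bitsandbytes package and a CUDA GPU):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # match the FP16 training precision
)
base_model = AutoModelForCausalLM.from_pretrained(
    "distilgpt2",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "araviiiman/game21-sft")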

Training Configuration

  • Learning Rate: 5e-5
  • Batch Size: 1 (effective: 8 with gradient accumulation)
  • Epochs: 5
  • Sequence Length: 64 (optimized for game prompts)
  • Optimizer: adamw_torch
  • Mixed Precision: FP16
  • Checkpoints: Every 50 steps (max 3 saved)
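
Expressed as transformers TrainingArguments, the configuration above corresponds roughly to the following (a sketch; output_dir is an assumed path):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="game21-sft",          # assumed output path
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,    # effective batch size of 8
    num_train_epochs=5,
    optim="adamw_torch",
    fp16=True,                        # mixed precision
    save_steps=50,
    save_total_limit=3,               # keep at most 3 checkpoints
)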

Training Data

The model was trained on examples of optimal Game of 21 play, including:

  • Complete game transcripts with optimal moves
  • Win/loss scenarios with proper strategy
  • Strategic decision points at each game state
  • Chat-formatted conversations for instruction following (an illustrative example follows)
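
The exact serialization is not published here; purely as an illustration, one strategic decision point might be encoded along these lines (field names are hypothetical):

# Hypothetical shape of a single chat-formatted training example
example = {
    "system": "You are a strategic game player.",
    "user": "Current total: 7\nWhat number (1, 2, 3, or 4) do you add?",
    "assistant": "I will add: 3",  # 7 + 3 = 10, a multiple of 5
}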

Limitations

  • Model performance depends on the quality of training data
  • May not perform optimally in all game scenarios
  • Requires proper prompt formatting for best results
  • Training was relatively short; more epochs or examples may be needed for optimal performance

Citation

@misc{game21-ai-sft,
  title={Game of 21 AI - SFT Model},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/araviiiman/game21-sft}
}

License

This model is licensed under the MIT License.
