Game of 21 AI - SFT Model

Model Description

This is the Supervised Fine-Tuned (SFT) model for the Game of 21 AI project. It was trained with LoRA (Low-Rank Adaptation) on examples of optimal Game of 21 play, using distilgpt2 as the base model.

Model Type: SFT (Supervised Fine-Tuning)

This model was trained to play the strategic Game of 21, where:

  • Two players start at 0
  • On each turn, a player adds 1, 2, 3, or 4 to the running total
  • The player who reaches 21 or more loses
  • The optimal strategy is to land on multiples of 5 (5, 10, 15, 20), as sketched below
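
The winning line can be computed directly. Below is a minimal sketch of the strategy the training data encodes (the function name optimal_move is illustrative, not part of the released code): always move to the next multiple of 5, and make an arbitrary legal move when already standing on one.

def optimal_move(total: int) -> int:
    """Return the move (1-4) that lands on the next multiple of 5."""
    move = (5 - total % 5) % 5
    # On a multiple of 5 there is no winning move; any legal move will do.
    return move if move != 0 else 1

For example, from a total of 7 this returns 3 (landing on 10), and from 18 it returns 2 (landing on 20), forcing the opponent to 21 or beyond.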

Training Process

SFT Training

This model was trained using supervised fine-tuning:

  • Training: Supervised Fine-Tuning with LoRA (see the peft sketch after this list)
  • Dataset: Training examples generated from optimal game play (debug mode uses a subset)
  • Strategy: Learned basic game strategy from examples
  • Performance: ~40-60% win rate against an optimal opponent
  • Training Time: ~2-3 hours on a Kaggle T4 GPU
  • Checkpoints: Saved every 50 steps with auto-resume capability
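
As a rough illustration, the LoRA setup described under "Model Architecture" below can be reproduced with peft along these lines (a sketch using the hyperparameters listed in this card; dataset preparation and the training loop are omitted):

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("distilgpt2")
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # LoRA rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj"],  # distilgpt2 attention modules
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()        # only the adapter weights are trainable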

Performance Metrics

  • Win Rate: ~40-60% against an optimal opponent
  • Average Game Length: ~4.2 turns
  • Invalid Moves: Fewer invalid moves than the baseline model
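
For context, a win rate against an optimal opponent can be estimated with a simple simulation harness like the sketch below (hypothetical helper names; model_move stands for any function that queries the model for its move):

import random

def optimal_move(total: int) -> int:
    return (5 - total % 5) % 5 or random.randint(1, 4)

def play_game(model_move, model_starts: bool) -> bool:
    """Play one game; return True if the model wins."""
    total, models_turn = 0, model_starts
    while True:
        total += model_move(total) if models_turn else optimal_move(total)
        if total >= 21:
            return not models_turn  # whoever just moved loses
        models_turn = not models_turn

# Example: pit the optimal strategy against itself, alternating who starts
wins = sum(play_game(optimal_move, model_starts=bool(i % 2)) for i in range(100))
print(f"win rate: {wins}%")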

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("distilgpt2")
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")

# Load SFT adapter
model = PeftModel.from_pretrained(base_model, "araviiiman/game21-sft")

# distilgpt2 has no pad token; reuse EOS for padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Game prompt
prompt = """
Game Rules: Two players start at 0. On each turn, add 1, 2, 3, or 4 to the running total. 
The player who reaches 21 or more loses.

Current total: 5
What number (1, 2, 3, or 4) do you add?
I will add: """

# Generate a move (direct tokenization; distilgpt2 has no chat template)
input_text = f"System: You are a strategic game player.\nUser: {prompt}"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=5,
    do_sample=False,                     # greedy decoding; temperature only applies when sampling
    pad_token_id=tokenizer.pad_token_id,
)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
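
Because a small causal LM may emit extra tokens, it is safer to extract the move from the raw completion. A minimal parsing sketch (the fallback move is an assumption, not part of the model's contract):

import re

match = re.search(r"[1-4]", response)
move = int(match.group()) if match else 1  # fall back to a legal move
print(f"Parsed move: {move}")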

Model Architecture

  • Base Model: distilgpt2
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • LoRA Parameters: r=8, alpha=16, dropout=0.05
  • Target Modules: c_attn, c_proj (distilgpt2 specific)
  • Parameters: ~82M base model + adapter weights
  • Quantization: 4-bit quantization for efficient inference (loading sketch below)
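
For 4-bit inference as mentioned above, the base model can be loaded with bitsandbytes before attaching the adapter. A sketch (assumes the bitsandbytes package and a CUDA GPU):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # match the FP16 training precision
)
base_model = AutoModelForCausalLM.from_pretrained(
    "distilgpt2",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "araviiiman/game21-sft")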

Training Configuration

  • Learning Rate: 5e-5
  • Batch Size: 1 (effective: 8 with gradient accumulation)
  • Epochs: 5
  • Sequence Length: 64 (optimized for game prompts)
  • Optimizer: adamw_torch
  • Mixed Precision: FP16
  • Checkpoints: Every 50 steps (max 3 saved)
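
Expressed as transformers TrainingArguments, the configuration above corresponds roughly to the following (a sketch; output_dir is an assumed path):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="game21-sft",          # assumed output path
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,    # effective batch size of 8
    num_train_epochs=5,
    optim="adamw_torch",
    fp16=True,                        # mixed precision
    save_steps=50,
    save_total_limit=3,               # keep at most 3 checkpoints
)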

Training Data

The model was trained on examples of optimal Game of 21 play, including:

  • Complete game transcripts with optimal moves
  • Win/loss scenarios with proper strategy
  • Strategic decision points at each game state
  • Chat-formatted conversations for instruction following (an illustrative example follows)
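
The exact serialization is not published here; purely as an illustration, one strategic decision point might be encoded along these lines (field names are hypothetical):

# Hypothetical shape of a single chat-formatted training example
example = {
    "system": "You are a strategic game player.",
    "user": "Current total: 7\nWhat number (1, 2, 3, or 4) do you add?",
    "assistant": "I will add: 3",  # 7 + 3 = 10, a multiple of 5
}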

Limitations

  • Model performance depends on the quality of training data
  • May not perform optimally in all game scenarios
  • Requires proper prompt formatting for best results
  • Training was relatively short; more epochs or examples may be needed for optimal performance

Citation

@misc{game21-ai-sft,
  title={Game of 21 AI - SFT Model},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/araviiiman/game21-sft}
}

License

This model is licensed under the MIT License.
