⚽ Gemma-3-270M Football Data Extractor

Fine-tuned model for extracting structured data from football/soccer news posts.

🎯 Model Description

This model is a fine-tuned version of google/gemma-3-270m-it specialized in extracting structured information from football news posts, including:

Player transfers
Injury reports
Match summaries
Direct quotes
Statistical data

📊 Training Details

Training Data

Dataset size: 442 training examples, 49 validation examples
Data format: ShareGPT chat format
Content types: Transfer news, injuries, match reports, quotes
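
As an illustration of the data format, a single ShareGPT-style training record might look like the sketch below. The key names (`system`, `conversations`, `from`, `value`) follow the common ShareGPT convention and are an assumption here; the actual dataset files are not shown in this card, and `to_sharegpt` is a hypothetical helper.

```python
import json

# Assumed to match the system prompt shown in the Usage section.
SYSTEM_PROMPT = "You are a data extraction API. Respond ONLY with JSON."


def to_sharegpt(post: str, target: dict) -> dict:
    """Build one ShareGPT-style record (key names are an assumption)."""
    return {
        "system": SYSTEM_PROMPT,
        "conversations": [
            {"from": "human", "value": f"Extract structured data from: {post}"},
            # The assistant turn holds the target JSON serialized as a string.
            {"from": "gpt", "value": json.dumps(target, ensure_ascii=False)},
        ],
    }


record = to_sharegpt(
    "Manchester United sign Bruno Fernandes for £55m!",
    {
        "post_content_focus": ["transfers"],
        "post_entities": [
            {"entity_value": "Manchester United", "entity_type": "club"}
        ],
    },
)
print(json.dumps(record, ensure_ascii=False, indent=2))
```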

Training Configuration

Method: LoRA (Low-Rank Adaptation)
LoRA rank: 16
LoRA alpha: 32
LoRA dropout: 0.1
Epochs: 8
Learning rate: 1.5e-4
Batch size: 2 (per device)
Gradient accumulation: 4
Weight decay: 0.01
Optimizer: AdamW
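
The batch settings above imply an effective batch size and an approximate number of optimizer steps, worked out in the sketch below. The single-device setting is an assumption (the card mentions Google Colab but not the GPU count), and the exact steps per epoch depend on how the trainer rounds the final partial batch.

```python
import math

train_examples = 442
per_device_batch = 2
grad_accum = 4
epochs = 8
num_devices = 1  # assumption: a single Colab GPU

# Examples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum * num_devices

# Rounded up here; some trainers round down and drop the partial batch.
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(f"effective batch size: {effective_batch}")
print(f"steps per epoch:      {steps_per_epoch}")
print(f"total steps:          {total_steps}")
```

Under these assumptions, a checkpoint saved at step 400 would fall at roughly epoch 7 of 8.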

Training Results

Final train loss: 0.11
Final eval loss: 0.20
Train/eval gap: 0.09 (a small gap, suggesting little overfitting on the 49-example validation set)
JSON validity: 100% (5/5 test cases)
Entity extraction accuracy: 100% (same 5 test cases)
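
A JSON validity rate like the one reported above can be computed with a few lines of standard-library Python. This is a minimal sketch, not the evaluation script used for this model; `json_validity_rate` is a hypothetical helper and the sample strings are made up for illustration.

```python
import json


def json_validity_rate(outputs: list) -> float:
    """Fraction of outputs that parse as a JSON object."""
    if not outputs:
        return 0.0
    valid = 0
    for text in outputs:
        try:
            # Count only top-level objects, since the schema is an object.
            if isinstance(json.loads(text), dict):
                valid += 1
        except json.JSONDecodeError:
            pass
    return valid / len(outputs)


# Illustrative strings only, not real model outputs.
samples = ['{"post_tone": "neutral"}', '{"post_tone": ', "[]"]
rate = json_validity_rate(samples)
print(f"{rate:.2f}")
```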

Best Checkpoint

Checkpoint: 400 steps (selected from 5 candidates)
Selection criteria: Lowest eval loss, best JSON validity
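
One way to apply these two criteria is to sort candidates by eval loss and break ties by JSON validity, as sketched below. The ordering of the criteria is an assumption, `Checkpoint` and `pick_best` are hypothetical names, and the numbers are placeholders rather than this model's real metrics.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    step: int
    eval_loss: float
    json_validity: float  # fraction in [0, 1]


def pick_best(candidates):
    # Lower eval loss wins; ties are broken by higher JSON validity.
    return min(candidates, key=lambda c: (c.eval_loss, -c.json_validity))


# Placeholder values for illustration only; not the model's real metrics.
candidates = [
    Checkpoint(step=10, eval_loss=0.5, json_validity=1.0),
    Checkpoint(step=20, eval_loss=0.3, json_validity=0.8),
    Checkpoint(step=30, eval_loss=0.3, json_validity=1.0),
]
best = pick_best(candidates)
print(best.step)
```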

🚀 Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/gemma-3-270m-football-extractor",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/gemma-3-270m-football-extractor")

# Prepare input
post = "🚨 BREAKING: Manchester United sign Bruno Fernandes for £55m!"

messages = [
    {
        "role": "system",
        "content": "You are a data extraction API. Respond ONLY with JSON."
    },
    {
        "role": "user",
        "content": f"Extract structured data from: {post}"
    }
]

# Generate
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
inputs = inputs.to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=False  # greedy decoding; temperature has no effect when sampling is off
)

result = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(result)
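
The decoded `result` is a string, so downstream code usually needs to parse it. The sketch below shows one defensive way to do that, assuming the model may occasionally wrap its answer in Markdown code fences or add stray text; `parse_extraction` is a hypothetical helper, not part of this model's tooling.

```python
import json
import re
from typing import Optional


def parse_extraction(text: str) -> Optional[dict]:
    """Return the JSON object found in the model output, or None."""
    # Drop Markdown code fences the model might wrap around its answer.
    cleaned = re.sub(r"`{3}(?:json)?", "", text).strip()
    # Keep only the span from the first "{" to the last "}".
    start = cleaned.find("{")
    end = cleaned.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(cleaned[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


print(parse_extraction('{"post_tone": "positive"}'))
print(parse_extraction("Sorry, I cannot help with that."))
```

Returning `None` instead of raising keeps batch pipelines running when a single post produces malformed output.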

📋 Output Schema

The model extracts the following fields:

{
  "post_id": int,
  "post_tone": "neutral|positive|negative|exclusive|speculative",
  "post_keywords": ["keyword1", "keyword2", ...],
  "post_summary": "One-sentence summary",
  "post_content_focus": ["transfers|injury|match_summary|..."],
  "source_journalist": "David Ornstein|Fabrizio Romano|...",
  "post_style": "exclusive_news_alert|direct_quote|...",
  "post_entities": [
    {
      "entity_value": "Manchester United",
      "entity_type": "club"
    }
  ],
  "has_emoji": true|false,
  "emojis_found": ["🚨", ...],
  "has_hashtag": true|false,
  "hashtags_found": ["#MUFC", ...],
  "has_mention_tag": true|false,
  "mentions_found": ["@FabrizioRomano", ...],
  "injury_details": {
    "player_name": "Mohamed Salah",
    "status": "out_for_3_weeks",
    "injury_type": "hamstring"
  }
}
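
A parsed output can be checked against this schema before it is stored. The sketch below is a minimal validator under two assumptions that the card does not state: every field except `injury_details` is required, and `post_tone` is limited to the five values listed above. `schema_problems` is a hypothetical helper.

```python
ALLOWED_TONES = {"neutral", "positive", "negative", "exclusive", "speculative"}

# Field name -> expected Python type after json.loads.
EXPECTED_TYPES = {
    "post_id": int,
    "post_tone": str,
    "post_keywords": list,
    "post_summary": str,
    "post_content_focus": list,
    "source_journalist": str,
    "post_style": str,
    "post_entities": list,
    "has_emoji": bool,
    "emojis_found": list,
    "has_hashtag": bool,
    "hashtags_found": list,
    "has_mention_tag": bool,
    "mentions_found": list,
}


def schema_problems(record: dict) -> list:
    """Return a list of human-readable problems; empty means the record passed."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"missing: {field}")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it for integer fields.
        if expected is int and isinstance(value, bool):
            problems.append(f"wrong type: {field}")
        elif not isinstance(value, expected):
            problems.append(f"wrong type: {field}")
    tone = record.get("post_tone")
    if isinstance(tone, str) and tone not in ALLOWED_TONES:
        problems.append("unknown post_tone")
    entities = record.get("post_entities")
    if isinstance(entities, list):
        for entity in entities:
            is_dict = isinstance(entity, dict)
            if not is_dict or not {"entity_value", "entity_type"} <= entity.keys():
                problems.append("malformed entity")
    return problems


print(schema_problems({"post_id": 1, "post_tone": "angry"}))
```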

✅ Performance Metrics

Test Results (5 diverse examples)

JSON validity: 100% (5/5)
Entity extraction: 100% accuracy
Focus detection: 100% accuracy
Tone analysis: 100% accuracy

Tested Scenarios

✅ Transfers with emojis and mentions
✅ Injury updates
✅ Match reports with statistics
✅ Direct journalist quotes
✅ Simple official announcements

🎯 Intended Use

Primary Use Cases

Automated sports news analysis
Football transfer tracking systems
Injury database maintenance
Match statistics extraction
Social media monitoring

Out-of-Scope Use

Non-football content
Real-time critical decisions
Medical diagnosis (for injury data)

⚠️ Limitations

Trained on English football news only
May hallucinate rare player/club names
Best performance on news similar to training data
Works best with the system and user prompt format shown in the Usage section
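
Hallucinated names can be partly caught by checking that each extracted entity actually appears in the source post. The sketch below is one simple heuristic, assuming entities follow the `post_entities` layout from the Output Schema; `ungrounded_entities` is a hypothetical helper, and a verbatim substring match will also flag legitimate abbreviations or nicknames.

```python
def ungrounded_entities(post: str, record: dict) -> list:
    """Entity values that do not appear verbatim (case-insensitively) in the post."""
    haystack = post.casefold()
    missing = []
    for entity in record.get("post_entities", []):
        if not isinstance(entity, dict):
            continue
        value = str(entity.get("entity_value", ""))
        if value and value.casefold() not in haystack:
            missing.append(value)
    return missing


post = "Manchester United sign Bruno Fernandes for £55m!"
record = {
    "post_entities": [
        {"entity_value": "Manchester United", "entity_type": "club"},
        {"entity_value": "Sporting CP", "entity_type": "club"},
    ]
}
flagged = ungrounded_entities(post, record)
print(flagged)
```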

📚 Citation

If you use this model, please cite:

@misc{gemma3-football-extractor,
  title={Gemma-3-270M Football Data Extractor},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/YOUR_USERNAME/gemma-3-270m-football-extractor}
}

📄 License

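Apache 2.0, as declared by the author. The base model, google/gemma-3-270m-it, is distributed under Google's Gemma Terms of Use, so those terms also apply to this fine-tune.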

🙏 Acknowledgments

Base model: Google's Gemma-3-270M-IT
Fine-tuning framework: LLaMA-Factory
Training infrastructure: Google Colab

Model version: 1.0
Last updated: November 2025
Contact: saadkamachin72@gmail.com

Model repository: Saad4web/gemma-3-270m-football-extractor (fine-tuned from google/gemma-3-270m-it, which is itself derived from google/gemma-3-270m)