# ⚽ Gemma-3-270M Football Data Extractor
Fine-tuned model for extracting structured data from football/soccer news posts.
## Model Description
This model is a fine-tuned version of google/gemma-3-270m-it specialized in extracting structured information from football news posts, including:
- Player transfers
- Injury reports
- Match summaries
- Direct quotes
- Statistical data
## Training Details
### Training Data
- Dataset size: 442 training examples, 49 validation examples
- Data format: ShareGPT chat format
- Content types: Transfer news, injuries, match reports, quotes
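For orientation, a ShareGPT-style record roughly matching this setup is sketched below. The field names (`conversations`, `from`, `value`, `system`) follow the ShareGPT convention commonly used with LLaMA-Factory and are an assumption; the actual dataset may name or arrange things differently.

```python
# Hypothetical ShareGPT-style record (structure assumed, not copied from the
# actual dataset): the user turn carries the raw post, the assistant turn
# carries the target JSON serialized as a string.
example = {
    "system": "You are a data extraction API. Respond ONLY with JSON.",
    "conversations": [
        {
            "from": "human",
            "value": "Extract structured data from: Arsenal confirm the signing of a new midfielder.",
        },
        {
            "from": "gpt",
            "value": '{"post_tone": "positive", "post_keywords": ["Arsenal", "signing"], "post_summary": "Arsenal confirm a new midfielder signing."}',
        },
    ],
}
```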
### Training Configuration
- Method: LoRA (Low-Rank Adaptation)
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.1
- Epochs: 8
- Learning rate: 1.5e-4
- Batch size: 2 (per device)
- Gradient accumulation: 4
- Weight decay: 0.01
- Optimizer: AdamW
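Translated into a standalone PEFT configuration, these hyperparameters would look roughly like the sketch below. The `target_modules` list is an assumption (the model was trained through LLaMA-Factory, which applies its own defaults), so treat this as orientation rather than an exact reproduction of the run.

```python
from peft import LoraConfig

# Mirrors the hyperparameters listed above. target_modules is assumed
# (typical attention projections), not read from the actual training run.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```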
### Training Results
- Final train loss: 0.11
- Final eval loss: 0.20
- Train/eval gap: 0.09 (small gap, indicating limited overfitting)
- JSON validity: 100% (5/5 test cases)
- Entity extraction accuracy: 100%
### Best Checkpoint
- Checkpoint: 400 steps (selected from 5 candidates)
- Selection criteria: Lowest eval loss, best JSON validity
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/gemma-3-270m-football-extractor",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/gemma-3-270m-football-extractor")

# Prepare input
post = "🚨 BREAKING: Manchester United sign Bruno Fernandes for £55m!"
messages = [
    {
        "role": "system",
        "content": "You are a data extraction API. Respond ONLY with JSON."
    },
    {
        "role": "user",
        "content": f"Extract structured data from: {post}"
    }
]

# Generate (greedy decoding for deterministic, schema-faithful output)
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
inputs = inputs.to(model.device)
outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=False
)
result = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(result)
```
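The decoded string is expected to be plain JSON, so a straightforward `json.loads` should be enough; the fence-stripping in the sketch below is only a defensive extra, not behaviour the model is documented to need, and `parse_extraction` is an illustrative helper rather than part of any API.

```python
import json

def parse_extraction(raw: str) -> dict:
    """Parse the model output into a dict, tolerating an optional Markdown code fence."""
    text = raw.strip()
    if text.startswith("```"):
        # Drop an opening ```json / ``` line and a trailing ``` fence if present.
        text = text.split("\n", 1)[-1].rsplit("```", 1)[0]
    return json.loads(text)

data = parse_extraction(result)
print(data.get("post_summary"))
```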
## Output Schema
The model extracts the following fields:
```json
{
  "post_id": int,
  "post_tone": "neutral|positive|negative|exclusive|speculative",
  "post_keywords": ["keyword1", "keyword2", ...],
  "post_summary": "One-sentence summary",
  "post_content_focus": ["transfers|injury|match_summary|..."],
  "source_journalist": "David Ornstein|Fabrizio Romano|...",
  "post_style": "exclusive_news_alert|direct_quote|...",
  "post_entities": [
    {
      "entity_value": "Manchester United",
      "entity_type": "club"
    }
  ],
  "has_emoji": true|false,
  "emojis_found": ["🚨", ...],
  "has_hashtag": true|false,
  "hashtags_found": ["#MUFC", ...],
  "has_mention_tag": true|false,
  "mentions_found": ["@FabrizioRomano", ...],
  "injury_details": {
    "player_name": "Mohamed Salah",
    "status": "out_for_3_weeks",
    "injury_type": "hamstring"
  }
}
```
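A lightweight sanity check against this schema could look like the sketch below. Which keys are mandatory in every output (and whether `injury_details` appears only for injury posts) is an assumption on my part; adjust the sets to your own data. `validate_extraction` is an illustrative helper, not part of the model or any library.

```python
# Assumed core keys and tone vocabulary, taken from the schema above; not all
# outputs are guaranteed to include every optional field.
REQUIRED_KEYS = {
    "post_tone", "post_keywords", "post_summary",
    "post_content_focus", "post_entities", "has_emoji",
}
ALLOWED_TONES = {"neutral", "positive", "negative", "exclusive", "speculative"}

def validate_extraction(data: dict) -> list:
    """Return a list of schema problems; an empty list means the output looks well-formed."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - data.keys())]
    if data.get("post_tone") not in ALLOWED_TONES:
        problems.append(f"unexpected post_tone: {data.get('post_tone')!r}")
    if not isinstance(data.get("post_entities", []), list):
        problems.append("post_entities should be a list of entity objects")
    return problems
```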
## Performance Metrics
### Test Results (5 diverse examples)
- JSON validity: 100% (5/5)
- Entity extraction: 100% accuracy
- Focus detection: 100% accuracy
- Tone analysis: 100% accuracy
### Tested Scenarios
- ✅ Transfers with emojis and mentions
- ✅ Injury updates
- ✅ Match reports with statistics
- ✅ Direct journalist quotes
- ✅ Simple official announcements
## Intended Use
### Primary Use Cases
- Automated sports news analysis
- Football transfer tracking systems
- Injury database maintenance
- Match statistics extraction
- Social media monitoring
### Out-of-Scope Use
- Non-football content
- Real-time critical decisions
- Medical diagnosis (for injury data)
## Limitations
- Trained on English football news only
- May hallucinate rare player/club names
- Best performance on news similar to training data
- Requires structured prompting for optimal results
## Citation
If you use this model, please cite:
```bibtex
@misc{gemma3-football-extractor,
  title={Gemma-3-270M Football Data Extractor},
  author={Your Name},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/YOUR_USERNAME/gemma-3-270m-football-extractor}
}
```
## License
Apache 2.0 (inherited from base model)
## Acknowledgments
- Base model: Google's Gemma-3-270M-IT
- Fine-tuning framework: LLaMA-Factory
- Training infrastructure: Google Colab
Model version: 1.0
Last updated: November 2025
Contact: saadkamachin72@gmail.com