---
language: en
license: apache-2.0
tags:
- football
- soccer
- data-extraction
- gemma
- structured-output
- json
base_model: google/gemma-3-270m-it
datasets:
- custom-football-news
metrics:
- accuracy
---

# ⚽ Gemma-3-270M Football Data Extractor

Fine-tuned model for extracting structured data from football/soccer news posts.

## 🎯 Model Description

This model is a fine-tuned version of [google/gemma-3-270m-it](https://huggingface.co/google/gemma-3-270m-it) specialized in extracting structured information from football news posts, including:

- Player transfers
- Injury reports
- Match summaries
- Direct quotes
- Statistical data

## 📊 Training Details

### Training Data

- **Dataset size**: 442 training examples, 49 validation examples
- **Data format**: ShareGPT chat format
- **Content types**: Transfer news, injuries, match reports, quotes

### Training Configuration

- **Method**: LoRA (Low-Rank Adaptation)
- **LoRA rank**: 16
- **LoRA alpha**: 32
- **LoRA dropout**: 0.1
- **Epochs**: 8
- **Learning rate**: 1.5e-4
- **Batch size**: 2 (per device)
- **Gradient accumulation**: 4
- **Weight decay**: 0.01
- **Optimizer**: AdamW

An illustrative peft/transformers equivalent of these settings is sketched in the appendix at the end of this card.

### Training Results

- **Final train loss**: 0.11
- **Final eval loss**: 0.20
- **Train/eval gap**: 0.09 (small gap, indicating little overfitting)
- **JSON validity**: 100% (5/5 test cases)
- **Entity extraction accuracy**: 100%

### Best Checkpoint

- **Checkpoint**: 400 steps (selected from 5 candidates)
- **Selection criteria**: Lowest eval loss, best JSON validity

## 🚀 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/gemma-3-270m-football-extractor",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/gemma-3-270m-football-extractor")

# Prepare input
post = "🚨 BREAKING: Manchester United sign Bruno Fernandes for £55m!"
messages = [
    {
        "role": "system",
        "content": "You are a data extraction API. Respond ONLY with JSON."
    },
    {
        "role": "user",
        "content": f"Extract structured data from: {post}"
    }
]

# Generate (greedy decoding for deterministic, schema-following output)
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
inputs = inputs.to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=False
)

result = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(result)
```

## 📋 Output Schema

The model extracts the following fields:

```json
{
  "post_id": int,
  "post_tone": "neutral|positive|negative|exclusive|speculative",
  "post_keywords": ["keyword1", "keyword2", ...],
  "post_summary": "One-sentence summary",
  "post_content_focus": ["transfers|injury|match_summary|..."],
  "source_journalist": "David Ornstein|Fabrizio Romano|...",
  "post_style": "exclusive_news_alert|direct_quote|...",
  "post_entities": [
    {
      "entity_value": "Manchester United",
      "entity_type": "club"
    }
  ],
  "has_emoji": true|false,
  "emojis_found": ["🚨", ...],
  "has_hashtag": true|false,
  "hashtags_found": ["#MUFC", ...],
  "has_mention_tag": true|false,
  "mentions_found": ["@FabrizioRomano", ...],
  "injury_details": {
    "player_name": "Mohamed Salah",
    "status": "out_for_3_weeks",
    "injury_type": "hamstring"
  }
}
```
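Because the model is prompted to respond only with JSON, downstream code will typically parse and sanity-check the decoded text before using it. The snippet below is a minimal, illustrative sketch; the fence-stripping logic and the set of required keys are assumptions for demonstration, not part of the model's API:

```python
import json

def parse_extraction(raw_text: str) -> dict:
    """Parse the model's decoded output into a Python dict.

    Assumes the output is a single JSON object, possibly wrapped in
    markdown code fences (an assumption, not guaranteed by the model).
    """
    text = raw_text.strip()
    # Strip optional ```json ... ``` fences if the model emitted them
    if text.startswith("```"):
        text = text.strip("`")
        if text.lower().startswith("json"):
            text = text[4:]
    return json.loads(text)

def is_valid_extraction(data: dict) -> bool:
    # Hypothetical minimal check against the schema above
    required = {"post_tone", "post_keywords", "post_summary", "post_entities"}
    return required.issubset(data)

# Example usage with the `result` string from the Usage section:
# data = parse_extraction(result)
# print(is_valid_extraction(data), data.get("post_entities"))
```

Checking a handful of keys keeps validation lightweight; a full JSON Schema validation could be layered on top if stricter guarantees are needed.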
## ✅ Performance Metrics

### Test Results (5 diverse examples)

- **JSON validity**: 100% (5/5)
- **Entity extraction**: 100% accuracy
- **Focus detection**: 100% accuracy
- **Tone analysis**: 100% accuracy

### Tested Scenarios

- ✅ Transfers with emojis and mentions
- ✅ Injury updates
- ✅ Match reports with statistics
- ✅ Direct journalist quotes
- ✅ Simple official announcements

## 🎯 Intended Use

### Primary Use Cases

- Automated sports news analysis
- Football transfer tracking systems
- Injury database maintenance
- Match statistics extraction
- Social media monitoring

### Out-of-Scope Use

- Non-football content
- Real-time critical decisions
- Medical diagnosis (for injury data)

## ⚠️ Limitations

- Trained on English football news only
- May hallucinate rare player/club names
- Best performance on news similar to training data
- Requires structured prompting for optimal results

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{gemma3-football-extractor,
  title={Gemma-3-270M Football Data Extractor},
  author={Your Name},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/YOUR_USERNAME/gemma-3-270m-football-extractor}
}
```

## 📄 License

Apache 2.0 (inherited from base model)

## 🙏 Acknowledgments

- Base model: Google's Gemma-3-270M-IT
- Fine-tuning framework: LLaMA-Factory
- Training infrastructure: Google Colab

---

**Model version**: 1.0
**Last updated**: November 2025
**Contact**: saadkamachin72@gmail.com
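## 🔧 Appendix: Illustrative Fine-Tuning Configuration

Fine-tuning was performed with LLaMA-Factory (see Acknowledgments); the actual configuration file is not included in this card. For readers who want to approximate the hyperparameters listed under Training Configuration using peft and transformers directly, the sketch below shows roughly equivalent settings. The target modules, output directory, and dataset handling are assumptions, not the card author's actual setup:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings taken from the Training Configuration section;
# target_modules is an assumption — the card does not list them.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Trainer hyperparameters matching the card's Training Configuration
training_args = TrainingArguments(
    output_dir="gemma-3-270m-football-extractor",  # hypothetical path
    num_train_epochs=8,
    learning_rate=1.5e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    weight_decay=0.01,
    optim="adamw_torch",
    bf16=True,
)
```

These objects would then feed a standard supervised fine-tuning loop (e.g. `get_peft_model` plus a `Trainer` on the ShareGPT-formatted data); consult the LLaMA-Factory documentation for the framework that was actually used.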