---
language: en
license: apache-2.0
tags:
- football
- soccer
- data-extraction
- gemma
- structured-output
- json
base_model: google/gemma-3-270m-it
datasets:
- custom-football-news
metrics:
- accuracy
---

# ⚽ Gemma-3-270M Football Data Extractor

Fine-tuned model for extracting structured data from football/soccer news posts.

## 🎯 Model Description

This model is a fine-tuned version of [google/gemma-3-270m-it](https://huggingface.co/google/gemma-3-270m-it) specialized in extracting structured information from football news posts, including:

- Player transfers
- Injury reports
- Match summaries
- Direct quotes
- Statistical data

## 📊 Training Details

### Training Data

- **Dataset size**: 442 training examples, 49 validation examples
- **Data format**: ShareGPT chat format
- **Content types**: Transfer news, injuries, match reports, quotes

### Training Configuration

- **Method**: LoRA (Low-Rank Adaptation)
- **LoRA rank**: 16
- **LoRA alpha**: 32
- **LoRA dropout**: 0.1
- **Epochs**: 8
- **Learning rate**: 1.5e-4
- **Batch size**: 2 (per device)
- **Gradient accumulation**: 4
- **Weight decay**: 0.01
- **Optimizer**: AdamW

An illustrative peft/transformers equivalent of these settings is sketched in the appendix at the end of this card.

### Training Results

- **Final train loss**: 0.11
- **Final eval loss**: 0.20
- **Train/eval gap**: 0.09 (small gap, indicating little overfitting)
- **JSON validity**: 100% (5/5 test cases)
- **Entity extraction accuracy**: 100%

### Best Checkpoint

- **Checkpoint**: 400 steps (selected from 5 candidates)
- **Selection criteria**: Lowest eval loss, best JSON validity

## 🚀 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/gemma-3-270m-football-extractor",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/gemma-3-270m-football-extractor")

# Prepare input
post = "🚨 BREAKING: Manchester United sign Bruno Fernandes for £55m!"
messages = [
    {
        "role": "system",
        "content": "You are a data extraction API. Respond ONLY with JSON."
    },
    {
        "role": "user",
        "content": f"Extract structured data from: {post}"
    }
]

# Generate (greedy decoding for deterministic, schema-following output)
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
inputs = inputs.to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=False
)

result = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(result)
```

## 📋 Output Schema

The model extracts the following fields:

```json
{
  "post_id": int,
  "post_tone": "neutral|positive|negative|exclusive|speculative",
  "post_keywords": ["keyword1", "keyword2", ...],
  "post_summary": "One-sentence summary",
  "post_content_focus": ["transfers|injury|match_summary|..."],
  "source_journalist": "David Ornstein|Fabrizio Romano|...",
  "post_style": "exclusive_news_alert|direct_quote|...",
  "post_entities": [
    {
      "entity_value": "Manchester United",
      "entity_type": "club"
    }
  ],
  "has_emoji": true|false,
  "emojis_found": ["🚨", ...],
  "has_hashtag": true|false,
  "hashtags_found": ["#MUFC", ...],
  "has_mention_tag": true|false,
  "mentions_found": ["@FabrizioRomano", ...],
  "injury_details": {
    "player_name": "Mohamed Salah",
    "status": "out_for_3_weeks",
    "injury_type": "hamstring"
  }
}
```
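Because the model is prompted to respond only with JSON, downstream code will typically parse and sanity-check the decoded text before using it. The snippet below is a minimal, illustrative sketch; the fence-stripping logic and the set of required keys are assumptions for demonstration, not part of the model's API:

```python
import json

def parse_extraction(raw_text: str) -> dict:
    """Parse the model's decoded output into a Python dict.

    Assumes the output is a single JSON object, possibly wrapped in
    markdown code fences (an assumption, not guaranteed by the model).
    """
    text = raw_text.strip()
    # Strip optional ```json ... ``` fences if the model emitted them
    if text.startswith("```"):
        text = text.strip("`")
        if text.lower().startswith("json"):
            text = text[4:]
    return json.loads(text)

def is_valid_extraction(data: dict) -> bool:
    # Hypothetical minimal check against the schema above
    required = {"post_tone", "post_keywords", "post_summary", "post_entities"}
    return required.issubset(data)

# Example usage with the `result` string from the Usage section:
# data = parse_extraction(result)
# print(is_valid_extraction(data), data.get("post_entities"))
```

Checking a handful of keys keeps validation lightweight; a full JSON Schema validation could be layered on top if stricter guarantees are needed.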
## ✅ Performance Metrics

### Test Results (5 diverse examples)

- **JSON validity**: 100% (5/5)
- **Entity extraction**: 100% accuracy
- **Focus detection**: 100% accuracy
- **Tone analysis**: 100% accuracy

### Tested Scenarios

- ✅ Transfers with emojis and mentions
- ✅ Injury updates
- ✅ Match reports with statistics
- ✅ Direct journalist quotes
- ✅ Simple official announcements

## 🎯 Intended Use

### Primary Use Cases

- Automated sports news analysis
- Football transfer tracking systems
- Injury database maintenance
- Match statistics extraction
- Social media monitoring

### Out-of-Scope Use

- Non-football content
- Real-time critical decisions
- Medical diagnosis (for injury data)

## ⚠️ Limitations

- Trained on English football news only
- May hallucinate rare player/club names
- Best performance on news similar to training data
- Requires structured prompting for optimal results

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{gemma3-football-extractor,
  title={Gemma-3-270M Football Data Extractor},
  author={Your Name},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/YOUR_USERNAME/gemma-3-270m-football-extractor}
}
```

## 📄 License

Apache 2.0 (inherited from base model)

## 🙏 Acknowledgments

- Base model: Google's Gemma-3-270M-IT
- Fine-tuning framework: LLaMA-Factory
- Training infrastructure: Google Colab

---

**Model version**: 1.0
**Last updated**: November 2025
**Contact**: saadkamachin72@gmail.com
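## 🔧 Appendix: Illustrative Fine-Tuning Configuration

Fine-tuning was performed with LLaMA-Factory (see Acknowledgments); the actual configuration file is not included in this card. For readers who want to approximate the hyperparameters listed under Training Configuration using peft and transformers directly, the sketch below shows roughly equivalent settings. The target modules, output directory, and dataset handling are assumptions, not the card author's actual setup:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings taken from the Training Configuration section;
# target_modules is an assumption — the card does not list them.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Trainer hyperparameters matching the card's Training Configuration
training_args = TrainingArguments(
    output_dir="gemma-3-270m-football-extractor",  # hypothetical path
    num_train_epochs=8,
    learning_rate=1.5e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    weight_decay=0.01,
    optim="adamw_torch",
    bf16=True,
)
```

These objects would then feed a standard supervised fine-tuning loop (e.g. `get_peft_model` plus a `Trainer` on the ShareGPT-formatted data); consult the LLaMA-Factory documentation for the framework that was actually used.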