⚽ Gemma-3-270M Football Data Extractor

Fine-tuned model for extracting structured data from football/soccer news posts.

🎯 Model Description

This model is a fine-tuned version of google/gemma-3-270m-it specialized in extracting structured information from football news posts, including:

  • Player transfers
  • Injury reports
  • Match summaries
  • Direct quotes
  • Statistical data

📊 Training Details

Training Data

  • Dataset size: 442 training examples, 49 validation examples
  • Data format: ShareGPT chat format
  • Content types: Transfer news, injuries, match reports, quotes
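
For orientation, a single training record in ShareGPT format might look like the sketch below. The field names (`conversations`, `from`, `value`, `system`) follow the common ShareGPT layout accepted by LLaMA-Factory. The helper `make_sharegpt_record` is hypothetical, the prompt wording is borrowed from the usage example in this card, and the target values are illustrative placeholders rather than rows from the actual dataset.

```python
import json


def make_sharegpt_record(post, target):
    """Wrap one post and its expected extraction in a ShareGPT-style record."""
    return {
        # Assumption: the system prompt used in training matches the one in the usage example.
        "system": "You are a data extraction API. Respond ONLY with JSON.",
        "conversations": [
            {"from": "human", "value": f"Extract structured data from: {post}"},
            # The assistant turn holds the target JSON serialized as a string.
            {"from": "gpt", "value": json.dumps(target, ensure_ascii=False)},
        ],
    }


# Placeholder post and target, for illustration only.
record = make_sharegpt_record(
    "Manchester United sign Bruno Fernandes for £55m!",
    {"post_tone": "positive", "post_content_focus": ["transfers"]},
)
print(json.dumps(record, indent=2, ensure_ascii=False))
```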

Training Configuration

  • Method: LoRA (Low-Rank Adaptation)
  • LoRA rank: 16
  • LoRA alpha: 32
  • LoRA dropout: 0.1
  • Epochs: 8
  • Learning rate: 1.5e-4
  • Batch size: 2 (per device)
  • Gradient accumulation: 4
  • Weight decay: 0.01
  • Optimizer: AdamW
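
The numbers above imply an effective batch size and an approximate step count, which the short calculation below works through. It assumes a single GPU and that a partial final batch still counts as a step; the exact step count depends on how the training framework rounds, so treat the totals as estimates. The LoRA scaling factor shown is the standard alpha / rank ratio.

```python
import math

train_examples = 442
per_device_batch = 2
grad_accum = 4
epochs = 8
num_devices = 1  # assumption: training ran on a single GPU

# Examples consumed per optimizer update.
effective_batch = per_device_batch * grad_accum * num_devices

# Assumption: a partial final batch is rounded up to a full step.
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs

# Standard LoRA scales the low-rank update by alpha / rank.
lora_scaling = 32 / 16

# Where the selected 400-step checkpoint falls, measured in epochs.
checkpoint_epoch = 400 / steps_per_epoch

print(effective_batch, steps_per_epoch, total_steps, lora_scaling)
print(round(checkpoint_epoch, 2))
```

Under these assumptions the run is roughly 448 optimizer steps long, so the 400-step checkpoint would sit early in the eighth epoch.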

Training Results

  • Final train loss: 0.11
  • Final eval loss: 0.20
  • Train/eval gap: 0.09 (small, suggesting limited overfitting on the validation split)
  • JSON validity: 100% (5/5 test cases)
  • Entity extraction accuracy: 100% (on the same 5 test cases)

Best Checkpoint

  • Checkpoint: 400 steps (selected from 5 candidates)
  • Selection criteria: Lowest eval loss, best JSON validity

🚀 Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/gemma-3-270m-football-extractor",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/gemma-3-270m-football-extractor")

# Prepare input
post = "🚨 BREAKING: Manchester United sign Bruno Fernandes for £55m!"

messages = [
    {
        "role": "system",
        "content": "You are a data extraction API. Respond ONLY with JSON."
    },
    {
        "role": "user",
        "content": f"Extract structured data from: {post}"
    }
]

# Generate
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
inputs = inputs.to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=False  # greedy decoding; a temperature setting has no effect when sampling is off
)

result = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(result)
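
Because the decoded text is a plain string, downstream code will usually want to turn it into a Python dict and cope with the occasional malformed response. The helper below is a minimal sketch of that step; `parse_model_output` is a hypothetical name, not part of this model or of transformers, and the fence-stripping logic assumes the model might sometimes wrap its answer in a markdown code fence.

```python
import json

FENCE = "`" * 3  # a markdown code-fence marker, built indirectly to keep this snippet readable


def parse_model_output(text):
    """Return the first JSON object found in `text`, or None if it cannot be parsed."""
    # Drop optional markdown fences such as a leading fence tagged "json".
    cleaned = text.replace(FENCE + "json", "").replace(FENCE, "").strip()
    # Keep only the span from the first "{" to the last "}".
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(cleaned[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Placeholder response, for illustration only.
sample = 'Here you go: {"post_tone": "positive", "has_emoji": true}'
print(parse_model_output(sample))
```

In the usage example above, this would be called as `parse_model_output(result)`, with a `None` return treated as a failed extraction.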

📋 Output Schema

The model extracts the following fields:

{
  "post_id": int,
  "post_tone": "neutral|positive|negative|exclusive|speculative",
  "post_keywords": ["keyword1", "keyword2", ...],
  "post_summary": "One-sentence summary",
  "post_content_focus": ["transfers|injury|match_summary|..."],
  "source_journalist": "David Ornstein|Fabrizio Romano|...",
  "post_style": "exclusive_news_alert|direct_quote|...",
  "post_entities": [
    {
      "entity_value": "Manchester United",
      "entity_type": "club"
    }
  ],
  "has_emoji": true|false,
  "emojis_found": ["🚨", ...],
  "has_hashtag": true|false,
  "hashtags_found": ["#MUFC", ...],
  "has_mention_tag": true|false,
  "mentions_found": ["@FabrizioRomano", ...],
  "injury_details": {
    "player_name": "Mohamed Salah",
    "status": "out_for_3_weeks",
    "injury_type": "hamstring"
  }
}
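
A parsed response can be checked against this schema before it is stored. The validator below is a sketch under stated assumptions: `validate_extraction` is a hypothetical helper, `injury_details` is treated as optional (most posts are not injury reports), every other field is treated as required, and only `post_tone` is checked against a closed set because the other enumerations are elided in the schema above.

```python
ALLOWED_TONES = {"neutral", "positive", "negative", "exclusive", "speculative"}

# Field name -> Python type expected after json.loads, taken from the schema above.
EXPECTED_TYPES = {
    "post_id": int, "post_tone": str, "post_keywords": list,
    "post_summary": str, "post_content_focus": list,
    "source_journalist": str, "post_style": str, "post_entities": list,
    "has_emoji": bool, "emojis_found": list,
    "has_hashtag": bool, "hashtags_found": list,
    "has_mention_tag": bool, "mentions_found": list,
}
# Assumption: injury_details only appears on injury-related posts.
OPTIONAL_TYPES = {"injury_details": dict}


def validate_extraction(record):
    """Return a list of problems found; an empty list means the record passed."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject True/False for integer fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            problems.append(f"wrong type for {field}")
    for field, expected in OPTIONAL_TYPES.items():
        if field in record and not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}")
    tone = record.get("post_tone")
    if isinstance(tone, str) and tone not in ALLOWED_TONES:
        problems.append(f"unknown post_tone: {tone}")
    entities = record.get("post_entities")
    if isinstance(entities, list):
        for entity in entities:
            if not isinstance(entity, dict) or not {"entity_value", "entity_type"} <= entity.keys():
                problems.append("malformed entity")
    return problems


# Placeholder record, for illustration only.
example = {
    "post_id": 1, "post_tone": "exclusive", "post_keywords": ["transfer"],
    "post_summary": "Manchester United sign Bruno Fernandes.",
    "post_content_focus": ["transfers"], "source_journalist": "Fabrizio Romano",
    "post_style": "exclusive_news_alert",
    "post_entities": [{"entity_value": "Manchester United", "entity_type": "club"}],
    "has_emoji": True, "emojis_found": ["🚨"],
    "has_hashtag": False, "hashtags_found": [],
    "has_mention_tag": False, "mentions_found": [],
}
print(validate_extraction(example))
```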

✅ Performance Metrics

Test Results (5 diverse examples)

  • JSON validity: 100% (5/5)
  • Entity extraction: 100% accuracy
  • Focus detection: 100% accuracy
  • Tone analysis: 100% accuracy
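
The card does not spell out how these percentages were computed. One plausible reading is sketched below: JSON validity as the share of raw outputs that parse, and entity extraction as the share of expected (value, type) pairs that appear in the prediction. Both function names are hypothetical, and the sample inputs are placeholders, not the actual test cases.

```python
import json


def json_validity_rate(outputs):
    """Fraction of raw model outputs that parse as JSON."""
    if not outputs:
        return 0.0
    valid = 0
    for text in outputs:
        try:
            json.loads(text)
            valid += 1
        except json.JSONDecodeError:
            pass
    return valid / len(outputs)


def entity_match_rate(predicted, expected):
    """Fraction of expected (value, type) pairs that also appear in the prediction."""
    want = {(e["entity_value"], e["entity_type"]) for e in expected}
    got = {(e["entity_value"], e["entity_type"]) for e in predicted}
    # Assumption: a post with no expected entities counts as fully matched.
    return len(want & got) / len(want) if want else 1.0


# Placeholder data, for illustration only.
raw_outputs = ['{"post_tone": "neutral"}', "not json"]
expected_entities = [
    {"entity_value": "Manchester United", "entity_type": "club"},
    {"entity_value": "Bruno Fernandes", "entity_type": "player"},
]
predicted_entities = [{"entity_value": "Manchester United", "entity_type": "club"}]

print(json_validity_rate(raw_outputs))
print(entity_match_rate(predicted_entities, expected_entities))
```

With only five test cases, each example shifts a percentage by 20 points, so a larger held-out set would give a more stable estimate.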

Tested Scenarios

  • ✅ Transfers with emojis and mentions
  • ✅ Injury updates
  • ✅ Match reports with statistics
  • ✅ Direct journalist quotes
  • ✅ Simple official announcements

🎯 Intended Use

Primary Use Cases

  • Automated sports news analysis
  • Football transfer tracking systems
  • Injury database maintenance
  • Match statistics extraction
  • Social media monitoring

Out-of-Scope Use

  • Non-football content
  • Real-time critical decisions
  • Medical diagnosis (for injury data)

⚠️ Limitations

  • Trained on English football news only
  • May hallucinate rare player/club names
  • Best performance on news similar to training data
  • Requires structured prompting for optimal results

📚 Citation

If you use this model, please cite:

@misc{gemma3-football-extractor,
  title={Gemma-3-270M Football Data Extractor},
  author={Your Name},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/YOUR_USERNAME/gemma-3-270m-football-extractor}
}

📄 License

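Apache 2.0 as declared for this fine-tune; the base model (google/gemma-3-270m-it) is distributed under the Gemma Terms of Use, which also govern derivatives.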

🙏 Acknowledgments

  • Base model: Google's Gemma-3-270M-IT
  • Fine-tuning framework: LLaMA-Factory
  • Training infrastructure: Google Colab

Model version: 1.0
Last updated: November 2025
Contact: saadkamachin72@gmail.com
