Product Information Extractor (Fine-tuned Gemma 1B IT)
This model is a fine-tuned version of unsloth/gemma-3-1b-it, optimized for extracting structured product information from noisy or unstructured descriptions.
It outputs a valid JSON object containing:
- brand
- product (2–3 words)
- keywords (5 core keywords without quantity info)
- quantity (normalized to number + unit)
Built as part of feature engineering for the Amazon ML Challenge 2025, it converts messy e-commerce listings into clean, ML-ready records.
⚡ Highlights
- 🧩 Base model: unsloth/gemma-3-1b-it
- ⚙️ Fine-tuning: LoRA + SFT using PEFT and TRL
- 🚀 VRAM usage: < 2 GB
- ⚡ Speed: Very fast inference
- 🧠 Output: Strict, deterministic JSON schema
- 🛍 Use case: Product data cleaning & feature extraction
🧩 Example
Input
[
{
"role": "system",
"content": "You are given a task of extracting information from the product description.\n\nYou must always return this as a valid JSON that follows this structure.\n\n{\n \"brand\": \"brand_name\",\n \"product\": \"specifying what product is that (2-3 words max)\",\n \"keywords\": [\"keyword1\", \"keyword2\", \"keyword3\", \"keyword4\", \"keyword5\"] (Keywords must not include any details regarding quantity or packs or amount),\n \"quantity\": \"specify the quantity mentioned {format : 'Number' 'any of the allowed quantity'}\"\n}\n\nAllowed units for quantity: ['Gram', 'Kilogram', 'Ounce', 'Pound', 'Fluid Ounce', 'Milliliter', 'Liter', 'Gallon', 'Count', 'Capsule', 'Tea Bag', 'Bag', 'Bottle', 'Box', 'Bucket', 'Can', 'Carton', 'Case', 'Container', 'Jar', 'Kit', 'Pack', 'Packet', 'Pouch', 'Sachet', 'Tin', 'Centimeter', 'Foot', 'Inch', 'Meter', 'Square Foot', 'Na']\n\nRules:\n- Return ONLY valid JSON, nothing else (no explanations or text before/after).\n- Extract 5 keywords that define the product.\n- Keep all text lowercase, without punctuation.\n\nDescription:"
},
{
"role": "user",
"content": "Item Name: Rani Garam Masala Indian 11 Whole Spices Blend 28oz (800g) ~ All Natural, Salt-Free | Vegan | No Colors | Gluten Friendly | NON-GMO | Kosher | Indian Origin. Bullet Point 1: You'll LOVE our Garam Masala Whole Spice by Rani Brand--Here's Why: ❤️Now KOSHER! 100% Natural, Non-GMO, No Preservatives, Vegan, Gluten Friendly PREMIUM Gourmet Food Grade Spice. ❤️Packed in a no barrier Plastic Jar, let us tell you how important that is when using Indian Spices! Rani is a USA based company selling spices for over 40 years, buy with confidence! ❤️NO FILLERS in any Rani Brand Spices (fillers are commonly used in spices to make them free flowing or lessen the costs of production) usually sodium or like product. Net Wt. 10oz (283g), Product of India ❤️ The whole version of our the Highest rated Garam Masala on Amazon! Product Description: Rani Garam Masala Traditionally from Northern India, garam masala is a staple spice in Indian cookery. Garam, when translated means 'warm' best describes the properties of the blend. Warming ingredients such as cinnamon, mace and ginger give garam masala a mellow, appealing aroma. It is an essential ingredient when making sauces for meat and poultry dishes, as well as a great companion to vegetables and lentils recipes. Garam masala is often used towards the end of the cooking process as an added seasoning and it is often sprinkled over finished dishes to enhance the flavor. Product Type: Masala Whole Spices Blend. Packaging: Plastic Bag. Product of India. Rani's Spices are all natural, fresh & premium quality spices. Value: 28.0 Unit: Ounce."
}
]
Output (before fine-tuning)
{
"brand": "Rani",
"product": "Garam Masala Whole Spices Blend",
"keywords": ["rani", "garam masala", "whole spices", "28oz", "non gmo", "kosher", "vegan", "gluten friendly"],
"quantity": "28oz"
}
Output (after fine-tuning)
{
"brand": "rani",
"product": "garam masala blend",
"keywords": ["indian", "spice", "masala", "vegan", "kosher"],
"quantity": "28 ounce"
}
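✅ The fine-tuned model lowercases every field, shortens the product name, returns exactly five keywords with no quantity details, and normalizes the quantity to a number plus an allowed unit ("28oz" becomes "28 ounce").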
🚀 Quick Start
from transformers import pipeline
extractor = pipeline(
"text-generation",
model="Dinesh-Kumar/gemma3-1b-finetuned-v3",
device="cuda"
)
description = """Item Name: Rani Garam Masala Indian 11 Whole Spices Blend 28oz (800g) ~ All Natural, Salt-Free | Vegan | No Colors | Gluten Friendly | NON-GMO | Kosher | Indian Origin\r\nBullet Point 1: You'll LOVE our Garam Masala Whole Spice by Rani Brand--Here's Why:\r\nBullet Point 2: ❤️Now KOSHER! 100% Natural, Non-GMO, No Preservatives, Vegan, Gluten Friendly PREMIUM Gourmet Food Grade Spice.\r\nBullet Point 3: ❤️Packed in a no barrier Plastic Jar, let us tell you how important that is when using Indian Spices! Rani is a USA based company selling spices for over 40 years, buy with confidence!\r\nBullet Point 4: ❤️NO FILLERS in any Rani Brand Spices (fillers are commonly used in spices to make them free flowing or lessen the costs of production) usually sodium or like product. Net Wt. 10oz (283g), Product of India\r\nBullet Point 5: ❤️ The whole version of our the Highest rated Garam Masala on Amazon!\r\nProduct Description: <p><b>Rani Garam Masala</b> Traditionally from Northern Indian, garam masala is a staple spice in Indian cookery. Garam, when translated means “warm” best describes the properties of the blend. Warming ingredients such as cinnamon, mace and ginger give garam masala a mellow, appealing aroma. It is an essential ingredient when making sauces for meat and poultry dishes, as well as a great companion to vegetables and lentils recipes. Garam masala is often used towards the end of the cooking process as an added seasoning and it is often sprinkled over finished dishes to enhance the flavor.</p> Product Type: Masala Whole Spices Blend <br> Packaging: Plastic Bag<br> Product of India <br><br>Did you know most products sold in the U.S. go through 3-7 different supply chains before they are sold to you? This means much more cost & product that is older by the time it reaches you, the end consumer. At Rani (Rani's World Foods) we are the manufacturer, distributor & retailer of Rani Brand products. 
We are vertically integrated, means only 1 supply chain for best pricing & freshness, makes Rani & Rani Brand products one of kind here on Amazon. <br><br> Rani's Spices are all natural, fresh & premium quality spices... taste the difference in all your dishes. At Rani Foods, we comb the planet for the best ingredients, rather it be high quality coriander from Orissa, or the best Cumin from Gujrat. We go throughout the world purchasing the finest quality spices available, and fully guarantee our products. We are a 100% Family Owned and have been in the Indian food trade for over 50 years. Our goal is your happiness and health. Give our products a try, and experience the difference.\r\nValue: 28.0\r\nUnit: Ounce\r\n"""
output = extractor([
{"role": "system", "content": "Extract product info and return JSON."},
{"role": "user", "content": description}
], max_new_tokens=256, return_full_text=False)
print(output[0]["generated_text"])
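The pipeline returns a plain string, so downstream code still has to turn it into a record. The following is a minimal sketch, assuming the model may occasionally wrap its answer in a Markdown code fence or add stray text around the JSON. `parse_extraction` is a hypothetical helper name, not part of this model or of Transformers.

```python
import json
from typing import Optional


def parse_extraction(generated_text: str) -> Optional[dict]:
    """Return the JSON object found in the model output, or None."""
    # Remove Markdown code-fence markers if the model emitted any.
    fence = "`" * 3
    cleaned = generated_text.replace(fence + "json", "").replace(fence, "").strip()
    # Keep only the span from the first "{" to the last "}".
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(cleaned[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Only a JSON object counts as a valid record.
    return parsed if isinstance(parsed, dict) else None
```

With the Quick Start code above, it could be called as `record = parse_extraction(output[0]["generated_text"])`; a `None` result signals that the output should be retried or logged.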
🧠 Training Details
| Setting | Value |
|---|---|
| Base Model | unsloth/gemma-3-1b-it |
| Fine-tuning | LoRA + SFT |
| Frameworks | PEFT, TRL, Transformers |
| Dataset | Derived from Amazon ML Challenge 2025 |
| Objective | Structured JSON extraction |
| Epochs | 3 |
| Learning Rate | 2e-4 |
| LoRA Rank | 16 |
| Target Modules | q_proj, v_proj |
| Precision | bfloat16 |
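The table can be read as a PEFT configuration. The sketch below is one possible mapping, not the author's training script: `TRAINING_SETTINGS` and `build_lora_config` are hypothetical names, and values the card does not state (such as `lora_alpha` and `lora_dropout`) are left at PEFT's defaults, which may differ from what was actually used.

```python
# Hyperparameters copied from the Training Details table.
TRAINING_SETTINGS = {
    "base_model": "unsloth/gemma-3-1b-it",
    "epochs": 3,
    "learning_rate": 2e-4,
    "lora_rank": 16,
    "target_modules": ["q_proj", "v_proj"],
    "precision": "bfloat16",
}


def build_lora_config(settings: dict = TRAINING_SETTINGS):
    """Build a PEFT LoraConfig from the settings listed in the table."""
    # Imported lazily so the settings can be inspected without PEFT installed.
    from peft import LoraConfig

    # lora_alpha and lora_dropout are not stated in this card, so they stay
    # at PEFT's defaults here (an assumption, not the author's values).
    return LoraConfig(
        r=settings["lora_rank"],
        target_modules=settings["target_modules"],
        task_type="CAUSAL_LM",
    )
```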
🧾 JSON Schema
{
"brand": "brand_name",
"product": "short product name",
"keywords": ["keyword1", "keyword2", "keyword3", "keyword4", "keyword5"],
"quantity": "number unit"
}
Allowed units:
['Gram', 'Kilogram', 'Ounce', 'Pound', 'Fluid Ounce', 'Milliliter', 'Liter', 'Gallon', 'Count', 'Capsule', 'Tea Bag', 'Bag', 'Bottle', 'Box', 'Bucket', 'Can', 'Carton', 'Case', 'Container', 'Jar', 'Kit', 'Pack', 'Packet', 'Pouch', 'Sachet', 'Tin', 'Centimeter', 'Foot', 'Inch', 'Meter', 'Square Foot', 'Na']
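The schema and unit list lend themselves to a small post-hoc check. This is a sketch under stated assumptions: `validate_record` is a hypothetical helper, units are compared case-insensitively because the example output is lowercase, and a bare "na" quantity is accepted because the card does not say how a missing quantity is rendered.

```python
import re
from typing import List

ALLOWED_UNITS = [
    "Gram", "Kilogram", "Ounce", "Pound", "Fluid Ounce", "Milliliter",
    "Liter", "Gallon", "Count", "Capsule", "Tea Bag", "Bag", "Bottle",
    "Box", "Bucket", "Can", "Carton", "Case", "Container", "Jar", "Kit",
    "Pack", "Packet", "Pouch", "Sachet", "Tin", "Centimeter", "Foot",
    "Inch", "Meter", "Square Foot", "Na",
]
_UNITS_LOWER = {unit.lower() for unit in ALLOWED_UNITS}
# A number, one space, then the unit (which may itself contain spaces).
_QUANTITY_RE = re.compile(r"^(\d+(?:\.\d+)?) (.+)$")


def validate_record(record: dict) -> List[str]:
    """Return a list of schema problems; an empty list means the record passes."""
    problems = []
    for key in ("brand", "product", "keywords", "quantity"):
        if key not in record:
            problems.append(f"missing key: {key}")

    keywords = record.get("keywords")
    if keywords is not None and (not isinstance(keywords, list) or len(keywords) != 5):
        problems.append("keywords must be a list of exactly 5 items")

    quantity = record.get("quantity")
    # Assumption: a bare "na" stands for "no quantity found".
    if isinstance(quantity, str) and quantity.strip().lower() != "na":
        match = _QUANTITY_RE.match(quantity.strip())
        if match is None or match.group(2).lower() not in _UNITS_LOWER:
            problems.append(f"quantity is not 'number unit': {quantity!r}")
    return problems
```

Applied to the two outputs in the Example section, the fine-tuned output passes, while the output before fine-tuning is flagged for having eight keywords and for the unnormalized quantity "28oz".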
💡 Intended Use
- E-commerce product metadata extraction
- Feature engineering for ML models
- Product categorization and deduplication
- Lightweight inference environments
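As an illustration of the deduplication use case, extracted records could be grouped on a normalized key. This is only a sketch: `dedup_key` and `deduplicate` are hypothetical helpers, and the choice of brand, product, and quantity as the key is an assumption rather than something the card prescribes.

```python
from typing import Iterable, List, Tuple


def dedup_key(record: dict) -> Tuple[str, str, str]:
    """Build a case-insensitive key from brand, product, and quantity."""
    return tuple(
        str(record.get(field, "")).strip().lower()
        for field in ("brand", "product", "quantity")
    )


def deduplicate(records: Iterable[dict]) -> List[dict]:
    """Keep the first record seen for each key, preserving input order."""
    seen = {}
    for record in records:
        # setdefault only stores the record if the key is new.
        seen.setdefault(dedup_key(record), record)
    return list(seen.values())
```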
🧰 Framework Versions
- PEFT: 0.17.1
- TRL: 0.23.0
- Transformers: 4.56.2
- PyTorch: 2.8.0 + cu126
- Datasets: 4.2.0
- Tokenizers: 0.22.1
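To compare a local environment against these versions, something like the following could be used. It is a sketch: `CARD_VERSIONS` and `compare_versions` are hypothetical names, the distribution names (for example `torch` for PyTorch) are the usual PyPI names, and the CUDA build tag is dropped from the PyTorch version.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card; the "+ cu126" build tag is omitted for torch.
CARD_VERSIONS = {
    "peft": "0.17.1",
    "trl": "0.23.0",
    "transformers": "4.56.2",
    "torch": "2.8.0",
    "datasets": "4.2.0",
    "tokenizers": "0.22.1",
}


def compare_versions(expected: dict = CARD_VERSIONS) -> dict:
    """Map each package to (expected, installed); installed is None if absent."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in compare_versions().items():
        print(f"{name}: card={wanted} installed={installed}")
```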
📚 Citation
If you use this model, please cite:
@misc{dinesh2025gemma1bextractor,
title = {Product Information Extractor (Fine-tuned Gemma 1B IT)},
author = {Dinesh Kumar},
year = {2025},
howpublished = {\url{https://huggingface.co/Dinesh-Kumar/gemma3-1b-finetuned-v3}},
note = {Fine-tuned for Amazon ML Challenge 2025}
}
Also cite TRL:
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
🏁 Summary
This is a lightweight, low-VRAM fine-tune of Gemma 3 1B IT that produces clean, structured product data directly from raw descriptions. It was built for the Amazon ML Challenge 2025 feature-engineering pipeline.