aliRafik/invoices-donut-finetuned-Lora-merged

@@ -16,25 +16,269 @@ This makes it **production-ready**: no need to separately load base + adapters.
 ---
-### Intended Use
-- Direct deployment without PEFT
-- Invoice extraction in pipelines expecting a standalone Hugging Face model
----
-### Training Details
-- **Base model**: `numind/NuExtract-2.0-4B`
-- **Method**: LoRA
-- **LoRA Config**:
-  - Rank: 8
-  - Alpha: 32
-  - Dropout: 0.1
-  - Target modules: `q_proj`, `v_proj`
-- **Epochs**: 10
-- **Batch size**: 2
-- **Learning rate**: 1e-5
-- **Precision**: bfloat16
-- **Gradient checkpointing**: Enabled
----

 ---
+## Intended Use
+- Extracting structured JSON fields from invoice images:
+  - Invoice number, date
+  - Seller/client details
+  - Tax IDs, IBAN
+  - Item descriptions, prices, VAT
+  - Totals (net, VAT, gross)
+- Not intended for general document OCR outside invoices.
+## Training Details
+- **Base model**: Qwen/Qwen2.5-VL-3B-Instruct
+- **Framework**: Hugging Face TRL (SFTTrainer) with PEFT/LoRA
+- **LoRA config**:
+  - ***Rank (r)***: 8
+  - ***Alpha***: 32
+  - ***Target modules***: q_proj, v_proj
+  - ***Dropout***: 0.1
+- **Epochs**: 10
+- **Batch size**: 2
+- **Learning rate**: 1e-5
+- **Precision**: bfloat16
+- **Gradient accumulation**: 4
+- **Scheduler**: Constant LR
+- **Max sequence length**: 1024
+- **Gradient checkpointing**: Enabled
+- **Trainable parameters**: ~1.8M (0.05% of 3.75B total)
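Several of the figures in the list above can be cross-checked with simple arithmetic. The sketch below assumes a single training device (so the effective batch size is batch size times gradient accumulation steps) and the standard LoRA scaling factor of alpha / r. The variable names are illustrative and are not taken from the training script.

```python
# Hyperparameters copied from the Training Details list
batch_size = 2
grad_accum_steps = 4
lora_rank = 8
lora_alpha = 32
trainable_params = 1.8e6  # "~1.8M"
total_params = 3.75e9     # "3.75B"

# Assumption: one device, so there is no extra factor for a device count
effective_batch_size = batch_size * grad_accum_steps

# Standard LoRA scales the low-rank update by alpha / r
lora_scaling = lora_alpha / lora_rank

# Share of parameters that receive gradients, in percent
trainable_pct = 100 * trainable_params / total_params

print(f"Effective batch size: {effective_batch_size}")
print(f"LoRA scaling (alpha / r): {lora_scaling}")
print(f"Trainable share: {trainable_pct:.2f}%")
```

Under these assumptions each optimizer step sees 8 samples, the adapter update is scaled by 4.0, and the trainable share rounds to the 0.05% quoted above.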
+## Usage
+### Installation
+```bash
+pip install transformers torch datasets pillow accelerate qwen-vl-utils requests
+```
+### Load Model and Processor
+```python
+import torch
+from transformers import AutoProcessor, AutoModelForVision2Seq
+model_name = "aliRafik/invoices-donut-finetuned-Lora-merged"
+model = AutoModelForVision2Seq.from_pretrained(
+    model_name,
+    trust_remote_code=True,
+    torch_dtype=torch.bfloat16,  # Optional: Use float32 if bfloat16 causes issues
+    attn_implementation="flash_attention_2",  # Requires the flash-attn package and an Ampere+ GPU; remove this argument to use the default attention
+    device_map="auto"
+)
+processor = AutoProcessor.from_pretrained(
+    model_name,
+    trust_remote_code=True,
+    padding_side='left',
+    use_fast=True
+)
+```
+### Define Extraction Template
+```python
+template = """
+{
+  "header": {
+    "invoice_no": "string",
+    "invoice_date": "date-time",
+    "seller": "string",
+    "client": "string",
+    "seller_tax_id": "string",
+    "client_tax_id": "string",
+    "iban": "string"
+  },
+  "items": [
+    {
+      "item_desc": "string",
+      "item_qty": "number",
+      "item_net_price": "number",
+      "item_net_worth": "number",
+      "item_vat": "number",
+      "item_gross_worth": "number"
+    }
+  ],
+  "summary": {
+    "total_net_worth": "number",
+    "total_vat": "number",
+    "total_gross_worth": "number"
+  }
+}
+"""
+```
+### Test on Sample from Dataset
+```python
+from datasets import load_dataset
+import json
+from qwen_vl_utils import process_vision_info
+# Load the dataset
+dataset = load_dataset("katanaml-org/invoices-donut-data-v1")
+# Select a sample (e.g., index 0)
+sample = dataset['train'][0]
+image = sample['image']
+ground_truth = sample['ground_truth']
+print(json.loads(ground_truth))
+# Prepare message
+messages = [
+    {"role": "user", "content": [{"type": "image", "image": image}]}
+]
+# Process vision info
+image_inputs, _ = process_vision_info(messages)
+# Apply chat template
+text = processor.tokenizer.apply_chat_template(
+    messages,
+    template=template,
+    tokenize=False,
+    add_generation_prompt=True
+)
+# Prepare inputs
+inputs = processor(
+    text=[text],
+    images=image_inputs,
+    padding=True,
+    return_tensors="pt"
+).to(model.device)
+# Generation config
+generation_config = {
+    "do_sample": False,
+    "num_beams": 1,
+    "max_new_tokens": 2048
+}
+# Generate
+generated_ids = model.generate(**inputs, **generation_config)
+generated_ids_trimmed = [
+    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+]
+output_text = processor.batch_decode(
+    generated_ids_trimmed,
+    skip_special_tokens=True,
+    clean_up_tokenization_spaces=False
+)
+# Parse and print
+try:
+    extracted_data = json.loads(output_text[0])
+    print("Extracted Data:", extracted_data)
+except json.JSONDecodeError:
+    print("Raw Output:", output_text[0])
+# Compare with ground truth
+gt_parsed = json.loads(ground_truth)['gt_parse']
+print("Ground Truth:", gt_parsed)
+```
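To go beyond comparing the two printouts by eye, a prediction can be scored field by field. The following is a minimal sketch and not part of the original evaluation: `flatten` and `field_accuracy` are hypothetical helpers, and the two small dictionaries stand in for `extracted_data` and `gt_parsed`. Values are compared as stripped strings, which is an assumption (it treats `3` and `3.0` as different).

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into {"path.to.field": value}."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        flat[prefix.rstrip(".")] = obj
    return flat


def field_accuracy(predicted, expected):
    """Return (share of expected fields matched exactly, mismatched paths)."""
    pred_flat = flatten(predicted)
    exp_flat = flatten(expected)
    mismatches = [
        path
        for path, value in exp_flat.items()
        if str(pred_flat.get(path, "")).strip() != str(value).strip()
    ]
    total = len(exp_flat)
    accuracy = (total - len(mismatches)) / total if total else 1.0
    return accuracy, mismatches


# Stand-in data (hypothetical, for illustration only)
expected = {
    "header": {"invoice_no": "123", "seller": "ACME"},
    "items": [{"item_qty": "2"}],
}
predicted = {
    "header": {"invoice_no": "123", "seller": "ACME Ltd"},
    "items": [{"item_qty": "2"}],
}

accuracy, mismatches = field_accuracy(predicted, expected)
print(f"Field accuracy: {accuracy:.2f}")
print("Mismatched fields:", mismatches)
```

With real outputs, the call would be `field_accuracy(extracted_data, gt_parsed)`. Exact string matching is strict, so a looser comparison (for example, numeric parsing for price fields) may be more appropriate depending on the use case.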
+### Test on Unseen Data (Custom Image)
+```python
+from PIL import Image
+from io import BytesIO
+import requests
+# Load from local path
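+image_path = "/content/image.jpg"  # Replace with your path
+image = Image.open(image_path).convert("RGB")  # Normalize the mode (e.g., RGBA or grayscale scans)
+# Or load from URL
+# image_url = "https://example.com/your_invoice.jpg"
+# response = requests.get(image_url, timeout=30)
+# response.raise_for_status()
+# image = Image.open(BytesIO(response.content)).convert("RGB")
+# Then rebuild `messages` with this image and rerun the inference steps above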
+```
+## Example Results
+### Input Image
+![Invoice Extraction Example](https://th.bing.com/th/id/OIP.u5Uh7wUsLTy4zqUMOWuT-QHaJl?w=186&h=242&c=7&r=0&o=5&pid=1.7)
+### Extracted Data
+```json
+{
+  "header": {
+    "invoice_no": "49565075",
+    "invoice_date": "2019-10-28",
+    "seller": "Kane-Morgan 968 Carr Mission Apt. 320 Bernardville, VA 28211",
+    "client": "Garcia Inc 445 Haas Viaduct Suite 454 Michaelhaven, LA 32852",
+    "seller_tax_id": "964-95-3813",
+    "client_tax_id": "909-75-5482",
+    "iban": "GB73WCJ55232646970614"
+  },
+  "items": [
+    {
+      "item_desc": "Anthropologie Gold Elegant Swan Decorative Metal Bottle Stopper Wine Saver",
+      "item_qty": 3.0,
+      "item_net_price": 19.98,
+      "item_net_worth": 59.94,
+      "item_vat": 10.0,
+      "item_gross_worth": 65.93
+    },
+    {
+      "item_desc": "Lolita Happy Retirement Wine Glass 15 Ounce GLS11-5534H",
+      "item_qty": 1.0,
+      "item_net_price": 8.0,
+      "item_net_worth": 8.0,
+      "item_vat": 10.0,
+      "item_gross_worth": 8.8
+    },
+    {
+      "item_desc": "Lolita \"Congratulations\" Hand Painted and Decorated Wine Glass NIB",
+      "item_qty": 1.0,
+      "item_net_price": 20.0,
+      "item_net_worth": 20.0,
+      "item_vat": 10.0,
+      "item_gross_worth": 22.0
+    }
+  ],
+  "summary": {
+    "total_net_worth": 87.94,
+    "total_vat": 8.79,
+    "total_gross_worth": 96.73
+  }
+}
+```
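The extracted numbers can also be sanity-checked against each other. The sketch below is an illustration and not part of the model's pipeline: `check_totals` is a hypothetical helper, the 0.01 tolerance is an assumed rounding allowance, and `item_vat` is interpreted as a percentage rate, which is consistent with the figures in this example. Only the numeric fields of the example are reproduced.

```python
import math


def check_totals(invoice, tol=0.01):
    """Return a list of inconsistencies found (empty if everything adds up)."""
    problems = []
    items = invoice["items"]
    summary = invoice["summary"]

    # Each line item: net worth plus VAT (as a percentage) should give gross worth
    for i, item in enumerate(items):
        expected_gross = item["item_net_worth"] * (1 + item["item_vat"] / 100)
        if not math.isclose(expected_gross, item["item_gross_worth"], abs_tol=tol):
            problems.append(f"item {i}: gross worth does not match net worth plus VAT")

    # Summary totals should equal the sums over the line items
    net_sum = sum(item["item_net_worth"] for item in items)
    gross_sum = sum(item["item_gross_worth"] for item in items)
    if not math.isclose(net_sum, summary["total_net_worth"], abs_tol=tol):
        problems.append("total_net_worth differs from the sum of item net worths")
    if not math.isclose(gross_sum, summary["total_gross_worth"], abs_tol=tol):
        problems.append("total_gross_worth differs from the sum of item gross worths")

    # Net total plus VAT total should give the gross total
    net_plus_vat = summary["total_net_worth"] + summary["total_vat"]
    if not math.isclose(net_plus_vat, summary["total_gross_worth"], abs_tol=tol):
        problems.append("total_net_worth + total_vat differs from total_gross_worth")

    return problems


# Numeric fields copied from the example output above
invoice = {
    "items": [
        {"item_net_worth": 59.94, "item_vat": 10.0, "item_gross_worth": 65.93},
        {"item_net_worth": 8.0, "item_vat": 10.0, "item_gross_worth": 8.8},
        {"item_net_worth": 20.0, "item_vat": 10.0, "item_gross_worth": 22.0},
    ],
    "summary": {
        "total_net_worth": 87.94,
        "total_vat": 8.79,
        "total_gross_worth": 96.73,
    },
}

print(check_totals(invoice) or "All totals are consistent")
```

A check like this can flag extractions where a digit was misread, since a single wrong amount usually breaks at least one of the sums.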
+## License
+Apache-2.0
+## Tags
+- vision
+- document-understanding
+- invoice-processing
+- donut
+- qwen
+## Citations
+Cite TRL as:
+```bibtex
+@misc{vonwerra2022trl,
+	title        = {{TRL: Transformer Reinforcement Learning}},
+	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
+	year         = 2020,
+	journal      = {GitHub repository},
+	publisher    = {GitHub},
+	howpublished = {\url{https://github.com/huggingface/trl}}
+}
+```