---
base_model: numind/NuExtract-2.0-4B
library_name: transformers
model_name: invoices-donut-finetuned-Lora-merged
tags:
- generated_from_trainer
- sft
- trl
licence: license
---

### Overview

`invoices-donut-finetuned-Lora-merged` is a **LoRA adapter merged back into the base weights** of [`numind/NuExtract-2.0-4B`](https://huggingface.co/numind/NuExtract-2.0-4B). It behaves like a fully fine-tuned model, but it was trained with parameter-efficient LoRA adapters. This makes it **production-ready** in the sense that a single checkpoint is loaded: there is no need to load the base model and the adapters separately.

---

## Intended Use

- Extracting structured JSON fields from invoice images:
  - Invoice number, date
  - Seller/client details
  - Tax IDs, IBAN
  - Item descriptions, prices, VAT
  - Totals (net, VAT, gross)
- Not intended for general document OCR outside invoices.

## Training Details

- **Base model**: numind/NuExtract-2.0-4B (itself built on Qwen/Qwen2.5-VL-3B-Instruct)
- **Framework**: Hugging Face TRL (SFTTrainer) with PEFT/LoRA
- **LoRA config**:
  - ***Rank (r)***: 8
  - ***Alpha***: 32
  - ***Target modules***: q_proj, v_proj
  - ***Dropout***: 0.1
- **Epochs**: 10
- **Batch size**: 2
- **Learning rate**: 1e-5
- **Precision**: bfloat16
- **Gradient accumulation**: 4
- **Scheduler**: Constant LR
- **Max sequence length**: 1024
- **Gradient checkpointing**: Enabled
- **Trainable parameters**: ~1.8M (0.05% of 3.75B total)

## Usage

### Installation

```bash
pip install transformers torch datasets pillow accelerate qwen-vl-utils requests
```

### Load Model and Processor

```python
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_name = "aliRafik/invoices-donut-finetuned-Lora-merged"

model = AutoModelForVision2Seq.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # Optional: Use float32 if bfloat16 causes issues
    # Requires Ampere+ GPU, torch >= 2.0, and the flash-attn package;
    # remove this argument to fall back to the default attention.
    attn_implementation="flash_attention_2",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    model_name,
    trust_remote_code=True,
    padding_side='left',
    use_fast=True
)
```

### Define Extraction Template

```python
template = """
{
    "header": {
        "invoice_no": "string",
        "invoice_date": "date-time",
        "seller": "string",
        "client": "string",
        "seller_tax_id": "string",
        "client_tax_id": "string",
        "iban": "string"
    },
    "items": [
        {
            "item_desc": "string",
            "item_qty": "number",
            "item_net_price": "number",
            "item_net_worth": "number",
            "item_vat": "number",
            "item_gross_worth": "number"
        }
    ],
    "summary": {
        "total_net_worth": "number",
        "total_vat": "number",
        "total_gross_worth": "number"
    }
}
"""
```

### Test on Sample from Dataset

```python
from datasets import load_dataset
import json
from qwen_vl_utils import process_vision_info

# Load the dataset
dataset = load_dataset("katanaml-org/invoices-donut-data-v1")

# Select a sample (e.g., index 0)
sample = dataset['train'][0]
image = sample['image']
ground_truth = sample['ground_truth']
print(json.loads(ground_truth))

# Prepare message
messages = [
    {"role": "user", "content": [{"type": "image", "image": image}]}
]

# Process vision info
image_inputs, _ = process_vision_info(messages)

# Apply chat template
text = processor.tokenizer.apply_chat_template(
    messages,
    template=template,
    tokenize=False,
    add_generation_prompt=True
)

# Prepare inputs
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt"
).to(model.device)

# Generation config
generation_config = {
    "do_sample": False,
    "num_beams": 1,
    "max_new_tokens": 2048
}

# Generate
generated_ids = model.generate(**inputs, **generation_config)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

# Parse and print
try:
    extracted_data = json.loads(output_text[0])
    print("Extracted Data:", extracted_data)
except json.JSONDecodeError:
    print("Raw Output:", output_text[0])

# Compare with ground truth
gt_parsed = json.loads(ground_truth)['gt_parse']
print("Ground Truth:", gt_parsed)
```

### Test on Unseen Data (Custom Image)

```python
from PIL import Image
from io import BytesIO
import requests

# Load from local path
image_path = "/content/image.jpg"  # Replace with your path
image = Image.open(image_path)

# Or load from URL
# image_url = "https://example.com/your_invoice.jpg"
# response = requests.get(image_url)
# image = Image.open(BytesIO(response.content))

# Use same inference code as above
```

## Example Results

#### Input Image:

![Invoice Extraction Example](https://th.bing.com/th/id/OIP.u5Uh7wUsLTy4zqUMOWuT-QHaJl?w=186&h=242&c=7&r=0&o=5&pid=1.7)

#### Extracted Data:

```python
{
    "header": {
        "invoice_no": "49565075",
        "invoice_date": "2019-10-28",
        "seller": "Kane-Morgan 968 Carr Mission Apt. 320 Bernardville, VA 28211",
        "client": "Garcia Inc 445 Haas Viaduct Suite 454 Michaelhaven, LA 32852",
        "seller_tax_id": "964-95-3813",
        "client_tax_id": "909-75-5482",
        "iban": "GB73WCJ55232646970614"
    },
    "items": [
        {
            "item_desc": "Anthropologie Gold Elegant Swan Decorative Metal Bottle Stopper Wine Saver",
            "item_qty": 3.0,
            "item_net_price": 19.98,
            "item_net_worth": 59.94,
            "item_vat": 10.0,
            "item_gross_worth": 65.93
        },
        {
            "item_desc": "Lolita Happy Retirement Wine Glass 15 Ounce GLS11-5534H",
            "item_qty": 1.0,
            "item_net_price": 8.0,
            "item_net_worth": 8.0,
            "item_vat": 10.0,
            "item_gross_worth": 8.8
        },
        {
            "item_desc": "Lolita \"Congratulations\" Hand Painted and Decorated Wine Glass NIB",
            "item_qty": 1.0,
            "item_net_price": 20.0,
            "item_net_worth": 20.0,
            "item_vat": 10.0,
            "item_gross_worth": 22.0
        }
    ],
    "summary": {
        "total_net_worth": 87.94,
        "total_vat": 8.79,
        "total_gross_worth": 96.73
    }
}
```

## License

#### Apache-2.0

tags:

- vision
- document-understanding
- invoice-processing
- donut
- qwen

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
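
## Checking Extracted Totals

The example output under Example Results can be sanity-checked arithmetically: the summary totals should equal the sums over the line items. The sketch below is one possible post-processing check, not part of the model or its training code. The helper name `check_invoice_totals` and the tolerance `tol` are hypothetical, and it assumes the numeric fields were parsed as numbers and that `item_vat` is a percentage rate (as it appears to be in the example), so the VAT amount is taken as gross minus net.

```python
import math


def check_invoice_totals(invoice, tol=0.01):
    """Compare summary totals with sums over the line items.

    Returns a dict mapping each summary field to True when it agrees
    with the value recomputed from the items, within `tol`.
    """
    items = invoice["items"]
    summary = invoice["summary"]

    net = sum(item["item_net_worth"] for item in items)
    gross = sum(item["item_gross_worth"] for item in items)
    vat = gross - net  # assumed: VAT amount is gross minus net

    recomputed = {
        "total_net_worth": net,
        "total_vat": vat,
        "total_gross_worth": gross,
    }
    return {
        field: math.isclose(value, summary[field], abs_tol=tol)
        for field, value in recomputed.items()
    }


# Numeric fields copied from the example output in Example Results
example = {
    "items": [
        {"item_net_worth": 59.94, "item_gross_worth": 65.93},
        {"item_net_worth": 8.0, "item_gross_worth": 8.8},
        {"item_net_worth": 20.0, "item_gross_worth": 22.0},
    ],
    "summary": {
        "total_net_worth": 87.94,
        "total_vat": 8.79,
        "total_gross_worth": 96.73,
    },
}

print(check_invoice_totals(example))
```

For the example invoice all three fields agree. A `False` entry would flag an extraction worth reviewing by hand, although it cannot tell whether the item values or the totals were misread.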
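
## Trainable Parameter Estimate

The "~1.8M trainable parameters" figure in Training Details can be roughly reproduced from the LoRA configuration. LoRA adds two matrices per adapted linear layer, of shapes `r x d_in` and `d_out x r`, so each adapted layer contributes `r * (d_in + d_out)` parameters. The layer dimensions below are assumptions about the underlying Qwen2.5 3B language model (hidden size 2048, 36 decoder layers, 2 key/value heads of dimension 128) and are not stated in this card. The sketch also assumes that `q_proj` and `v_proj` match only the language-model attention layers.

```python
def lora_param_count(rank, shapes, num_layers):
    """Count LoRA parameters: rank * (d_in + d_out) per adapted linear layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * num_layers


# Assumed language-model dimensions (not stated in this card)
HIDDEN = 2048      # hidden size
KV_DIM = 256       # 2 key/value heads x head dimension 128
NUM_LAYERS = 36    # decoder layers

# Rank 8 on q_proj (HIDDEN -> HIDDEN) and v_proj (HIDDEN -> KV_DIM)
trainable = lora_param_count(8, [(HIDDEN, HIDDEN), (HIDDEN, KV_DIM)], NUM_LAYERS)

print(trainable)                      # 1843200
print(f"{trainable / 3.75e9:.4%}")    # share of the 3.75B total
```

Under these assumptions the count is 1,843,200, which is consistent with the reported ~1.8M and with roughly 0.05% of 3.75B parameters.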