|
|
---
base_model: numind/NuExtract-2.0-4B
library_name: transformers
model_name: invoices-donut-finetuned-Lora-merged
tags:
- generated_from_trainer
- sft
- trl
- vision
- document-understanding
- invoice-processing
- donut
- qwen
license: apache-2.0
---
|
|
|
|
|
## Overview
|
|
`invoices-donut-finetuned-Lora-merged` is the **LoRA adapter merged back into the base weights** of [`numind/NuExtract-2.0-4B`](https://huggingface.co/numind/NuExtract-2.0-4B). It behaves like a fully fine-tuned model while having been trained with parameter-efficient LoRA adapters. This makes it **production-ready**: there is no need to load the base model and the adapters separately.
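For reference, a merge of this kind is typically produced with PEFT's `merge_and_unload`. The snippet below is only an illustrative sketch: the adapter repository path is a placeholder, not a released artifact.

```python
import torch
from transformers import AutoModelForVision2Seq
from peft import PeftModel

# Load the base model, attach the trained LoRA adapter, and merge the weights
base = AutoModelForVision2Seq.from_pretrained(
    "numind/NuExtract-2.0-4B",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
merged = PeftModel.from_pretrained(base, "path/to/invoices-donut-finetuned-Lora").merge_and_unload()

# Save the merged weights as a standalone checkpoint
merged.save_pretrained("invoices-donut-finetuned-Lora-merged")
```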
|
|
|
|
|
--- |
|
|
|
|
|
## Intended Use |
|
|
|
|
|
- Extracting structured JSON fields from invoice images:
  - Invoice number, date
  - Seller/client details
  - Tax IDs, IBAN
  - Item descriptions, prices, VAT
  - Totals (net, VAT, gross)
- Not intended for general document OCR outside invoices.
|
|
|
|
|
## Training Details |
|
|
|
|
|
- **Base model**: [`numind/NuExtract-2.0-4B`](https://huggingface.co/numind/NuExtract-2.0-4B) (built on Qwen/Qwen2.5-VL-3B-Instruct)
- **Framework**: Hugging Face TRL (`SFTTrainer`) with PEFT/LoRA (see the configuration sketch after this list)
- **LoRA config**:
  - **Rank (r)**: 8
  - **Alpha**: 32
  - **Target modules**: q_proj, v_proj
  - **Dropout**: 0.1
- **Epochs**: 10
- **Batch size**: 2
- **Learning rate**: 1e-5
- **Precision**: bfloat16
- **Gradient accumulation**: 4
- **Scheduler**: Constant LR
- **Max sequence length**: 1024
- **Gradient checkpointing**: Enabled
- **Trainable parameters**: ~1.8M (0.05% of 3.75B total)
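The training script itself is not included in this card, but the hyperparameters above map onto TRL/PEFT roughly as in the sketch below. It is a minimal illustration only: `model` and `train_dataset` are assumed to be defined elsewhere, the `output_dir` name is hypothetical, and some argument names vary slightly between TRL versions.

```python
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# LoRA configuration matching the values listed above
peft_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)

# Training arguments matching the values listed above
training_args = SFTConfig(
    output_dir="invoices-donut-finetuned-Lora",  # hypothetical output directory
    num_train_epochs=10,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=1e-5,
    lr_scheduler_type="constant",
    bf16=True,
    max_seq_length=1024,
    gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model=model,                    # base model loaded as in the Usage section
    args=training_args,
    train_dataset=train_dataset,    # invoice dataset prepared for SFT
    peft_config=peft_config,
)
trainer.train()
```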
|
|
|
|
|
|
|
|
## Usage |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash
# accelerate is needed for device_map="auto"; qwen-vl-utils is used in the examples below
pip install transformers torch datasets pillow accelerate qwen-vl-utils
```
|
|
|
|
|
### Load Model and Processor |
|
|
|
|
|
```python
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_name = "aliRafik/invoices-donut-finetuned-Lora-merged"

model = AutoModelForVision2Seq.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,               # Optional: use float32 if bfloat16 causes issues
    attn_implementation="flash_attention_2",  # Requires the flash-attn package, an Ampere+ GPU, and torch >= 2.0
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    model_name,
    trust_remote_code=True,
    padding_side='left',
    use_fast=True
)
```
|
|
|
|
|
|
|
|
### Define Extraction Template |
|
|
|
|
|
```python
template = """
{
  "header": {
    "invoice_no": "string",
    "invoice_date": "date-time",
    "seller": "string",
    "client": "string",
    "seller_tax_id": "string",
    "client_tax_id": "string",
    "iban": "string"
  },
  "items": [
    {
      "item_desc": "string",
      "item_qty": "number",
      "item_net_price": "number",
      "item_net_worth": "number",
      "item_vat": "number",
      "item_gross_worth": "number"
    }
  ],
  "summary": {
    "total_net_worth": "number",
    "total_vat": "number",
    "total_gross_worth": "number"
  }
}
"""
```
|
|
### Test on Sample from Dataset |
|
|
|
|
|
```python
from datasets import load_dataset
import json
from qwen_vl_utils import process_vision_info

# Load the dataset
dataset = load_dataset("katanaml-org/invoices-donut-data-v1")

# Select a sample (e.g., index 0)
sample = dataset['train'][0]
image = sample['image']
ground_truth = sample['ground_truth']

print(json.loads(ground_truth))

# Prepare message
messages = [
    {"role": "user", "content": [{"type": "image", "image": image}]}
]

# Process vision info
image_inputs, _ = process_vision_info(messages)

# Apply chat template
text = processor.tokenizer.apply_chat_template(
    messages,
    template=template,
    tokenize=False,
    add_generation_prompt=True
)

# Prepare inputs
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt"
).to(model.device)

# Generation config
generation_config = {
    "do_sample": False,
    "num_beams": 1,
    "max_new_tokens": 2048
}

# Generate
generated_ids = model.generate(**inputs, **generation_config)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]

output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

# Parse and print
try:
    extracted_data = json.loads(output_text[0])
    print("Extracted Data:", extracted_data)
except json.JSONDecodeError:
    print("Raw Output:", output_text[0])

# Compare with ground truth
gt_parsed = json.loads(ground_truth)['gt_parse']
print("Ground Truth:", gt_parsed)
```
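As a quick sanity check, the extracted header can be compared field by field against the parsed ground truth. The helper below is a rough, illustrative comparison (the `header_match_rate` name is not part of the released code) and assumes `extracted_data` and `gt_parsed` from the snippet above.

```python
def header_match_rate(pred: dict, gt: dict) -> float:
    """Fraction of ground-truth header fields that the prediction matches exactly."""
    gt_header = gt.get("header", {})
    pred_header = pred.get("header", {})
    if not gt_header:
        return 0.0
    hits = sum(
        str(pred_header.get(key, "")).strip() == str(value).strip()
        for key, value in gt_header.items()
    )
    return hits / len(gt_header)

print("Header exact-match rate:", header_match_rate(extracted_data, gt_parsed))
```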
|
|
### Test on Unseen Data (Custom Image) |
|
|
```python
from PIL import Image
from io import BytesIO
import requests

# Load from local path
image_path = "/content/image.jpg"  # Replace with your path
image = Image.open(image_path)

# Or load from URL
# image_url = "https://example.com/your_invoice.jpg"
# response = requests.get(image_url)
# image = Image.open(BytesIO(response.content))

# Use the same inference code as above
```
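To reuse the same inference steps on arbitrary images, they can be wrapped into a small helper. This is a convenience sketch (the `extract_invoice` name is not part of the released code) and assumes `model`, `processor`, and `template` are already defined as in the sections above.

```python
import json
from qwen_vl_utils import process_vision_info

def extract_invoice(image):
    """Run the inference steps from the previous section on a single PIL image."""
    messages = [{"role": "user", "content": [{"type": "image", "image": image}]}]
    image_inputs, _ = process_vision_info(messages)
    text = processor.tokenizer.apply_chat_template(
        messages, template=template, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(
        text=[text], images=image_inputs, padding=True, return_tensors="pt"
    ).to(model.device)
    generated_ids = model.generate(**inputs, do_sample=False, num_beams=1, max_new_tokens=2048)
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
    output = processor.batch_decode(
        trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )[0]
    try:
        return json.loads(output)   # parsed JSON if the model emitted valid JSON
    except json.JSONDecodeError:
        return output               # otherwise fall back to the raw text

# extracted = extract_invoice(image)
# print(extracted)
```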
|
|
|
|
|
## Example Results |
|
|
|
|
|
### Input Image
|
|
|
|
|
 |
|
|
|
|
|
### Extracted Data
|
|
|
|
|
```json
{
  "header": {
    "invoice_no": "49565075",
    "invoice_date": "2019-10-28",
    "seller": "Kane-Morgan 968 Carr Mission Apt. 320 Bernardville, VA 28211",
    "client": "Garcia Inc 445 Haas Viaduct Suite 454 Michaelhaven, LA 32852",
    "seller_tax_id": "964-95-3813",
    "client_tax_id": "909-75-5482",
    "iban": "GB73WCJ55232646970614"
  },
  "items": [
    {
      "item_desc": "Anthropologie Gold Elegant Swan Decorative Metal Bottle Stopper Wine Saver",
      "item_qty": 3.0,
      "item_net_price": 19.98,
      "item_net_worth": 59.94,
      "item_vat": 10.0,
      "item_gross_worth": 65.93
    },
    {
      "item_desc": "Lolita Happy Retirement Wine Glass 15 Ounce GLS11-5534H",
      "item_qty": 1.0,
      "item_net_price": 8.0,
      "item_net_worth": 8.0,
      "item_vat": 10.0,
      "item_gross_worth": 8.8
    },
    {
      "item_desc": "Lolita \"Congratulations\" Hand Painted and Decorated Wine Glass NIB",
      "item_qty": 1.0,
      "item_net_price": 20.0,
      "item_net_worth": 20.0,
      "item_vat": 10.0,
      "item_gross_worth": 22.0
    }
  ],
  "summary": {
    "total_net_worth": 87.94,
    "total_vat": 8.79,
    "total_gross_worth": 96.73
  }
}
```
|
|
## License |
|
|
Apache-2.0 (the model tags, including `vision`, `document-understanding`, `invoice-processing`, `donut`, and `qwen`, are listed in the metadata header above).
|
|
|
|
|
|
|
|
## Citations |
|
|
|
|
|
Cite TRL as: |
|
|
|
|
|
```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```