racineai/Flantier-SmolVLM-500M-dse

paulml committed on Mar 26

Commit d7a064b (verified)

1 Parent(s): e8c99d6

Create README.md

README.md +121 -0

	@@ -0,0 +1,121 @@

+---
+license: apache-2.0
+datasets:
+- racineai/OGC_2_vdr-visRAG-colpali
+language:
+- fr
+- en
+- de
+- es
+- it
+base_model:
+- HuggingFaceTB/SmolVLM-500M-Instruct
+---
+# Flantier-SmolVLM-500M-dse
+A lightweight vision-language model specialized for technical document retrieval.
+## Overview
+Flantier-SmolVLM-500M-dse (Document Screenshot Embedding) is a 500M-parameter vision-language model designed for efficient retrieval of technical documentation. It encodes document screenshots directly into embeddings, so text, images, and layout are all captured without a separate content-extraction step.
+## Key Features
+- **Efficient Retrieval**: Generates document and query embeddings for semantic similarity search
+- **Multimodal Understanding**: Processes text, diagrams, charts, and tables in their original layout
+- **Lightweight Architecture**: Only 500M parameters, runs on consumer GPUs
+- **No Preprocessing Required**: Works directly with document screenshots
+## Installation
+```bash
+pip install torch transformers accelerate pillow
+```
+## Usage Example
+```python
+from PIL import Image
+import torch
+from transformers import AutoProcessor, AutoModelForVision2Seq
+# Load model and processor
+processor = AutoProcessor.from_pretrained("racineai/Flantier-SmolVLM-500M-dse")
+model = AutoModelForVision2Seq.from_pretrained(
+    "racineai/Flantier-SmolVLM-500M-dse",
+    torch_dtype=torch.bfloat16,
+    device_map="auto"
+)
+# Load document image (converted to RGB so grayscale or RGBA files are handled too)
+document_image = Image.open("technical_document.jpg").convert("RGB")
+# Process for document embedding
+doc_messages = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "image"},
+            {"type": "text", "text": "What is shown in this image?"}
+        ]
+    },
+]
+doc_prompt = processor.apply_chat_template(doc_messages, add_generation_prompt=True)
+doc_inputs = processor(text=doc_prompt, images=[document_image], return_tensors="pt").to(model.device)
+# Generate document embedding
+with torch.no_grad():
+    doc_outputs = model(**doc_inputs, output_hidden_states=True, return_dict=True)
+    doc_embedding = doc_outputs.hidden_states[-1][:, -1]  # Last token embedding
+    doc_embedding = torch.nn.functional.normalize(doc_embedding, p=2, dim=-1)
+# Process the text query for embedding
+query = "What are the specifications of this component?"
+query_messages = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "text", "text": query}
+        ]
+    },
+]
+query_prompt = processor.apply_chat_template(query_messages, add_generation_prompt=True)
+query_inputs = processor(text=query_prompt, return_tensors="pt").to(model.device)
+# Generate query embedding
+with torch.no_grad():
+    query_outputs = model(**query_inputs, output_hidden_states=True, return_dict=True)
+    query_embedding = query_outputs.hidden_states[-1][:, -1]  # Last token embedding
+    query_embedding = torch.nn.functional.normalize(query_embedding, p=2, dim=-1)
+# Calculate similarity
+similarity = torch.nn.functional.cosine_similarity(query_embedding, doc_embedding)
+print(f"Similarity score: {similarity.item():.4f}")
+```
+## Applications
+- **Technical Document Retrieval**: Find relevant documents based on technical queries
+- **Technical Support Systems**: Match user questions to relevant documentation
+- **Engineering Knowledge Management**: Index and search technical specifications, diagrams, and reports
+## Training Methodology
+This model was trained using the Document Screenshot Embedding (DSE) approach, which treats document screenshots as a unified input format. This eliminates the need for content extraction preprocessing while preserving all visual and textual information in documents.
+## Citation
+```
+@misc{flantier-smolvlm-dse,
+  author = {racine.ai},
+  title = {Flantier-SmolVLM-500M-dse: A Lightweight Document Screenshot Embedding Model},
+  year = {2025},
+  publisher = {Hugging Face},
+  url = {https://huggingface.co/racineai/Flantier-SmolVLM-500M-dse}
+}
+```
+## License
+This model is released under the Apache 2.0 license.
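
The retrieval use described in the README's Applications section comes down to ranking stored page embeddings against a query embedding. Below is a minimal, dependency-free sketch of that ranking step. It assumes the embeddings were already produced as in the usage example and exported as plain lists of floats. The names `l2_normalize` and `top_k`, and the toy document IDs and vectors, are hypothetical and introduced only for illustration; they are not part of the model's API.

```python
import math
from typing import Dict, List, Tuple


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(
    query: List[float],
    index: Dict[str, List[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k document IDs with the highest cosine similarity to the query."""
    q = l2_normalize(query)
    scored = []
    for doc_id, emb in index.items():
        d = l2_normalize(emb)
        # Dot product of unit vectors equals cosine similarity.
        scored.append((doc_id, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real model embeddings (assumption).
toy_index = {
    "pump_manual_p12": [0.9, 0.1, 0.0],
    "wiring_diagram_p3": [0.0, 1.0, 0.0],
    "datasheet_p1": [0.7, 0.7, 0.1],
}
results = top_k([1.0, 0.2, 0.0], toy_index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.4f}")
```

For a real index, the same ranking would more likely be a single matrix product between the query vector and a stacked tensor of normalized document embeddings; the loop above only makes the logic explicit.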