Article LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family 5 days ago • 58
NanoBEIR datasets Collection These datasets are compatible with the (Sparse)NanoBEIREvaluator in Sentence Transformers v5.2+, and also with the CrossEncoderNanoBEIREvaluator if they include a bm25 column. • 16 items • Updated Dec 13, 2025 • 12
Article TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval Dec 4, 2025 • 19
Article LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR Oct 23, 2025 • 70
Simple Projection Variants Improve ColBERT Performance Paper • 2510.12327 • Published Oct 14, 2025 • 7
Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report Paper • 2510.14880 • Published Oct 16, 2025 • 19
Article Welcome EmbeddingGemma, Google's new efficient embedding model Sep 4, 2025 • 269
Seq vs Seq: An Open Suite of Paired Encoders and Decoders Paper • 2507.11412 • Published Jul 15, 2025 • 31
BioClinical ModernBERT Collection This project was a collaboration between members of the Dana-Farber Cancer Institute, LightOn, MIT, OpenEvidence and Microsoft. • 3 items • Updated Sep 9, 2025 • 11
Article Introducing EuroBERT: A High-Performance Multilingual Encoder Model Mar 10, 2025 • 146
Rank1: Test-Time Compute for Reranking in Information Retrieval Paper • 2502.18418 • Published Feb 25, 2025 • 29
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated 24 days ago • 553
ModernGLiNER Collection GLiNER models based on modern encoder architectures • 2 items • Updated Dec 24, 2024 • 7
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 159
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 156
Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling Paper • 2409.14683 • Published Sep 23, 2024 • 11