No Tokens Wasted: Leveraging Long Context in Biomedical Vision-Language Models Paper • 2510.03978 • Published Oct 4 • 2
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 147
view article Article NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks By nvidia and 4 others • Aug 11 • 75
MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports Paper • 2505.11733 • Published May 16 • 7
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Paper • 2503.13399 • Published Mar 17 • 22
SmolVLM Collection State-of-the-art compact VLMs for on-device applications: Base, Synthetic, and Instruct. Check our blog: https://huggingface.co/blog/smolvlm • 5 items • Updated May 5 • 40
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published Jan 13 • 55
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception Paper • 2410.12628 • Published Oct 16, 2024 • 41
HyenaDNA Models Collection HyenaDNA models usable directly with Hugging Face classes like AutoModel. • 8 items • Updated Nov 14, 2023 • 19
EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records Paper • 2406.16341 • Published Jun 24, 2024 • 14