Simple Open-Vocabulary Object Detection with Vision Transformers • Paper • 2205.06230 • Published May 12, 2022 • 3
timm Backbones • Collection • Pre-trained feature extraction backbones available in timm. • 18 items • Updated Sep 19 • 10
DINOv3 • Collection • DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated Aug 21 • 383
RF-DETR Object Detection vs YOLOv12 : A Study of Transformer-based and CNN-based Architectures for Single-Class and Multi-Class Greenfruit Detection in Complex Orchard Environments Under Label Ambiguity • Paper • 2504.13099 • Published Apr 17 • 8
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study • Paper • 2508.13142 • Published Aug 18 • 34