arxiv:2511.10286

Diversity Over Scale: Whole-Slide Image Variety Enables H&E Foundation Model Training with Fewer Patches

Published on Nov 13, 2025

Abstract

AI-generated summary: Athena, a histopathology foundation model trained on a diverse set of tissue patches rather than on sheer patch volume, approaches state-of-the-art performance across patch- and slide-level benchmarks.

Rapid progress in computational pathology is increasingly driven by vision foundation models pretrained on vast histopathology datasets. While recent efforts have prioritized training on an ever-larger number of patches, we take an alternative approach focused on data diversity. Our foundation model, Athena, was initialized from a pretrained model and trained on just 115 million tissue patches, several times fewer than recent histopathology foundation models. Rather than relying on patch volume or complex sampling heuristics, we maximize data diversity by randomly selecting only a moderate number of patches per whole-slide image from our diverse internal repository, which spans multiple countries, institutions, and scanner types. Evaluated on a single patch-level benchmark and four slide-level downstream tasks (two molecular and two morphological), Athena approaches the state of the art and even surpasses several models trained on substantially larger datasets. This indicates that diversity across whole-slide images, rather than patch quantity alone, drives learning in histopathology foundation models.
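
The sampling recipe described in the abstract is simple enough to sketch: cap the number of patches taken from each whole-slide image and draw them uniformly at random, so that variety across slides, not volume within any one slide, dominates the training set. The sketch below is a hypothetical illustration of that idea; the slide names, coordinate grids, and the per-slide cap are assumptions, not values from the paper.

```python
import random

# A minimal sketch of diversity-first patch sampling, assuming a precomputed
# manifest of candidate tissue-patch coordinates per whole-slide image (WSI),
# e.g. from a prior tissue-detection step. All names, grids, and the per-slide
# cap are illustrative; the paper does not publish its sampling code.
slides = {
    "slide_A": [(x, y) for x in range(0, 10_000, 256) for y in range(0, 8_000, 256)],
    "slide_B": [(x, y) for x in range(0, 6_000, 256) for y in range(0, 6_000, 256)],
}

PATCHES_PER_SLIDE = 100  # a "moderate" cap; the abstract does not state the value used


def build_training_manifest(slides, patches_per_slide, seed=0):
    """Draw at most `patches_per_slide` coordinates uniformly at random from
    each WSI, so slide variety, not per-slide volume, shapes the dataset."""
    rng = random.Random(seed)
    manifest = []
    for slide_id, coords in slides.items():
        k = min(patches_per_slide, len(coords))
        manifest.extend((slide_id, xy) for xy in rng.sample(coords, k))
    rng.shuffle(manifest)  # interleave slides so training batches stay diverse
    return manifest


patches = build_training_manifest(slides, PATCHES_PER_SLIDE)
print(f"{len(patches)} patches drawn from {len(slides)} slides")
```

Capping per-slide draws is what lets a repository spanning many countries, institutions, and scanner types translate directly into a diverse patch set, without the complex sampling heuristics the abstract sets aside.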
