REVEAL: Retinal-risk Vision-Language Early Alzheimer’s Learning

Model Description

REVEAL is a multimodal vision-language model designed to align retinal fundus imaging with individualized clinical risk factors for early prediction of Alzheimer’s disease (AD) and dementia. The model learns joint representations from retinal morphology and from structured health data that have been transformed into clinical narratives.

REVEAL leverages pretrained medical foundation models and introduces a group-aware contrastive learning (GACL) strategy to capture clinically meaningful multimodal relationships. The model is designed to support early disease risk stratification and multimodal biomarker discovery.


Model Architecture

REVEAL is composed of:

  • Image Encoder: RETFound retinal imaging foundation model
  • Text Encoder: GatorTron clinical language model
  • Projection Layers: Trainable modules mapping image and text embeddings into a shared latent space
  • Contrastive Learning Module: Group-aware contrastive learning for multimodal alignment

The framework operates in two stages (a minimal code sketch of this wiring follows the list):

  1. Multimodal representation learning using contrastive vision-language alignment
  2. Downstream risk prediction using multimodal embeddings
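A minimal sketch of the stage-one wiring, assuming pretrained RETFound and GatorTron backbones whose pooled embeddings feed trainable projection heads. Module names, input dimensions, and the PyTorch framing are illustrative assumptions, not the released implementation:

```python
# Illustrative sketch of REVEAL's stage-one wiring (assumptions, not the release).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Maps a pretrained encoder's embedding into the shared latent space."""
    def __init__(self, in_dim: int, out_dim: int = 1024):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(x), dim=-1)  # unit norm for cosine similarity

class REVEALSketch(nn.Module):
    def __init__(self, image_encoder: nn.Module, text_encoder: nn.Module,
                 img_dim: int, txt_dim: int, proj_dim: int = 1024):
        super().__init__()
        self.image_encoder = image_encoder  # e.g., RETFound backbone
        self.text_encoder = text_encoder    # e.g., GatorTron backbone
        self.img_proj = ProjectionHead(img_dim, proj_dim)
        self.txt_proj = ProjectionHead(txt_dim, proj_dim)

    def forward(self, images, input_ids, attention_mask):
        img_emb = self.image_encoder(images)                    # [B, img_dim] pooled
        txt_emb = self.text_encoder(input_ids, attention_mask)  # [B, txt_dim] pooled
        # Stage-one outputs: aligned image/text embeddings in the shared space.
        return self.img_proj(img_emb), self.txt_proj(txt_emb)
```

Stage two uses the resulting embeddings as fixed features for downstream risk prediction (see Evaluation).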

Training Data

Dataset Source

The model was trained using multimodal data derived from the UK Biobank (https://www.ukbiobank.ac.uk/), a large population-scale biomedical dataset containing retinal imaging and clinical health variables.

Cohort Composition

The dataset includes color fundus photographs and clinical risk factor data from 39,242 participants:

  • Training set: 30,462 participants
  • Validation set: 3,384 participants
  • Test set: 5,396 participants

Training and validation sets contained only cognitively normal participants at baseline. Individuals who developed incident AD or dementia were reserved for downstream evaluation.


Imaging Data

  • Imaging modality: Color fundus photography
  • Initial dataset: 136,994 retinal images
  • Quality-controlled dataset: 66,251 images

Retinal morphometric features were extracted using the AutoMorph pipeline (see the sketch after this list), including:

  • Optic nerve head measurements (cup-to-disc ratios)
  • Vascular morphology metrics
  • Vessel tortuosity and fractal measurements
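For illustration only, a sketch of assembling AutoMorph-style outputs into a per-participant morphometric feature table; the file name and column names are hypothetical placeholders, not the pipeline's actual schema:

```python
# Hypothetical assembly of AutoMorph-style features into one row per participant.
import pandas as pd

features = pd.read_csv("automorph_features.csv")  # assumed export from the pipeline
morph_cols = ["cup_disc_ratio", "vessel_density", "tortuosity", "fractal_dimension"]
per_subject = (
    features.groupby("participant_id")[morph_cols]
    .mean()            # e.g., average measurements across eyes / repeat images
    .reset_index()
)
```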

Clinical Risk Factors

Risk factors include:

Demographic

  • Age
  • Sex
  • Socioeconomic status
  • Ethnicity
  • Employment status

General Health

  • BMI
  • HbA1c
  • Blood pressure
  • Cognitive test scores

Behavioral and Psychiatric

  • Depression
  • Sleep deprivation
  • Smoking history
  • Alcohol use
  • Cannabis use

Lifestyle and Social

  • Physical activity
  • Social engagement
  • Leisure activity

Diet

  • Food intake patterns
  • Beverage consumption
  • Nutritional indicators

Synthetic Clinical Text Generation

Structured clinical variables were converted into standardized clinical narratives using a large language model. Each participant’s risk factors were mapped into a predefined clinical template to enable compatibility with vision-language training.
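A minimal sketch of the template-filling step, assuming a fixed narrative template populated from structured fields; the field names and wording are assumptions, and the actual pipeline uses a large language model to produce the narratives:

```python
# Hypothetical template-filling for synthetic clinical text (illustrative only).
def risk_factors_to_narrative(row: dict) -> str:
    return (
        f"The participant is a {row['age']}-year-old {row['sex']} "
        f"with a BMI of {row['bmi']:.1f} and HbA1c of {row['hba1c']:.1f} mmol/mol. "
        f"Smoking history: {row['smoking_status']}. "
        f"Alcohol use: {row['alcohol_use']}. "
        f"Self-reported physical activity: {row['physical_activity']}."
    )

example = {
    "age": 62, "sex": "female", "bmi": 27.4, "hba1c": 36.0,
    "smoking_status": "former smoker", "alcohol_use": "occasional",
    "physical_activity": "moderate",
}
print(risk_factors_to_narrative(example))
```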


Training Procedure

Multimodal Representation Learning

REVEAL aligns fundus images and clinical narratives using contrastive vision-language learning. Both modalities are encoded and projected into a shared latent embedding space.


Group-Aware Contrastive Learning (GACL)

REVEAL introduces a group-aware pairing strategy that:

  • Identifies subjects with similar retinal morphology
  • Identifies subjects with similar clinical risk profiles
  • Forms positive training pairs across similar individuals

This enables the model to learn clinically meaningful multimodal relationships rather than relying only on subject-level pairings.
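A hedged sketch of how such group-aware positives could be constructed from precomputed morphometric and risk-factor feature vectors; the cosine-similarity criterion and the threshold value are illustrative assumptions (the Limitations section notes sensitivity to this threshold):

```python
# Sketch of group-aware positive-pair construction (assumed criterion and threshold).
import torch
import torch.nn.functional as F

def build_positive_mask(morph_feats: torch.Tensor,
                        risk_feats: torch.Tensor,
                        threshold: float = 0.9) -> torch.Tensor:
    """Returns a [B, B] boolean mask; True marks image-text pairs treated as positive."""
    morph_sim = F.normalize(morph_feats, dim=-1) @ F.normalize(morph_feats, dim=-1).T
    risk_sim = F.normalize(risk_feats, dim=-1) @ F.normalize(risk_feats, dim=-1).T
    mask = (morph_sim > threshold) | (risk_sim > threshold)
    mask.fill_diagonal_(True)  # each subject's own image-text pair is always positive
    return mask
```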


Loss Function

REVEAL uses a modified contrastive loss supporting multiple positive pairs per sample. Similarity is computed using cosine similarity between image and text embeddings.
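A sketch of one common formulation of a contrastive loss with multiple positives per anchor (a SupCon-style objective over cosine similarities, using the temperature listed under Hyperparameters); this is an assumed formulation, not necessarily the exact released loss:

```python
# Multi-positive contrastive loss sketch (SupCon-style; assumed formulation).
import torch

def multi_positive_contrastive_loss(img_emb, txt_emb, pos_mask, temperature=0.07):
    """img_emb, txt_emb: [B, D] L2-normalized embeddings; pos_mask: [B, B] boolean."""
    logits = img_emb @ txt_emb.T / temperature  # cosine similarities / temperature
    pos = pos_mask.float()

    # Image-to-text: average log-likelihood over each anchor's positive set.
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    loss_i2t = -(pos * log_prob).sum(1) / pos.sum(1).clamp(min=1)

    # Symmetric text-to-image term.
    log_prob_t = logits.T - torch.logsumexp(logits.T, dim=1, keepdim=True)
    loss_t2i = -(pos.T * log_prob_t).sum(1) / pos.T.sum(1).clamp(min=1)

    return (loss_i2t.mean() + loss_t2i.mean()) / 2
```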


Hyperparameters

  • Projection dimension: 1024
  • Batch size: 128
  • Learning rate: 2.42e-4
  • Weight decay: 0.0232
  • Temperature parameter: 0.07

Hyperparameters were optimized using Optuna (https://optuna.org/).
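For context, a minimal Optuna study of the kind that could produce such values; the search ranges and the train_and_validate() helper are assumptions for the sketch:

```python
# Illustrative Optuna search over the listed hyperparameters (assumed ranges).
import optuna

def objective(trial: optuna.Trial) -> float:
    params = {
        "proj_dim": trial.suggest_categorical("proj_dim", [256, 512, 1024]),
        "batch_size": trial.suggest_categorical("batch_size", [64, 128, 256]),
        "lr": trial.suggest_float("lr", 1e-5, 1e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 1e-4, 1e-1, log=True),
        "temperature": trial.suggest_float("temperature", 0.03, 0.2),
    }
    # Hypothetical helper: trains the model and returns validation loss.
    return train_and_validate(**params)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```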


Intended Use

Primary Use Cases

REVEAL is intended for research applications, including:

  • Early risk stratification for Alzheimer’s disease and dementia
  • Multimodal biomarker discovery
  • Development of non-invasive screening strategies
  • Population-level disease risk modeling
  • Multimodal clinical representation learning

Appropriate Use

The model should be used:

  • For research or exploratory clinical modeling
  • With appropriate ethical and institutional review
  • With external validation before use in new populations

Out-of-Scope Use

The model is not intended for:

  • Direct clinical diagnosis
  • Medical decision-making without clinician oversight
  • Deployment as a medical device
  • Use in unvalidated populations

Evaluation

REVEAL embeddings were evaluated using downstream support vector machine (SVM) classifiers.
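A sketch of this evaluation protocol with scikit-learn; the random arrays below are stand-ins for the real REVEAL embedding matrices and incident AD/dementia labels:

```python
# Downstream protocol sketch: SVM on REVEAL embeddings, scored with AUROC and
# balanced accuracy. Random arrays are placeholders, not real data.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score, balanced_accuracy_score

rng = np.random.default_rng(0)
train_X, train_y = rng.normal(size=(200, 1024)), rng.integers(0, 2, 200)
test_X, test_y = rng.normal(size=(50, 1024)), rng.integers(0, 2, 50)

clf = SVC(kernel="rbf", probability=True, class_weight="balanced")
clf.fit(train_X, train_y)

probs = clf.predict_proba(test_X)[:, 1]
preds = clf.predict(test_X)
print("AUROC:", roc_auc_score(test_y, probs))
print("Balanced accuracy:", balanced_accuracy_score(test_y, preds))
```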

Incident Alzheimer’s Disease Prediction

  • AUROC: 0.658
  • Balanced Accuracy: 0.610

Incident Dementia Prediction

  • AUROC: 0.659
  • Balanced Accuracy: 0.605

Reported performance is averaged across multiple random seeds.


Limitations

  • Model training is limited to the UK Biobank cohort
  • Performance is sensitive to similarity threshold selection
  • Incident AD and dementia cases remain relatively limited
  • Synthetic clinical narrative generation may introduce bias
  • Generalizability to other populations requires external validation

Ethical Considerations

  • Retinal images and clinical variables contain sensitive health data
  • Predictions may influence disease risk interpretation
  • Model outputs should not replace clinical judgment
  • Use requires adherence to privacy, regulatory, and ethical guidelines

Citation

If you use this model, please cite:

@article{leem2026reveal,
  title   = {REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction},
  author  = {Leem, Seowung and Gu, Lin and You, Chenyu and Gong, Kuang and Fang, Ruogu},
  journal = {MIDL 2026 (Under Review)},
  year    = {2026}
}
