---
license: mit
tags:
- vision
- masked-autoencoder
- mixture-of-experts
- earth-observation
- remote-sensing
- self-supervised
---

# geo-moe-mae: Lightweight Metadata-Aware Mixture-of-Experts Masked Autoencoder for Earth Observation

[![GitHub - geo-moe-mae](https://img.shields.io/badge/GitHub-geo--moe--mae-blue?logo=github)](https://github.com/AlbughdadiM/geo-moe-mae)

## Model Description

**geo-moe-mae** is a compact, metadata-aware Mixture-of-Experts Masked Autoencoder (MoE-MAE) designed for Earth Observation (EO) imagery. It aims to bring self-supervised representation learning to remote sensing with a lightweight architecture.

Key features:

- Uses sparse expert routing (i.e. a mixture of experts) to increase capacity without scaling the parameter count linearly.
- Incorporates geospatial metadata (latitude, longitude) and temporal encodings (seasonal/daily cycles) alongside image data.
- Pretrained on the BigEarthNet-Landsat dataset.
- Evaluated via linear probing (frozen encoder) on downstream tasks including BigEarthNet and EuroSAT, achieving competitive performance relative to much larger models.
- Very lightweight: only ~2.5 million parameters.

### Model Architecture

- The base is a masked autoencoder (vision-transformer style) with a MoE mechanism that routes inputs to different expert submodules.
- Geographic and temporal metadata are fused into the encoding layers to inform representation learning.
- When extracting embeddings for inference, the encoder is typically frozen and its outputs are fed to downstream linear classifiers.

## Intended Use & Applications

### Primary Use Cases

- **Representation learning** for Earth Observation imagery.
- **Downstream classification or regression** tasks (e.g. land cover classification, change detection), via linear probes or fine-tuning.
- **Transfer learning** across EO datasets, especially where metadata (e.g. lat/lon, time) is available.
- **Lightweight deployment** scenarios with constrained compute, due to the small model size.

### Out-of-scope / Misuse

- Not designed for dense segmentation out of the box (unless adapted).
- May underperform on highly detailed tasks needing large capacity or very fine resolution.
- Metadata may bias model predictions if geographic or temporal distributions differ between training and inference domains.
- Do *not* use the model for safety-critical applications (e.g. disaster response, agricultural guarantees) without thorough validation.

## Training Data & Pretraining

- Pretrained on the **BigEarthNet-Landsat** dataset (a large EO dataset).
- Seasonal/temporal cycles and geospatial metadata (latitude, longitude, etc.) were included as additional inputs.
- Trained with a masked autoencoding objective (reconstruction loss plus an MoE balancing loss) on image patches.

## Evaluation

- Evaluated by freezing the encoder and training **linear probes/classifiers** on downstream tasks.
- Tasks include classification on the BigEarthNet and EuroSAT datasets.
- Compared with baseline models of much larger capacity, the model shows competitive performance.

## Limitations & Biases

- Because the model incorporates metadata, it may rely too heavily on geospatial or temporal priors, which can lead to overfitting to regions with distinct metadata distributions.
- The linear-probing evaluation does not reflect full fine-tuning scenarios (i.e. adapting the whole model).
- The original training data (BigEarthNet) may have class imbalance, geographic coverage gaps, or biases in sensors or acquisition times.
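## Preparing Metadata Inputs

The model consumes latitude, longitude, and acquisition time as sine/cosine pairs (see the usage example in the next section). Below is a minimal sketch of one way such cyclic encodings might be computed; the helper name `encode_cyclic`, the periods, and the normalization are illustrative assumptions, not the repository's exact preprocessing.

```python
import math

import torch


def encode_cyclic(value: float, period: float) -> torch.Tensor:
    """Map a cyclic quantity to a (sin, cos) pair on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return torch.tensor([math.sin(angle), math.cos(angle)], dtype=torch.float32)


# Illustrative values; the exact periods and conventions are assumptions.
lat = encode_cyclic(48.85, 360.0)   # latitude in degrees -> (sin, cos) of lat in radians
lon = encode_cyclic(2.35, 360.0)    # longitude in degrees
week = encode_cyclic(23.0, 52.0)    # week of year
hour = encode_cyclic(10.0, 24.0)    # hour of day

# Each encoding has shape (2,); add a batch dimension if the model expects one.
```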
## How to Use

Here’s a sketch of how you might load and use the model (you may need to adjust paths and configs to match the repository):

```python
import torch

from models.moe_mae import MOEMAE, build_model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Model configuration and checkpoint path
model_size = "S"
img_size = 40
patch_size = 4
in_channels = 7
checkpoint_path = "./weights/moe_mae_bigearthnet_ls/pretrained_S_best.pth"

# Build the encoder and wrap it in the MoE-MAE model
encoder = build_model(
    size=model_size,
    img_size=img_size,
    patch_size=patch_size,
    in_chans=in_channels,
)
model = MOEMAE(encoder).to(device)
model = load_model(model, checkpoint_path, device)  # checkpoint-loading utility from the repository
encoder = model.encoder
encoder.eval()

# Preprocess an example image + metadata
img = ...   # preprocessed and normalized image tensor
lat = ...   # normalized latitude as a (sine, cosine) pair
lon = ...   # normalized longitude as a (sine, cosine) pair
week = ...  # normalized week of year as a (sine, cosine) pair
hour = ...  # normalized hour of day as a (sine, cosine) pair

# Forward pass to get embeddings (check the repository for the exact method and signature)
out = model(img, week, hour, lat, lon)
embed = out[-1]

# Then downstream: train a linear classifier on the embeddings, etc.
```

## Citation

If you use this model in your work, please cite:

```bibtex
@misc{albughdadi2025lightweightmetadataawaremixtureofexpertsmasked,
  title         = {Lightweight Metadata-Aware Mixture-of-Experts Masked Autoencoder for Earth Observation},
  author        = {Mohanad Albughdadi},
  year          = {2025},
  eprint        = {2509.10919},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2509.10919},
}
```
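## Example: Linear Probing

Finally, a minimal sketch of the linear-probing setup described above (frozen encoder, trainable linear head). It reuses `model` and `device` from the usage example; `embed_dim`, `train_loader`, and the assumption that the last element of the model output is a pooled per-image embedding are all placeholders, not taken from the repository.

```python
import torch
import torch.nn as nn

embed_dim = 192    # hypothetical embedding size for the "S" model
num_classes = 10   # e.g. EuroSAT has 10 classes

probe = nn.Linear(embed_dim, num_classes).to(device)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# `train_loader` is assumed to yield (img, week, hour, lat, lon, label) batches.
for img, week, hour, lat, lon, label in train_loader:
    with torch.no_grad():  # the encoder stays frozen
        out = model(img.to(device), week.to(device), hour.to(device),
                    lat.to(device), lon.to(device))
        embed = out[-1]    # assumed pooled embedding of shape (batch, embed_dim)
    logits = probe(embed)
    loss = criterion(logits, label.to(device))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```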