---
license: mit
tags:
- vision
- masked-autoencoder
- mixture-of-experts
- earth-observation
- remote-sensing
- self-supervised
---

# geo-moe-mae: Lightweight Metadata-Aware Mixture-of-Experts Masked Autoencoder for Earth Observation

[![GitHub - geo-moe-mae](https://img.shields.io/badge/GitHub-geo--moe--mae-blue?logo=github)](https://github.com/AlbughdadiM/geo-moe-mae)

## Model Description

**geo-moe-mae** is a compact, metadata-aware Mixture-of-Experts Masked Autoencoder (MoE-MAE) designed for Earth Observation (EO) imagery. It aims to bring self-supervised representation learning to remote sensing with a lightweight architecture.

Key features:

- Uses sparse expert routing (i.e. a mixture of experts) to increase capacity without scaling the parameter count linearly.
- Incorporates geospatial metadata (latitude, longitude) and temporal encodings (seasonal/daily cycles) alongside image data.
- Pretrained on the BigEarthNet-Landsat dataset.
- Evaluated via linear probing (frozen encoder) on downstream tasks including BigEarthNet and EuroSAT, achieving competitive performance relative to much larger models.
- Very lightweight: only ~2.5 million parameters.

### Model Architecture

- The base is a masked autoencoder (vision-transformer style) with a MoE mechanism that routes inputs to different expert submodules.
- Geographic and temporal metadata are fused into the encoding layers to inform representation learning.
- When extracting embeddings for inference, the encoder is typically frozen and its outputs are fed to downstream linear classifiers.

## Intended Use & Applications

### Primary Use Cases

- **Representation learning** for Earth Observation imagery.
- **Downstream classification or regression** tasks (e.g. land cover classification, change detection), via linear probes or fine-tuning.
- **Transfer learning** across EO datasets, especially where metadata (e.g. lat/lon, time) is available.
- **Lightweight deployment** scenarios with constrained compute, due to the small model size.

### Out-of-scope / Misuse

- Not designed for dense segmentation out of the box (unless adapted).
- May underperform on highly detailed tasks needing large capacity or very fine resolution.
- Metadata may bias model predictions if geographic or temporal distributions differ between training and inference domains.
- Do *not* use the model for safety-critical applications (e.g. disaster response, agricultural guarantees) without thorough validation.

## Training Data & Pretraining

- Pretrained on the **BigEarthNet-Landsat** dataset (a large EO dataset).
- Seasonal/temporal cycles and geospatial metadata (latitude, longitude, etc.) were included as additional inputs.
- Trained with a masked autoencoding objective (reconstruction loss plus an MoE balancing loss) on image patches.

## Evaluation

- Evaluated by freezing the encoder and training **linear probes/classifiers** on downstream tasks.
- Tasks include classification on the BigEarthNet and EuroSAT datasets.
- Compared with baseline models of much larger capacity, the model shows competitive performance.

## Limitations & Biases

- Because the model incorporates metadata, it may rely too heavily on geospatial or temporal priors, which can lead to overfitting to regions with distinct metadata distributions.
- The linear-probing evaluation does not reflect full fine-tuning scenarios (i.e. adapting the whole model).
- The original training data (BigEarthNet) may have class imbalance, geographic coverage gaps, or biases in sensors or acquisition times.
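## Preparing Metadata Inputs

The model consumes latitude, longitude, and acquisition time as sine/cosine pairs (see the usage example in the next section). Below is a minimal sketch of one way such cyclic encodings might be computed; the helper name `encode_cyclic`, the periods, and the normalization are illustrative assumptions, not the repository's exact preprocessing.

```python
import math

import torch


def encode_cyclic(value: float, period: float) -> torch.Tensor:
    """Map a cyclic quantity to a (sin, cos) pair on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return torch.tensor([math.sin(angle), math.cos(angle)], dtype=torch.float32)


# Illustrative values; the exact periods and conventions are assumptions.
lat = encode_cyclic(48.85, 360.0)   # latitude in degrees -> (sin, cos) of lat in radians
lon = encode_cyclic(2.35, 360.0)    # longitude in degrees
week = encode_cyclic(23.0, 52.0)    # week of year
hour = encode_cyclic(10.0, 24.0)    # hour of day

# Each encoding has shape (2,); add a batch dimension if the model expects one.
```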
## How to Use

Here’s a sketch of how you might load and use the model (you may need to adjust paths and configs to match the repository):

```python
import torch

from models.moe_mae import MOEMAE, build_model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Model configuration and checkpoint path
model_size = "S"
img_size = 40
patch_size = 4
in_channels = 7
checkpoint_path = "./weights/moe_mae_bigearthnet_ls/pretrained_S_best.pth"

# Build the encoder and wrap it in the MoE-MAE model
encoder = build_model(
    size=model_size,
    img_size=img_size,
    patch_size=patch_size,
    in_chans=in_channels,
)
model = MOEMAE(encoder).to(device)
model = load_model(model, checkpoint_path, device)  # checkpoint-loading utility from the repository
encoder = model.encoder
encoder.eval()

# Preprocess an example image + metadata
img = ...   # preprocessed and normalized image tensor
lat = ...   # normalized latitude as a (sine, cosine) pair
lon = ...   # normalized longitude as a (sine, cosine) pair
week = ...  # normalized week of year as a (sine, cosine) pair
hour = ...  # normalized hour of day as a (sine, cosine) pair

# Forward pass to get embeddings (check the repository for the exact method and signature)
out = model(img, week, hour, lat, lon)
embed = out[-1]

# Then downstream: train a linear classifier on the embeddings, etc.
```

## Citation

If you use this model in your work, please cite:

```bibtex
@misc{albughdadi2025lightweightmetadataawaremixtureofexpertsmasked,
  title         = {Lightweight Metadata-Aware Mixture-of-Experts Masked Autoencoder for Earth Observation},
  author        = {Mohanad Albughdadi},
  year          = {2025},
  eprint        = {2509.10919},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2509.10919},
}
```
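## Example: Linear Probing

Finally, a minimal sketch of the linear-probing setup described above (frozen encoder, trainable linear head). It reuses `model` and `device` from the usage example; `embed_dim`, `train_loader`, and the assumption that the last element of the model output is a pooled per-image embedding are all placeholders, not taken from the repository.

```python
import torch
import torch.nn as nn

embed_dim = 192    # hypothetical embedding size for the "S" model
num_classes = 10   # e.g. EuroSAT has 10 classes

probe = nn.Linear(embed_dim, num_classes).to(device)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# `train_loader` is assumed to yield (img, week, hour, lat, lon, label) batches.
for img, week, hour, lat, lon, label in train_loader:
    with torch.no_grad():  # the encoder stays frozen
        out = model(img.to(device), week.to(device), hour.to(device),
                    lat.to(device), lon.to(device))
        embed = out[-1]    # assumed pooled embedding of shape (batch, embed_dim)
    logits = probe(embed)
    loss = criterion(logits, label.to(device))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```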