Bad Autoencoding - Model Checkpoints

Checkpoints for the paper: "Optical Context Compression Is Just (Bad) Autoencoding"

Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick

Links

Available Checkpoints

Naming convention: {regime}_{config}_h{N}_{objective}[_recon-init]
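As a rough illustration of the naming convention, the sketch below splits a checkpoint name into its fields with a regular expression. The function name `parse_checkpoint_name` and the returned field names are hypothetical. The pattern assumes each field is lowercase alphanumeric with no internal underscores, which holds for the names listed here but is not confirmed beyond them.

```python
import re

# Pattern for {regime}_{config}_h{N}_{objective}[_recon-init].
# Assumption: fields are lowercase alphanumeric and contain no underscores.
_NAME_RE = re.compile(
    r"^(?P<regime>[a-z]+)"
    r"_(?P<config>[a-z0-9]+)"
    r"_h(?P<h>\d+)"
    r"_(?P<objective>[a-z]+)"
    r"(?P<recon_init>_recon-init)?$"
)


def parse_checkpoint_name(name: str) -> dict:
    """Split a checkpoint name into its naming-convention fields."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return {
        "regime": match.group("regime"),
        "config": match.group("config"),
        "h": int(match.group("h")),
        "objective": match.group("objective"),
        # True only when the optional _recon-init suffix is present.
        "recon_init": match.group("recon_init") is not None,
    }


if __name__ == "__main__":
    print(parse_checkpoint_name("meanpool_w4s4_h0_lm_recon-init"))
```

Names that do not match raise `ValueError`, which makes typos in a checkpoint name visible before any download is attempted.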

Reconstruction

| Checkpoint | Regime | CR | PPL |
|---|---|---|---|
| vision_base_h0_recon | Vision base | 3.60 | 1.03 |
| meanpool_w4s4_h0_recon | Meanpool w4s4 | 3.97 | 1.04 |

Language Modeling

| Checkpoint | Regime | CR | Init | PPL |
|---|---|---|---|---|
| vision_base_h0_lm | Vision base | 3.60 | Direct | 5.08 |
| vision_base_h0_lm_recon-init | Vision base | 3.60 | From recon | 5.06 |
| meanpool_w4s4_h0_lm_recon-init | Meanpool w4s4 | 3.97 | From recon | 5.02 |

Model Details

  • Architecture: DeepSeek-OCR with vision encoder
  • Vision checkpoints: Trained encoder, 768x768 (base)
  • Meanpool checkpoints: Frozen encoder, window=4, stride=4
  • Dataset: 510k samples from FineWiki

Usage

from huggingface_hub import hf_hub_download

# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="ivnle/bad-autoencoding",
    filename="vision_base_h0_lm/model.pt",
    repo_type="model"
)
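To fetch any of the listed checkpoints by name, a small wrapper along the following lines may help. It is a sketch under one assumption: that every checkpoint is stored as `{name}/model.pt`, as in the example above. Only `vision_base_h0_lm/model.pt` is confirmed by this card. The helper names `checkpoint_filename` and `download_checkpoint` are hypothetical.

```python
REPO_ID = "ivnle/bad-autoencoding"

# Checkpoint names as listed under "Available Checkpoints".
CHECKPOINTS = (
    "vision_base_h0_recon",
    "meanpool_w4s4_h0_recon",
    "vision_base_h0_lm",
    "vision_base_h0_lm_recon-init",
    "meanpool_w4s4_h0_lm_recon-init",
)


def checkpoint_filename(name: str) -> str:
    """Return the in-repo path for a checkpoint.

    Assumption: all checkpoints use the {name}/model.pt layout.
    """
    if name not in CHECKPOINTS:
        raise KeyError(f"unknown checkpoint: {name!r}")
    return f"{name}/model.pt"


def download_checkpoint(name: str) -> str:
    """Download one checkpoint and return its local cache path."""
    # Imported lazily so the name helpers work without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=checkpoint_filename(name),
        repo_type="model",
    )
```

Calling `download_checkpoint("meanpool_w4s4_h0_recon")` would then return a local path, provided the assumed layout holds for that checkpoint.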

Citation

@article{lee2024optical,
  title={Optical Context Compression Is Just (Bad) Autoencoding},
  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
  journal={arXiv preprint arXiv:2512.03643},
  year={2025}
}