We release Llama-SARM-4B with SAE weights; the score head is left untrained for reproducibility, and its weights are initialized to zero for interpretability.
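
As a minimal sketch of how this might be used, the snippet below loads the checkpoint as a scalar reward model and checks that the score head is zero-initialized. It assumes the checkpoint loads through transformers' `AutoModelForSequenceClassification` with custom SARM code and exposes a Llama-style `score` head; the repo path is a placeholder, not the actual hub id.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "<org>/Llama-SARM-4B"  # placeholder; substitute the actual hub repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id,
    num_labels=1,               # single scalar reward score
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,     # assumed: SARM inserts an SAE before the score head
)

# The released score head is untrained and zero-initialized, so every input
# scores 0 until the head is trained; verify before fine-tuning.
with torch.no_grad():
    print(model.score.weight.abs().max())  # expected: 0 (assuming a Llama-style `score` head)
```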

SARM: Interpretable Reward Model via Sparse Autoencoder
