ss_bridges_d4096_f0.002
Weight-sparse transformer with bridges, trained with the procedure from Gao et al. (2025).
Model Details (Sparse Model)
- Layers: 2
- Model Dimension: 4096
- Context Length: 512
- Head Dimension: 16
- Vocabulary Size: 4096
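The model dimension and head dimension together imply a head count. The short sketch below assumes the usual multi-head convention that the model dimension equals the number of heads times the head dimension; the card does not state this explicitly, so treat the result as derived rather than confirmed.

```python
# Values copied from the model card.
d_model = 4096
head_dim = 16

# Assumption: d_model = n_heads * head_dim (standard multi-head attention).
assert d_model % head_dim == 0, "head_dim must divide d_model evenly"
n_heads = d_model // head_dim

print(n_heads)  # 256
```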
Bridges
- Dense Model: jacobcd52/ss_d128_f1
- Encoder Activation Fraction: 0.25
Sparsity
- Weight Sparsity: True
- Target L0 Fraction: 0.002
- Activation Sparsity: True
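To give a sense of scale for the target L0 fraction, the sketch below works out how many nonzero entries a single square weight matrix of the model dimension would keep. It assumes the fraction applies uniformly to a 4096 x 4096 matrix; the card does not say whether the target is enforced per matrix, per row, or globally, so the numbers are illustrative only.

```python
# Values copied from the model card.
d_model = 4096
target_l0_fraction = 0.002

# Hypothetical example: one square d_model x d_model weight matrix.
total_entries = d_model * d_model
nonzero_budget = round(total_entries * target_l0_fraction)

# Average number of nonzero weights per row under the same assumption.
nonzero_per_row = d_model * target_l0_fraction

print(total_entries)    # 16777216
print(nonzero_budget)   # 33554
```

Under this assumption each row keeps roughly eight nonzero weights on average.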
Training
- Dataset: data/simplestories-tokenized
- Tokenizer: SimpleStories/SimpleStories-1.25M
- Total Tokens: 2,000,000,000
Training Run
Usage
import json

import torch
from huggingface_hub import hf_hub_download

repo_id = "jacobcd52/ss_bridges_d4096_f0.002"

# Download the sparse model weights, the bridges, and the config
sparse_model_path = hf_hub_download(repo_id=repo_id, filename="sparse_model.bin")
bridges_path = hf_hub_download(repo_id=repo_id, filename="bridges.bin")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")

# Load the config and the state dicts on CPU. Building the models themselves
# requires the SparseGPT and BridgeSet classes from this repo.
with open(config_path) as f:
    config = json.load(f)
sparse_state_dict = torch.load(sparse_model_path, map_location="cpu")
bridges_state_dict = torch.load(bridges_path, map_location="cpu")
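
After loading, one way to sanity-check the weight sparsity is to measure the fraction of nonzero entries in each tensor and compare it against the target L0 fraction. The helper below is a minimal sketch: `nonzero_fractions` is a hypothetical name, and the toy dictionary with its parameter names stands in for `sparse_state_dict`. It also assumes pruned weights are stored as exact zeros, which may not hold if the checkpoint stores masks separately.

```python
import torch


def nonzero_fractions(state_dict):
    """Return {name: fraction of nonzero entries} for each floating-point tensor."""
    fractions = {}
    for name, tensor in state_dict.items():
        # Skip non-tensor entries, integer buffers, and empty tensors.
        if not torch.is_tensor(tensor) or not tensor.is_floating_point():
            continue
        if tensor.numel() == 0:
            continue
        fractions[name] = torch.count_nonzero(tensor).item() / tensor.numel()
    return fractions


# Toy stand-in for a loaded state dict (parameter names are hypothetical).
toy_state_dict = {
    "dense.weight": torch.ones(4, 4),
    "sparse.weight": torch.tensor([[0.0, 1.5, 0.0, 0.0],
                                   [0.0, 0.0, 0.0, -2.0]]),
}

fractions = nonzero_fractions(toy_state_dict)
for name, frac in fractions.items():
    print(f"{name}: {frac:.3f}")
```

Calling `nonzero_fractions(sparse_state_dict)` on the real checkpoint would give one value per parameter; embeddings, biases, or normalization parameters may be dense even when the main weight matrices are sparse.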