This repository contains the following SAEs:
- qwen2_0.5b_sae_post_residual_layer_12
- qwen2_0.5b_sae_post_residual_layer_16

Load these SAEs using SAELens as below:
```python
from sae_lens import SAE

sae, cfg_dict, sparsity = SAE.from_pretrained("NoamDiamant52/qwen2_0.5b_sae", "<sae_id>")
```
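After loading, the SAE can be run directly on activations from the base model. The snippet below is a minimal sketch rather than part of the upstream documentation: it assumes TransformerLens can load `Qwen/Qwen2.5-0.5B`, that the model and SAE sit on the same device (CPU here, the default for `SAE.from_pretrained`), and that `sae` is the object returned above.

```python
# Minimal usage sketch (illustrative; assumes TransformerLens support for Qwen2.5-0.5B
# and that the SAE loaded above lives on the CPU, its default device).
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-0.5B", device="cpu")

with torch.no_grad():
    # Cache activations for a short prompt and pick out the SAE's training hook point.
    _, cache = model.run_with_cache("Once upon a time, there was a little girl.")
    acts = cache[sae.cfg.hook_name]

    feature_acts = sae.encode(acts)   # sparse feature activations
    recon = sae.decode(feature_acts)  # reconstructed activations

print("fraction of active features:", (feature_acts > 0).float().mean().item())
```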
These SAEs were trained with the following configuration:
```python
import torch
from sae_lens import LanguageModelSAERunnerConfig, SAETrainingRunner

device = "cuda" if torch.cuda.is_available() else "cpu"

for name in ["blocks.0.hook_resid_post", "blocks.4.hook_resid_post", "blocks.8.hook_resid_post", "blocks.12.hook_resid_post", "blocks.16.hook_resid_post"]:
    total_training_steps = 30_000  # more steps would probably help
    batch_size = 4096
    total_training_tokens = total_training_steps * batch_size
    num = int(name.split(".")[1])  # layer index parsed from the hook name

    lr_warm_up_steps = 0
    lr_decay_steps = total_training_steps // 5  # 20% of training
    l1_warm_up_steps = total_training_steps // 20  # 5% of training

    cfg = LanguageModelSAERunnerConfig(
        # Data Generating Function (Model + Training Distribution)
        model_name="Qwen/Qwen2.5-0.5B",  # our model (more options here: https://neelnanda-io.github.io/TransformerLens/generated/model_properties_table.html)
        hook_name=name,  # a valid hook point (see https://neelnanda-io.github.io/TransformerLens/generated/demos/Main_Demo.html#Hook-Points)
        hook_layer=num,  # the layer index of the hook point
        d_in=896,  # the width of the residual stream in Qwen2.5-0.5B
        dataset_path="NoamDiamant52/TinyStories_tokenized",  # a tokenized TinyStories dataset on Hugging Face
        is_dataset_tokenized=True,
        streaming=True,  # stream the dataset rather than pre-downloading it
        # SAE Parameters
        mse_loss_normalization=None,  # we won't normalize the MSE loss
        expansion_factor=16,  # SAE width = expansion_factor * d_in; larger gives better stats but slower training
        b_dec_init_method="zeros",  # the geometric median could also be used to initialize the decoder bias
        apply_b_dec_to_input=False,  # we won't subtract the decoder bias from the input
        normalize_sae_decoder=False,
        scale_sparsity_penalty_by_decoder_norm=True,
        decoder_heuristic_init=True,
        init_encoder_as_decoder_transpose=True,
        normalize_activations="expected_average_only_in",
        # Training Parameters
        lr=5e-5,  # lower is generally better; this is set fairly high to keep training fast
        adam_beta1=0.9,  # Adam betas (defaults)
        adam_beta2=0.999,
        lr_scheduler_name="constant",  # constant learning rate with warmup; other schedules may work better
        lr_warm_up_steps=lr_warm_up_steps,  # helps avoid too many dead features initially
        lr_decay_steps=lr_decay_steps,  # helps avoid overfitting
        l1_coefficient=5,  # controls how sparse the feature activations are
        l1_warm_up_steps=l1_warm_up_steps,  # helps avoid too many dead features initially
        lp_norm=1.0,  # use the L1 penalty (not an Lp norm with p < 1)
        train_batch_size_tokens=batch_size,
        context_size=512,  # length of the prompts fed to the model; longer is better but slower
        # Activation Store Parameters
        n_batches_in_buffer=64,  # controls how many activations we store / shuffle
        training_tokens=total_training_tokens,  # ~123 million tokens (30,000 steps * 4096 tokens per step)
        store_batch_size_prompts=16,
        # Resampling protocol
        use_ghost_grads=False,  # we don't use ghost grads anymore
        feature_sampling_window=1000,  # controls how often feature sparsity stats are reported
        dead_feature_window=1000,  # would affect resampling or ghost grads if we were using them
        dead_feature_threshold=1e-4,  # would affect resampling or ghost grads if we were using them
        # WANDB
        log_to_wandb=True,  # always use wandb unless you are just testing code
        wandb_project="sae_lens_tutorial",
        wandb_log_frequency=30,
        eval_every_n_wandb_logs=20,
        # Misc
        device=device,
        seed=42,
        n_checkpoints=0,
        checkpoint_path="checkpoints",
        dtype="float32",
    )

    sparse_autoencoder = SAETrainingRunner(cfg).run()
    sparse_autoencoder.save_model(f"post_residual_layer_{num}")
```
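As a rough post-training sanity check, reconstruction quality on a held-out prompt can be measured. The snippet below is a hypothetical addition, not part of the original training script: it assumes a `HookedTransformer` for `Qwen/Qwen2.5-0.5B` is available as `model` on the same device as the SAE, and that the object returned by `SAETrainingRunner.run()` exposes the same `encode`/`decode` interface as `SAE`.

```python
# Hypothetical sanity check (not from the original script): reuses `name` and
# `sparse_autoencoder` from the training loop and a HookedTransformer `model`
# loaded as in the earlier snippet.
import torch

with torch.no_grad():
    _, cache = model.run_with_cache("Once upon a time, there was a little girl.")
    acts = cache[name]  # activations at the trained hook point
    recon = sparse_autoencoder.decode(sparse_autoencoder.encode(acts))

    mse = (recon - acts).pow(2).mean()
    fvu = mse / acts.var()  # fraction of variance unexplained (rough estimate)
    print(f"{name}: MSE={mse.item():.4f}  FVU={fvu.item():.4f}")
```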