SAELens

Variation of average L0 across layers

#3
by sjgerstner - opened

For each layer, I looked at the transcoder with lowest average L0 (these are also the canonical ones shown on neuronpedia). I noticed this L0 value varies widely across layers, with a sharp drop after layer 10. The sequence looks like this:
76, 65, 49, 54, 88, 87, 95, 70, 52, 72, 88, 5, 6, 8, 8, 8, 10, 12, 13, 12, 11, 13, 15, 25, 37, 41.
Did you train the transcoders on different layers with a different accuracy-sparsity tradeoff?
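For anyone wanting to reproduce this kind of measurement: the average L0 here is just the mean number of nonzero transcoder latents per token. A minimal sketch with a made-up activation matrix (the array and its values are purely illustrative, not from Gemma Scope):

```python
import numpy as np

# Hypothetical latent activations for one layer: shape (n_tokens, n_features).
# In practice these would come from running the transcoder over a dataset.
acts = np.array([
    [0.0, 1.2, 0.0, 0.3],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 2.1, 0.7],
])

# Average L0 = mean count of active (nonzero) latents per token.
avg_l0 = (acts != 0).sum(axis=1).mean()
print(avg_l0)  # (2 + 1 + 3 nonzeros... here: 2, 1, 2) -> 1.666...
```

Repeating this per layer, over the same token set, gives a sequence comparable to the one quoted above.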

Hey @sjgerstner , thanks for the careful analysis.

That pattern is real. We did not intentionally train different layers with fundamentally different accuracy-sparsity tradeoffs, in the sense of targeting specific L0 values per layer. The same objective, JumpReLU mechanism, and Pareto-based model-selection procedure were used throughout. However, as documented in the Gemma Scope technical report, the sparsity sweep ranges were adjusted for certain mid-to-late layers to ensure a meaningful reconstruction-sparsity frontier.

Because the released transcoders are selected from each layer's Pareto frontier, the lowest-L0 canonical model per layer can land at very different points along that curve. Earlier layers tend to require higher effective dimensionality, so even strong sparsity penalties yield relatively large L0 values. In deeper layers, activation statistics differ, and the sweep (including the adjusted ranges where applicable) can produce much sparser solutions without large reconstruction loss.
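To make the selection idea concrete, here is a toy sketch of picking the lowest-L0 model from a Pareto frontier of sweep results. The (L0, loss) pairs and the helper names are made up for illustration; this is not the actual Gemma Scope selection code.

```python
# Hypothetical sweep results for one layer: (average_L0, reconstruction_loss).
sweep = [(76.0, 0.10), (54.0, 0.13), (60.0, 0.14), (8.0, 0.45)]

def pareto_frontier(points):
    """Keep points that no other point beats on both L0 and loss."""
    return [
        p for p in points
        if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)
    ]

frontier = pareto_frontier(sweep)  # (60.0, 0.14) is dominated and dropped

# The "canonical" pick in the sense discussed above: lowest L0 on the frontier.
canonical = min(frontier, key=lambda p: p[0])
print(canonical)  # -> (8.0, 0.45)
```

Because the frontier's shape differs per layer, the lowest-L0 pick can sit at very different L0 values even under an identical selection rule.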

So the sharp drop after layer ten reflects a combination of layer-dependent representation geometry and the documented sweep-range adjustments, not a deliberate architectural change in the training objective or an explicit per-layer sparsity target.

Thank you!

Hi @srikanta-221 , and thank you for your answer!
Unfortunately I still can't find the place in the tech report where those sweep range adjustments are mentioned. Could you please point me to the exact section?
By the way, your link leads to the Dunefsky et al. transcoder paper, not to the Gemma Scope report, which is here 🙃
Thank you!

Google org

Hi @sjgerstner ,
Sincere apologies for this; I just referenced the link given in the model card.

In the technical report you mentioned, which is the right one, please refer to Section 4.2, "Impact of sequence position". It clearly states: "The SAEs are trained with different sparsity settings after layers 12 and 20 of Gemma 2 2B and 9B respectively." The answer I gave is an interpretation of all of this.

Thank you, and sorry if this caused any confusion.
