head_dim in config

#8 by stevenshinechen - opened

The head_dim in the config is 256; how was this calculated?

I thought head_dim was supposed to be hidden_size / num_attention_heads?
With hidden_size = 640 and num_attention_heads = 4, that would give head_dim = 640 / 4 = 160.

From the Gemma docs: https://developers.googleblog.com/en/gemma-explained-overview-gemma-model-family-architectures/
They also use this formula:
Head size (2B: 256, 7B: 256)

It refers to the dimensionality of each attention head within the multi-head attention mechanism. It is calculated by dividing the embedding dimension by the number of heads. For example, if the embedding dimension is 2048 and there are 8 heads, then each head would have a size of 256.

I'm running into a dimensionality mismatch between 160 and 256 in the KV cache.
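For a quick side-by-side check of the two numbers (the model id below is an assumption, chosen because google/gemma-3-270m matches hidden_size = 640 and num_attention_heads = 4; transformers must be installed):

```python
from transformers import AutoConfig

# Assumed model id; substitute whichever checkpoint you are actually loading.
cfg = AutoConfig.from_pretrained("google/gemma-3-270m")

derived = cfg.hidden_size // cfg.num_attention_heads
print(derived, cfg.head_dim)  # 160 vs 256 -> the mismatch described above
```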

Google org

Hi,

The problem is a discrepancy between the model's configuration and the values your code derives. The model's config.json file dictates the true dimensions. Your calculation of 160 assumes head_dim = hidden_size / num_attention_heads, but that formula does not hold here: head_dim is an explicit parameter in Gemma's configuration (256) and is not derived from a simple division in some of its smaller variants.
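As a sketch (again assuming google/gemma-3-270m and the standard transformers module layout), you can confirm that the attention projections are sized by head_dim rather than by hidden_size / num_attention_heads:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "google/gemma-3-270m"  # assumption; use the checkpoint you load
cfg = AutoConfig.from_pretrained(model_id)
print(cfg.hidden_size, cfg.num_attention_heads, cfg.head_dim)  # 640 4 256

model = AutoModelForCausalLM.from_pretrained(model_id)
attn = model.model.layers[0].self_attn
# q_proj maps hidden_size -> num_attention_heads * head_dim (4 * 256 = 1024),
# so head_dim is independent of hidden_size // num_attention_heads.
print(attn.q_proj.in_features, attn.q_proj.out_features)
```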

To fix your issue, make sure your code loads the model's configuration and uses the official head_dim of 256 when creating or accessing the KV cache. Use the values exactly as they are defined in the model's config.json; deriving head_dim from hidden_size / num_attention_heads is what produces these mismatches.
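If you are allocating the cache by hand, a minimal sketch of sizing it from the config (the model id, batch size, and sequence length below are placeholders) looks like this:

```python
import torch
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("google/gemma-3-270m")  # assumed model id

batch, max_seq = 1, 128
kv_heads = getattr(cfg, "num_key_value_heads", cfg.num_attention_heads)

# One (key, value) tensor pair per layer; the last dimension is cfg.head_dim
# (256 here), not hidden_size // num_attention_heads (160).
k_cache = [torch.zeros(batch, kv_heads, max_seq, cfg.head_dim)
           for _ in range(cfg.num_hidden_layers)]
v_cache = [torch.zeros(batch, kv_heads, max_seq, cfg.head_dim)
           for _ in range(cfg.num_hidden_layers)]
print(k_cache[0].shape)  # (batch, kv_heads, 128, 256)
```

If you rely on transformers' built-in cache classes (e.g. DynamicCache) instead of a hand-rolled cache, the shapes are taken from the model's config for you.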

Thanks.
