head_dim in config

#8 by stevenshinechen - opened

The head_dim in the config is 256; how was this calculated?

I thought head_dim was supposed to be hidden_size / num_attention_heads?
With hidden_size = 640 and num_attention_heads = 4, that would give head_dim = 640 / 4 = 160.

From the Gemma docs: https://developers.googleblog.com/en/gemma-explained-overview-gemma-model-family-architectures/
They also use this formula:
Head size (2B: 256, 7B: 256)

It refers to the dimensionality of each attention head within the multi-head attention mechanism. It is calculated by dividing the embedding dimension by the number of heads. For example, if the embedding dimension is 2048 and there are 8 heads, then each head would have a size of 256.

I'm running into a dimensionality mismatch between 160 and 256 in the KV cache.
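For a quick side-by-side check of the two numbers (the model id below is an assumption, chosen because google/gemma-3-270m matches hidden_size = 640 and num_attention_heads = 4; transformers must be installed):

```python
from transformers import AutoConfig

# Assumed model id; substitute whichever checkpoint you are actually loading.
cfg = AutoConfig.from_pretrained("google/gemma-3-270m")

derived = cfg.hidden_size // cfg.num_attention_heads
print(derived, cfg.head_dim)  # 160 vs 256 -> the mismatch described above
```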

Google org

Hi,

The problem is a discrepancy between the model's configuration and the values your code derives. The model's config.json file dictates the true dimensions. Your calculation of 160 assumes head_dim = hidden_size / num_attention_heads, but that formula does not hold here: head_dim is an explicit parameter in Gemma's configuration (256) and is not derived from a simple division in some of its smaller variants.
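As a sketch (again assuming google/gemma-3-270m and the standard transformers module layout), you can confirm that the attention projections are sized by head_dim rather than by hidden_size / num_attention_heads:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "google/gemma-3-270m"  # assumption; use the checkpoint you load
cfg = AutoConfig.from_pretrained(model_id)
print(cfg.hidden_size, cfg.num_attention_heads, cfg.head_dim)  # 640 4 256

model = AutoModelForCausalLM.from_pretrained(model_id)
attn = model.model.layers[0].self_attn
# q_proj maps hidden_size -> num_attention_heads * head_dim (4 * 256 = 1024),
# so head_dim is independent of hidden_size // num_attention_heads.
print(attn.q_proj.in_features, attn.q_proj.out_features)
```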

To fix your issue, make sure your code loads the model's configuration and uses the official head_dim of 256 when creating or accessing the KV cache. Use the values exactly as they are defined in the model's config.json; deriving head_dim from hidden_size / num_attention_heads is what produces these mismatches.
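If you are allocating the cache by hand, a minimal sketch of sizing it from the config (the model id, batch size, and sequence length below are placeholders) looks like this:

```python
import torch
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("google/gemma-3-270m")  # assumed model id

batch, max_seq = 1, 128
kv_heads = getattr(cfg, "num_key_value_heads", cfg.num_attention_heads)

# One (key, value) tensor pair per layer; the last dimension is cfg.head_dim
# (256 here), not hidden_size // num_attention_heads (160).
k_cache = [torch.zeros(batch, kv_heads, max_seq, cfg.head_dim)
           for _ in range(cfg.num_hidden_layers)]
v_cache = [torch.zeros(batch, kv_heads, max_seq, cfg.head_dim)
           for _ in range(cfg.num_hidden_layers)]
print(k_cache[0].shape)  # (batch, kv_heads, 128, 256)
```

If you rely on transformers' built-in cache classes (e.g. DynamicCache) instead of a hand-rolled cache, the shapes are taken from the model's config for you.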

Thanks.
