Corrupted weights?
I have been writing an inference engine for CoDA in Swift/MLX, but it only generates gibberish. I then checked the weights, e.g.:
Layer 23 kNorm loaded: shape=[128], std=1.2598647
RAW kNorm stats: mean=2.0254, min=-0.0121, max=9.0000
I then did the same check via Google Colab and PyTorch, e.g.:
--------------------
Layer: model.layers.23.self_attn.q_proj.weight
Stats: Mean=-0.0001, Std=0.0584, Min=-0.4121, Max=0.4141
--------------------
Layer: model.layers.23.self_attn.k_proj.weight
Stats: Mean=-0.0000, Std=0.0542, Min=-0.3867, Max=0.4062
--------------------
Layer: model.layers.23.self_attn.v_proj.weight
Stats: Mean=0.0001, Std=0.0614, Min=-0.3945, Max=0.3496
--------------------
Layer: model.layers.23.self_attn.o_proj.weight
Stats: Mean=0.0000, Std=0.0566, Min=-0.4785, Max=0.4375
--------------------
Layer: model.layers.23.self_attn.q_norm.weight
Stats: Mean=1.4233, Std=0.5039, Min=-0.0302, Max=2.6094
--------------------
Layer: model.layers.23.self_attn.k_norm.weight
Stats: Mean=2.0254, Std=1.2648, Min=-0.0121, Max=9.0000
--------------------
Layer: model.layers.23.mlp.gate_proj.weight
Stats: Mean=-0.0002, Std=0.0608, Min=-1.3203, Max=0.8750
--------------------
Layer: model.layers.23.mlp.up_proj.weight
Stats: Mean=0.0000, Std=0.0683, Min=-0.7930, Max=0.7422
--------------------
Layer: model.layers.23.mlp.down_proj.weight
Stats: Mean=-0.0000, Std=0.0622, Min=-1.0391, Max=1.1094
--------------------
Layer: model.layers.23.input_layernorm.weight
Stats: Mean=10.5016, Std=5.4567, Min=0.0001, Max=74.5000
--------------------
Layer: model.layers.23.post_attention_layernorm.weight
Stats: Mean=2.0072, Std=0.3130, Min=-0.0005, Max=5.1875
--------------------
I'll gladly provide all the values if needed.
But the question is: Are the weights corrupted?
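For reference, the per-layer numbers above came from a small loop over the model's parameters. A minimal sketch of that kind of dump (the helper name is mine, not from the model repo):

```python
import torch

def tensor_stats(t: torch.Tensor) -> str:
    """Format mean/std/min/max of a weight tensor, computed in float32."""
    t = t.detach().float()
    return (f"Mean={t.mean().item():.4f}, Std={t.std().item():.4f}, "
            f"Min={t.min().item():.4f}, Max={t.max().item():.4f}")

# Usage against the loaded model (parameter names as printed above):
# for name, p in model.named_parameters():
#     if "layers.23" in name:
#         print(f"Layer: {name}\nStats: {tensor_stats(p)}\n" + "-" * 20)
```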
Hi Muzel, thanks for reaching out. Could you share the environment versions you are working with, especially the transformers version?
- transformers: 4.57.1
- torch: 2.8.0+cu126
- Python 3.12
Could you try an older version, say 4.47.1?
With 4.47.1:
Layer: model.layers.23.self_attn.q_proj.weight
Stats: Mean=-0.0001, Std=0.0584, Min=-0.4121, Max=0.4141
--------------------
Layer: model.layers.23.self_attn.k_proj.weight
Stats: Mean=-0.0000, Std=0.0542, Min=-0.3867, Max=0.4062
--------------------
Layer: model.layers.23.self_attn.v_proj.weight
Stats: Mean=0.0001, Std=0.0614, Min=-0.3945, Max=0.3496
--------------------
Layer: model.layers.23.self_attn.o_proj.weight
Stats: Mean=0.0000, Std=0.0566, Min=-0.4785, Max=0.4375
--------------------
Layer: model.layers.23.self_attn.q_norm.weight
Stats: Mean=1.4233, Std=0.5039, Min=-0.0302, Max=2.6094
--------------------
Layer: model.layers.23.self_attn.k_norm.weight
Stats: Mean=2.0254, Std=1.2648, Min=-0.0121, Max=9.0000
--------------------
Layer: model.layers.23.mlp.gate_proj.weight
Stats: Mean=-0.0002, Std=0.0608, Min=-1.3203, Max=0.8750
--------------------
Layer: model.layers.23.mlp.up_proj.weight
Stats: Mean=0.0000, Std=0.0683, Min=-0.7930, Max=0.7422
--------------------
Layer: model.layers.23.mlp.down_proj.weight
Stats: Mean=-0.0000, Std=0.0622, Min=-1.0391, Max=1.1094
--------------------
Layer: model.layers.23.input_layernorm.weight
Stats: Mean=10.5016, Std=5.4567, Min=0.0001, Max=74.5000
--------------------
Layer: model.layers.23.post_attention_layernorm.weight
Stats: Mean=2.0072, Std=0.3130, Min=-0.0005, Max=5.1875
Do you also see similar behavior? We did our post-training and evaluation under transformers 4.47.1 and bfloat16 precision:
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "Salesforce/CoDA-v0-Instruct"
model = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
Sorry, I don't understand what you mean by 'did you experience similar behavior'. I haven't tried running inference with transformers, as I'm using MLX.
I reran the weights logging with exactly your configuration:
Layer: model.layers.23.self_attn.q_proj.weight
Stats: Mean=-0.0001, Std=0.0583, Min=-0.4121, Max=0.4141
--------------------
Layer: model.layers.23.self_attn.k_proj.weight
Stats: Mean=-0.0000, Std=0.0542, Min=-0.3867, Max=0.4062
--------------------
Layer: model.layers.23.self_attn.v_proj.weight
Stats: Mean=0.0001, Std=0.0615, Min=-0.3945, Max=0.3496
--------------------
Layer: model.layers.23.self_attn.o_proj.weight
Stats: Mean=0.0000, Std=0.0566, Min=-0.4785, Max=0.4375
--------------------
Layer: model.layers.23.self_attn.q_norm.weight
Stats: Mean=1.4219, Std=0.5039, Min=-0.0302, Max=2.6094
--------------------
Layer: model.layers.23.self_attn.k_norm.weight
Stats: Mean=2.0312, Std=1.2656, Min=-0.0121, Max=9.0000
--------------------
Layer: model.layers.23.mlp.gate_proj.weight
Stats: Mean=-0.0002, Std=0.0608, Min=-1.3203, Max=0.8750
--------------------
Layer: model.layers.23.mlp.up_proj.weight
Stats: Mean=0.0000, Std=0.0684, Min=-0.7930, Max=0.7422
--------------------
Layer: model.layers.23.mlp.down_proj.weight
Stats: Mean=-0.0000, Std=0.0623, Min=-1.0391, Max=1.1094
--------------------
Layer: model.layers.23.input_layernorm.weight
Stats: Mean=10.5000, Std=5.4688, Min=0.0001, Max=74.5000
--------------------
Layer: model.layers.23.post_attention_layernorm.weight
Stats: Mean=2.0000, Std=0.3125, Min=-0.0005, Max=5.1875
Hi Muzel, sorry for the unclear context. Could you try to replicate the undesired behavior (gibberish output / suspicious weights) by loading the model with transformers 4.47.1 and running inference? I am not an expert in MLX, so I am not sure what happens in your environment.
No, I can't; I don't have the capacity to rewrite the whole framework just to test that. But thanks for helping anyway!