Finetune: size mismatch error

#1
by 3lumos - opened

When I fine-tune this model for my system following the official MACE fine-tuning tutorial, I run into an error where the checkpoint parameter shapes do not match the current model, and I am unsure how to resolve it. I have tried adding the hyperparameters `--num_channels=512` and `--hidden_irreps=128x0e` to the config, but the same error persists.
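For concreteness, a fine-tuning invocation along these lines looks roughly like the sketch below. This is not the exact command from the post: only `--num_channels` and `--hidden_irreps` come from the description above, and the file names and remaining values are placeholders.

```bash
# Minimal sketch of a MACE fine-tuning run; file names and most
# values are placeholders, not taken from the original post.
mace_run_train \
    --name "finetune_test" \
    --foundation_model "medium" \
    --train_file "train.xyz" \
    --valid_fraction 0.05 \
    --num_channels 512 \
    --hidden_irreps "128x0e" \
    --max_num_epochs 50 \
    --device cuda
```

Note that `--num_channels 512` and `--hidden_irreps 128x0e` specify conflicting widths (512 vs. 128 channels), and when a foundation checkpoint is loaded the rebuilt model must match the checkpoint shapes exactly, which is consistent with the 128-vs-512 mismatches in the traceback below.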


```
    raise RuntimeError(

RuntimeError: Error(s) in loading state_dict for ScaleShiftMACE:
        size mismatch for interactions.0.linear_up.weight: copying a param with shape torch.Size([65536]) from checkpoint, the shape in current model is torch.Size([262144]).
        size mismatch for interactions.0.linear_up.output_mask: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for interactions.0.conv_tp.output_mask: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([8192]).
        size mismatch for interactions.0.conv_tp_weights.net.9.weight: copying a param with shape torch.Size([512, 64]) from checkpoint, the shape in current model is torch.Size([2048, 64]).
        size mismatch for interactions.0.conv_tp_weights.net.9.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for interactions.0.linear_res.weight: copying a param with shape torch.Size([262144]) from checkpoint, the shape in current model is torch.Size([1048576]).
        size mismatch for interactions.0.linear_1.weight: copying a param with shape torch.Size([458752]) from checkpoint, the shape in current model is torch.Size([1835008]).
```
MACE foundation models org

Dear @3lumos, can you try updating to the latest main branch of mace (by cloning the repo) and using the following config for training:

"foundation_model": "mh-1",
"foundation_head": "omat_pbe"
