Error with transformers 4.51.3: RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
Using pixtral-12b with transformers 4.51.3 gives the following error: `RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same`
Downgrading to transformers 4.48.3 fixes it, but I'd like to be able to use the latest transformers version.
@david-crynge
Can you share your inference code? It seems you need to cast the inputs to the correct dtype with `inputs = inputs.to(torch.float16)` before calling generate.
I am facing the same issue when using QLoRA with 4-bit NF4 quantization, float16 compute and quant-storage dtypes, and double quantization enabled. The weights are stored as half tensors, as they should be, but the input stays float32 despite `compute_dtype=float16`. Any comment on how to fix this @RaushanTurganbay?
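For clarity, the quantization setup described above corresponds to a `BitsAndBytesConfig` like the following (a config sketch, not the poster's actual code); with `compute_dtype=float16`, float32 processor outputs trigger the same mismatch, so the same explicit cast of the inputs applies:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization, fp16 compute and quant storage, double quant enabled
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_storage=torch.float16,
    bnb_4bit_use_double_quant=True,
)
# Quantized layers compute in float16, so float32 inputs raise the
# Input/weight dtype mismatch; cast them first:
#   inputs = inputs.to(torch.float16)
```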