Update model files by removing redundant n-gram embedding weight duplication

#3
LongCat org
No description provided.

So can we know what the model that was uploaded earlier (200GB+) is?

I'd like to know as well.
I was surprised yesterday, when the 8bit quant that mlx-community posted was only 73 GB in size.

LongCat0830 changed pull request title from update to Update model files by removing redundant n-gram embedding weight duplication
LongCat org

So can we know what the model that was uploaded earlier (200GB+) is?

The previous version contained a duplicate copy of the n-gram embedding weights, which did not affect runtime performance or model correctness but increased the storage size.
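A cleanup like this can be sketched as follows: compare the candidate tensors in the checkpoint's state dict and drop a key whose weights are byte-identical to the canonical one. This is a minimal illustration, not the actual script used for this PR; the key names and the helper are hypothetical, and NumPy arrays stand in for the real tensors.

```python
import numpy as np

def drop_duplicate_weight(state_dict, canonical_key, duplicate_key):
    """Delete duplicate_key from state_dict if its tensor equals canonical_key's.

    Returns True if a duplicate was removed. Keys here are hypothetical
    examples, not the real LongCat checkpoint key names.
    """
    if (
        duplicate_key in state_dict
        and canonical_key in state_dict
        and np.array_equal(state_dict[canonical_key], state_dict[duplicate_key])
    ):
        del state_dict[duplicate_key]
        return True
    return False

# Toy checkpoint: the same n-gram embedding stored under two keys.
emb = np.arange(12, dtype=np.float32).reshape(4, 3)
ckpt = {
    "model.ngram_embed.weight": emb,
    "model.ngram_embed_copy.weight": emb.copy(),  # redundant duplicate
}
removed = drop_duplicate_weight(
    ckpt, "model.ngram_embed.weight", "model.ngram_embed_copy.weight"
)
print(removed, sorted(ckpt))
```

Because the duplicate is bit-identical, removing it shrinks the serialized checkpoint without changing any computation that reads the canonical key.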

