Any plans to release 120B and 20-30B level models?
It would be great if LongCat could have 120B and 20-30B level models, like gpt-oss-120b or GLM-4.5-Air, and/or gpt-oss-20b, Seed-36B, Qwen3-30B.
Yeah... like a 100B with 3.3-5.5B active and a 32B with 1-2B active. That would be great.
Also, any dense versions please?
Why dense?
I'm all for Mixture of Experts at around 30B total with roughly 3-6B active parameters. That seems like a good trade-off between speed and performance, and it's especially useful for bringing better-quality models to regular PC users who can't afford more powerful hardware. Recent releases at this size suggest it still has a lot of untapped potential.
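Rough numbers to illustrate the trade-off (the sizes and bit widths here are illustrative, not tied to any particular release):

```python
# Back-of-the-envelope sizing for a ~30B-total / ~3-6B-active MoE.
# Memory is driven by TOTAL parameters; per-token compute by ACTIVE ones.

def weight_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for params_b billion parameters."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

total_b, active_b = 30.0, 3.5          # illustrative sizes
for bits in (16, 8, 4):                # bf16, int8, ~4-bit quant
    print(f"{bits:>2}-bit weights: ~{weight_gib(total_b, bits):.1f} GiB")

# Per-token matmul FLOPs scale with the active parameters (~2 * N_active),
# so a 30B-A3.5B decodes roughly like a 3.5B dense model once the weights fit.
print(f"per-token matmul FLOPs ~= {2 * active_b * 1e9:.2e}")
```

At 4-bit that's about 14 GiB of weights, which is why this size range keeps coming up for ordinary desktops.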
Ohh yeah, I got it, thanks.
TBH I'd rather have them try something more experimental, like a bitnet (or at least partially bitnet) model for cheaper deployment, or an 'alternate' attention scheme for longer context.
We have tons of mid-sized options now. I suppose a mid-sized MLA model with a long context would be interesting though, as everything else seems to use regular GQA.
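For context, the core of a BitNet-style layer is just ternary weight quantization trained through a straight-through estimator. A minimal sketch of the idea (not any particular model's implementation, and it leaves out the activation quantization and norm details a real BitNet layer uses):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Module):
    """Linear layer whose weights are fake-quantized to {-1, 0, +1} * scale."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Full-precision "shadow" weights that the optimizer actually updates.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)              # absmean scaling
        w_q = torch.round(w / scale).clamp(-1, 1) * scale   # ternary values
        # Straight-through estimator: forward sees w_q, backward sees identity.
        return F.linear(x, w + (w_q - w).detach())

layer = TernaryLinear(512, 512)
out = layer(torch.randn(2, 512))
out.sum().backward()   # gradients still reach the fp32 shadow weights
```

The payoff is that the deployed weights are ~1.58 bits each, so memory and bandwidth drop a lot, but it only really works if the model is trained (or at least heavily finetuned) that way from the start.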
Yeah, I agree. Let's hope something comes out with a good trim, like the 2-bit from Apple's foundation models or the native FP4 in OpenAI's gpt-oss.
We already got a native 2-bit QAT 300B from Baidu! Yeah, more native QAT would be awesome.
No one's converted the 300B weights out of PaddlePaddle though, as far as I know.
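In principle the conversion itself is mostly loading the Paddle checkpoint, remapping keys, and re-saving as PyTorch tensors; a rough sketch assuming a toy single-file checkpoint (the file name and the blanket transpose rule are made up for illustration, and the sharding plus the 2-bit packing are the real work):

```python
import numpy as np
import paddle   # pip install paddlepaddle
import torch

# Hypothetical single-file checkpoint; the real 300B release is sharded.
paddle_state = paddle.load("model_state.pdparams")

torch_state = {}
for name, value in paddle_state.items():
    array = value.numpy() if hasattr(value, "numpy") else np.asarray(value)
    # Paddle's nn.Linear stores weights as [in, out]; PyTorch expects [out, in].
    # A real converter needs a proper per-key map instead of this blanket rule.
    if name.endswith(".weight") and array.ndim == 2 and "embedding" not in name:
        array = np.ascontiguousarray(array.T)
    torch_state[name] = torch.from_numpy(array)

torch.save(torch_state, "pytorch_model.bin")
```

The hard part isn't this loop, it's knowing the key layout and undoing (or re-implementing) the custom 2-bit packing.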
To be honest, no one has a clue about this either. Maybe the Chinese-speaking folks have their own llama.cpp alternatives and have been running them on their CPUs? Who knows?
There is, but AFAIK it doesn't support hybrid inference. And the 2-bit QAT includes their own dequantization scheme meant for GPU-only serving.
I echo this sentiment though! LongCat devs: if you are reading this, we would love QAT PyTorch weights, even if it's off-the-shelf 4-bit QAT. That would be a great way to draw a lot of attention to a large MoE release, and it's cheap to do if you're already finetuning the weights. There are QAT code examples in TorchTune and Axolotl.
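For anyone unfamiliar with what that involves: QAT just means fake-quantizing the weights during finetuning so the model adapts to the rounding, then exporting real low-bit weights afterwards. A minimal sketch of the idea in plain PyTorch (this is not the TorchTune or Axolotl recipe, and real setups usually quantize per-group rather than per-channel):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FakeQuant4bitLinear(nn.Module):
    """Wraps a Linear so its weights see 4-bit rounding during training.

    Forward uses symmetric per-output-channel int4 fake quantization; the
    straight-through estimator lets gradients update the fp weights as usual.
    After training, the rounded weights can be packed and exported.
    """

    def __init__(self, linear: nn.Linear, bits: int = 4):
        super().__init__()
        self.linear = linear
        self.qmax = 2 ** (bits - 1) - 1      # 7 for int4, range [-7, 7]

    def fake_quant(self, w: torch.Tensor) -> torch.Tensor:
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / self.qmax
        w_q = torch.round(w / scale).clamp(-self.qmax, self.qmax) * scale
        return w + (w_q - w).detach()        # straight-through estimator

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.fake_quant(self.linear.weight), self.linear.bias)

# Usage: swap the wrapper in before finetuning, train normally, then round
# and pack the weights with the same scales when exporting.
layer = FakeQuant4bitLinear(nn.Linear(4096, 4096, bias=False))
loss = layer(torch.randn(1, 4096)).pow(2).mean()
loss.backward()                              # grads flow to the fp32 weights
```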
It's weird that such a large model is called "Flash"... wonder what the "Pro" would be.
Makes sense to me; the active parameter count is pretty modest.
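Rough arithmetic, assuming the reported figures (roughly 560B total, about 27B activated per token on average) are right:

```python
# Decode-time matmul FLOPs scale with the ACTIVE parameters (~2 * N per token),
# not the 560B footprint, so each token is cheaper than a mid-sized dense model.
total_params, active_params = 560e9, 27e9
dense_ref = 70e9                      # a familiar dense size for comparison

ratio = (2 * active_params) / (2 * dense_ref)
print(f"per-token compute ~ {ratio:.2f}x a 70B dense model")   # ~0.39x
```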