Any plans to release 120B and 20-30B level models?
It would be great if LongCat could have 120B and 20-30B level models, like gpt-oss-120b or GLM-4.5-Air, and/or gpt-oss-20b, Seed-36B, Qwen3-30B.
Yeah... like a 100B with 3.3-5.5B active and a 32B with 1-2B active. That would be great.
Also, any dense versions please?
Why dense?
I'm all for Mixture of Experts at around 30B total with roughly 3-6B active parameters. That seems like a good trade-off between speed and performance, and it's especially useful for bringing better-quality models to regular PC users who can't afford more powerful hardware. Recent releases at this size suggest it still has a lot of untapped potential.
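Rough numbers to illustrate the trade-off (the sizes and bit widths here are illustrative, not tied to any particular release):

```python
# Back-of-the-envelope sizing for a ~30B-total / ~3-6B-active MoE.
# Memory is driven by TOTAL parameters; per-token compute by ACTIVE ones.

def weight_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for params_b billion parameters."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

total_b, active_b = 30.0, 3.5          # illustrative sizes
for bits in (16, 8, 4):                # bf16, int8, ~4-bit quant
    print(f"{bits:>2}-bit weights: ~{weight_gib(total_b, bits):.1f} GiB")

# Per-token matmul FLOPs scale with the active parameters (~2 * N_active),
# so a 30B-A3.5B decodes roughly like a 3.5B dense model once the weights fit.
print(f"per-token matmul FLOPs ~= {2 * active_b * 1e9:.2e}")
```

At 4-bit that's about 14 GiB of weights, which is why this size range keeps coming up for ordinary desktops.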
Ohh yeah, I got it, thanks.
TBH I'd rather have them try something more experimental, like a bitnet (or at least partially bitnet) model for cheaper deployment, or an 'alternate' attention scheme for longer context.
We have tons of mid-sized options now. I suppose a mid-sized MLA model with a long context would be interesting though, as everything else seems to use regular GQA.
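For context, the core of a BitNet-style layer is just ternary weight quantization trained through a straight-through estimator. A minimal sketch of the idea (not any particular model's implementation, and it leaves out the activation quantization and norm details a real BitNet layer uses):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Module):
    """Linear layer whose weights are fake-quantized to {-1, 0, +1} * scale."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Full-precision "shadow" weights that the optimizer actually updates.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)              # absmean scaling
        w_q = torch.round(w / scale).clamp(-1, 1) * scale   # ternary values
        # Straight-through estimator: forward sees w_q, backward sees identity.
        return F.linear(x, w + (w_q - w).detach())

layer = TernaryLinear(512, 512)
out = layer(torch.randn(2, 512))
out.sum().backward()   # gradients still reach the fp32 shadow weights
```

The payoff is that the deployed weights are ~1.58 bits each, so memory and bandwidth drop a lot, but it only really works if the model is trained (or at least heavily finetuned) that way from the start.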
Yeah, I agree. Let's hope something comes out with a good trim, like the 2-bit from Apple's foundation models or the native FP4 in OpenAI's gpt-oss.
We already got a native 2-bit QAT 300B from Baidu! Yeah, more native QAT would be awesome.
No one's converted the 300B weights out of PaddlePaddle though, as far as I know.
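In principle the conversion itself is mostly loading the Paddle checkpoint, remapping keys, and re-saving as PyTorch tensors; a rough sketch assuming a toy single-file checkpoint (the file name and the blanket transpose rule are made up for illustration, and the sharding plus the 2-bit packing are the real work):

```python
import numpy as np
import paddle   # pip install paddlepaddle
import torch

# Hypothetical single-file checkpoint; the real 300B release is sharded.
paddle_state = paddle.load("model_state.pdparams")

torch_state = {}
for name, value in paddle_state.items():
    array = value.numpy() if hasattr(value, "numpy") else np.asarray(value)
    # Paddle's nn.Linear stores weights as [in, out]; PyTorch expects [out, in].
    # A real converter needs a proper per-key map instead of this blanket rule.
    if name.endswith(".weight") and array.ndim == 2 and "embedding" not in name:
        array = np.ascontiguousarray(array.T)
    torch_state[name] = torch.from_numpy(array)

torch.save(torch_state, "pytorch_model.bin")
```

The hard part isn't this loop, it's knowing the key layout and undoing (or re-implementing) the custom 2-bit packing.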
To be honest, no one has a clue about this either. Maybe the Chinese-speaking folks have their own llama.cpp alternatives and have been running them on their CPUs? Who knows?
There is, but AFAIK it doesn't support hybrid inference. And the 2-bit QAT includes their own dequantization scheme meant for GPU-only serving.
I echo this sentiment though! LongCat devs: if you are reading this, we would love QAT PyTorch weights, even if it's off-the-shelf 4-bit QAT. That would be a great way to draw a lot of attention to a large MoE release, and it's cheap to do if you're already finetuning the weights. There are QAT code examples in TorchTune and Axolotl.
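For anyone unfamiliar with what that involves: QAT just means fake-quantizing the weights during finetuning so the model adapts to the rounding, then exporting real low-bit weights afterwards. A minimal sketch of the idea in plain PyTorch (this is not the TorchTune or Axolotl recipe, and real setups usually quantize per-group rather than per-channel):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FakeQuant4bitLinear(nn.Module):
    """Wraps a Linear so its weights see 4-bit rounding during training.

    Forward uses symmetric per-output-channel int4 fake quantization; the
    straight-through estimator lets gradients update the fp weights as usual.
    After training, the rounded weights can be packed and exported.
    """

    def __init__(self, linear: nn.Linear, bits: int = 4):
        super().__init__()
        self.linear = linear
        self.qmax = 2 ** (bits - 1) - 1      # 7 for int4, range [-7, 7]

    def fake_quant(self, w: torch.Tensor) -> torch.Tensor:
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / self.qmax
        w_q = torch.round(w / scale).clamp(-self.qmax, self.qmax) * scale
        return w + (w_q - w).detach()        # straight-through estimator

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.fake_quant(self.linear.weight), self.linear.bias)

# Usage: swap the wrapper in before finetuning, train normally, then round
# and pack the weights with the same scales when exporting.
layer = FakeQuant4bitLinear(nn.Linear(4096, 4096, bias=False))
loss = layer(torch.randn(1, 4096)).pow(2).mean()
loss.backward()                              # grads flow to the fp32 weights
```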
It's weird that such a large model is called "Flash"... wonder what the "Pro" would be.
Makes sense to me; the active parameter count is pretty modest.
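Rough arithmetic, assuming the reported figures (roughly 560B total, about 27B activated per token on average) are right:

```python
# Decode-time matmul FLOPs scale with the ACTIVE parameters (~2 * N per token),
# not the 560B footprint, so each token is cheaper than a mid-sized dense model.
total_params, active_params = 560e9, 27e9
dense_ref = 70e9                      # a familiar dense size for comparison

ratio = (2 * active_params) / (2 * dense_ref)
print(f"per-token compute ~ {ratio:.2f}x a 70B dense model")   # ~0.39x
```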