Ling-mini-base-2.0 (Glm4Moe Format)

This is a converted version of inclusionAI/Ling-mini-base-2.0, from the custom BailingMoeV2 format to the native HuggingFace Glm4Moe format.

Model Details

  • Original Model: inclusionAI/Ling-mini-base-2.0
  • Architecture: Glm4MoeForCausalLM (native HuggingFace)
  • Parameters: 16.5B total, 1.4B activated per token
  • Experts: 256 routed + 1 shared, top-8 selection
  • Context Length: 32K
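
The MoE layout above can also be read straight from the converted config. The snippet below is a minimal sketch; the field names (n_routed_experts, num_experts_per_tok, ...) are assumed from the Glm4Moe config class and may differ between transformers versions.

from transformers import AutoConfig

# Load the converted model's config; no trust_remote_code is needed
config = AutoConfig.from_pretrained("larryvrh/Ling-mini-base-2.0-glmfy")

# Field names assumed from the Glm4Moe config class
print(config.model_type)               # expected: "glm4_moe"
print(config.n_routed_experts)         # routed experts per MoE layer
print(config.num_experts_per_tok)      # top-k routing
print(config.max_position_embeddings)  # context length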

Why This Conversion?

The original Ling model requires trust_remote_code=True and a custom BailingMoeV2 implementation. This conversion enables:

  • Native HuggingFace support without trust_remote_code
  • Better compatibility with inference frameworks such as vLLM and SGLang (see the loading sketch after this list)
  • Public Megatron-LM support
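
As an example of framework compatibility, the converted checkpoint can be loaded through vLLM's offline API without any remote code. This is a minimal sketch, assuming a vLLM build whose model registry already includes the Glm4Moe architecture:

from vllm import LLM, SamplingParams

# Load the converted checkpoint; no trust_remote_code is required
llm = LLM(model="larryvrh/Ling-mini-base-2.0-glmfy", dtype="bfloat16")

# Greedy completion for a short prompt
params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["Hello,"], params)
print(outputs[0].outputs[0].text)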

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "larryvrh/Ling-mini-base-2.0-glmfy",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
model.eval()

# The converted repo reuses the original Ling tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "larryvrh/Ling-mini-base-2.0-glmfy",
)

inputs = tokenizer("Hello,", return_tensors="pt")
# Drop token_type_ids from the tokenizer output; the Glm4Moe model does not use them
inputs = {k: v.to("cuda") for k, v in inputs.items() if k != 'token_type_ids'}

output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output[0]))
# Hello, I'm trying to create a simple web application using Flask...

Conversion Accuracy

FP32 Verification (Lossless)

Prompt | Max Diff | Mean Diff | Top-5 Match
Hello  | 1.34e-05 | 1.69e-06  | True
1+1=   | 2.10e-05 | 1.51e-06  | True
今天   | 6.68e-06 | 1.01e-06  | True

An FP32 max difference of ~1e-5 is within normal floating-point error, so the conversion is effectively lossless.
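
The numbers above come from comparing next-token logits of the original and converted models on the same prompts. The snippet below is a minimal sketch of that kind of check, not the exact script behind the table; the helper name is hypothetical.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def last_logits(repo, prompt, trust_remote_code=False):
    # Hypothetical helper: FP32 next-token logits for a single prompt
    tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=trust_remote_code)
    model = AutoModelForCausalLM.from_pretrained(
        repo, torch_dtype=torch.float32, trust_remote_code=trust_remote_code
    )
    inputs = tok(prompt, return_tensors="pt")
    inputs.pop("token_type_ids", None)  # drop token_type_ids if the tokenizer emits them
    with torch.no_grad():
        return model(**inputs).logits[0, -1]

a = last_logits("inclusionAI/Ling-mini-base-2.0", "Hello", trust_remote_code=True)
b = last_logits("larryvrh/Ling-mini-base-2.0-glmfy", "Hello")

diff = (a - b).abs()
print("max diff:", diff.max().item(), "mean diff:", diff.mean().item())
print("top-5 match:", torch.equal(a.topk(5).indices, b.topk(5).indices))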

BF16 Inference

Prompt                    | Cosine Sim | Decode Match
Hello, my name is         | 0.9995     | True
The capital of France is  | 0.9980     | False*
def fibonacci(n):         | 1.0003     | False*
1 + 1 =                   | 1.0017     | True
今天天气                  | 1.0010     | True

*BF16 rounding accumulates across autoregressive decoding steps, so greedy continuations can diverge after a few tokens. This is expected behavior, not a conversion error.
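
The Decode Match column compares greedy continuations from both models loaded in bfloat16. The snippet below is a minimal sketch of such a check, not the exact evaluation script; the helper name is hypothetical and the prompt is taken from the table above.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def greedy_continuation(repo, prompt, trust_remote_code=False):
    # Hypothetical helper: greedy-decode 32 new tokens in bfloat16 on GPU
    tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=trust_remote_code)
    model = AutoModelForCausalLM.from_pretrained(
        repo, torch_dtype=torch.bfloat16, device_map="cuda",
        trust_remote_code=trust_remote_code,
    ).eval()
    inputs = {k: v.to("cuda") for k, v in tok(prompt, return_tensors="pt").items()
              if k != "token_type_ids"}
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    return tok.decode(out[0], skip_special_tokens=True)

ref = greedy_continuation("inclusionAI/Ling-mini-base-2.0",
                          "The capital of France is", trust_remote_code=True)
new = greedy_continuation("larryvrh/Ling-mini-base-2.0-glmfy",
                          "The capital of France is")
print("decode match:", ref == new)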

Weight Mapping

BailingMoeV2               | Glm4Moe
model.word_embeddings      | model.embed_tokens
attention.query_key_value  | self_attn.q_proj + k_proj + v_proj
attention.dense            | self_attn.o_proj
attention.query_layernorm  | self_attn.q_norm
attention.key_layernorm    | self_attn.k_norm
mlp.gate.expert_bias       | mlp.gate.e_score_correction_bias
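
The only structural change here is splitting the fused attention.query_key_value tensor into separate q/k/v projections. Below is a minimal sketch of that split, assuming the fused weight is laid out as [Q; K; V] along dim 0; the sizes are illustrative, not read from the real checkpoint.

import torch

# Illustrative sizes only; the real values come from the model config
hidden_size = 2048
num_q_heads, num_kv_heads, head_dim = 16, 4, 128
q_size, kv_size = num_q_heads * head_dim, num_kv_heads * head_dim

# Fused [Q; K; V] projection weight (layout assumed for illustration)
fused = torch.randn(q_size + 2 * kv_size, hidden_size)

# Split along dim 0 into the separate Glm4Moe projections
q_proj, k_proj, v_proj = torch.split(fused, [q_size, kv_size, kv_size], dim=0)
print(q_proj.shape, k_proj.shape, v_proj.shape)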

License

Same as the original model. See inclusionAI/Ling-mini-base-2.0 for details.
