# Ling-mini-base-2.0 (Glm4Moe Format)
This is a converted version of inclusionAI/Ling-mini-base-2.0 from BailingMoeV2 format to native HuggingFace Glm4Moe format.
## Model Details
- Original Model: inclusionAI/Ling-mini-base-2.0
- Architecture: Glm4MoeForCausalLM (native HuggingFace)
- Parameters: 16.5B total, 1.4B activated per token
- Experts: 256 routed + 1 shared, top-8 selection
- Context Length: 32K
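The figures above can be sanity-checked against the shipped config. Below is a minimal sketch; the exact Glm4Moe config field names are assumptions on my part, hence the defensive `getattr` calls.

```python
# Minimal sketch: inspect the converted config (field names are assumptions, hence getattr).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("larryvrh/Ling-mini-base-2.0-glmfy")
print(cfg.model_type)                                 # expected: a Glm4Moe model type
print(getattr(cfg, "n_routed_experts", None))         # expected: 256 routed experts
print(getattr(cfg, "num_experts_per_tok", None))      # expected: 8 (top-8 routing)
print(getattr(cfg, "max_position_embeddings", None))  # expected: 32768 (32K context)
```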
## Why This Conversion?
The original Ling model requires `trust_remote_code=True` and a custom BailingMoeV2 implementation. This conversion enables:
- Native HuggingFace support without `trust_remote_code`
- Better compatibility with inference frameworks such as vLLM and SGLang (see the sketch after this list)
- Public Megatron-LM support
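As an illustration of the inference-framework point, here is a minimal, untested sketch of offline generation with vLLM. It assumes the installed vLLM build recognizes the Glm4Moe architecture; the prompt and sampling settings are placeholders.

```python
# Minimal sketch: offline generation with vLLM (assumes Glm4Moe support in your vLLM build).
from vllm import LLM, SamplingParams

llm = LLM(model="larryvrh/Ling-mini-base-2.0-glmfy", dtype="bfloat16")
params = SamplingParams(temperature=0.0, max_tokens=32)  # greedy, 32 new tokens

outputs = llm.generate(["Hello,"], params)
print(outputs[0].outputs[0].text)
```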
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "larryvrh/Ling-mini-base-2.0-glmfy",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
model.eval()

# The converted repo ships the original Ling tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "larryvrh/Ling-mini-base-2.0-glmfy",
)

inputs = tokenizer("Hello,", return_tensors="pt")
# Drop token_type_ids (not accepted by this model) and move tensors to the GPU
inputs = {k: v.to("cuda") for k, v in inputs.items() if k != "token_type_ids"}

output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output[0]))
# Hello, I'm trying to create a simple web application using Flask...
```
## Conversion Accuracy
### FP32 Verification (Lossless)
| Prompt | Max Diff | Mean Diff | Top-5 Match |
|---|---|---|---|
| Hello | 1.34e-05 | 1.69e-06 | True |
| 1+1= | 2.10e-05 | 1.51e-06 | True |
| 今天 | 6.68e-06 | 1.01e-06 | True |
An FP32 max difference of ~1e-5 is within normal floating-point error; the conversion is lossless.
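A comparison along these lines could be reproduced roughly as sketched below. This is an illustration only, not the script used for the table: the original model needs `trust_remote_code=True`, holding both checkpoints in FP32 requires substantial memory, and the tokenizer is loaded from the converted repo on the assumption that it matches the original.

```python
# Hypothetical sketch: compare FP32 next-token logits of the original and converted models.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

orig = AutoModelForCausalLM.from_pretrained(
    "inclusionAI/Ling-mini-base-2.0", torch_dtype=torch.float32, trust_remote_code=True
)
conv = AutoModelForCausalLM.from_pretrained(
    "larryvrh/Ling-mini-base-2.0-glmfy", torch_dtype=torch.float32
)
tok = AutoTokenizer.from_pretrained("larryvrh/Ling-mini-base-2.0-glmfy")

for prompt in ["Hello", "1+1=", "今天"]:
    ids = tok(prompt, return_tensors="pt")
    ids.pop("token_type_ids", None)
    with torch.no_grad():
        a = orig(**ids).logits[0, -1]  # next-token logits, original model
        b = conv(**ids).logits[0, -1]  # next-token logits, converted model
    diff = (a - b).abs()
    top5_match = torch.equal(a.topk(5).indices, b.topk(5).indices)
    print(prompt, diff.max().item(), diff.mean().item(), top5_match)
```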
### BF16 Inference
| Prompt | Cosine Sim | Decode Match |
|---|---|---|
| Hello, my name is | 0.9995 | True |
| The capital of France is | 0.9980 | False* |
| def fibonacci(n): | 1.0003 | False* |
| 1 + 1 = | 1.0017 | True |
| 今天天气 | 1.0010 | True |
\*BF16 precision causes small numerical differences to accumulate during autoregressive decoding, which can flip individual token choices. This is expected behavior, not a conversion error.
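The exact metric behind the table is not specified here; the sketch below shows one common way to compare the models (cosine similarity of next-token logits plus a greedy-decode match), which may differ from what produced the numbers above. It assumes `orig` and `conv` are the original and converted models loaded in BF16 and `tok` is the shared tokenizer.

```python
# Hypothetical sketch: logit cosine similarity and greedy-decode match for one prompt.
import torch
import torch.nn.functional as F

def compare(orig, conv, tok, prompt, max_new_tokens=32):
    ids = tok(prompt, return_tensors="pt")
    ids.pop("token_type_ids", None)
    with torch.no_grad():
        a = orig(**ids).logits[0, -1].float()
        b = conv(**ids).logits[0, -1].float()
        cos = F.cosine_similarity(a, b, dim=0).item()
        # Greedy decodes match only if the small per-step differences never flip the argmax.
        dec_a = orig.generate(**ids, max_new_tokens=max_new_tokens, do_sample=False)
        dec_b = conv.generate(**ids, max_new_tokens=max_new_tokens, do_sample=False)
    return cos, torch.equal(dec_a, dec_b)
```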
## Weight Mapping
| BailingMoeV2 | Glm4Moe |
|---|---|
| model.word_embeddings | model.embed_tokens |
| attention.query_key_value | self_attn.q_proj + k_proj + v_proj |
| attention.dense | self_attn.o_proj |
| attention.query_layernorm | self_attn.q_norm |
| attention.key_layernorm | self_attn.k_norm |
| mlp.gate.expert_bias | mlp.gate.e_score_correction_bias |
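A renaming pass over the checkpoint following this mapping might look roughly like the sketch below. It is not the script actually used for the conversion: the fused `query_key_value` split sizes and the per-layer key patterns are assumptions for illustration.

```python
# Hypothetical sketch of the key renaming; split sizes for query_key_value are assumptions.
import torch

def convert_state_dict(bailing_sd, q_size, kv_size):
    glm_sd = {}
    for name, tensor in bailing_sd.items():
        if name.endswith("attention.query_key_value.weight"):
            # Split the fused QKV projection into separate q/k/v projections.
            prefix = name.rsplit("attention.query_key_value.weight", 1)[0]
            q, k, v = torch.split(tensor, [q_size, kv_size, kv_size], dim=0)
            glm_sd[prefix + "self_attn.q_proj.weight"] = q
            glm_sd[prefix + "self_attn.k_proj.weight"] = k
            glm_sd[prefix + "self_attn.v_proj.weight"] = v
            continue
        new_name = (
            name.replace("model.word_embeddings", "model.embed_tokens")
                .replace("attention.dense", "self_attn.o_proj")
                .replace("attention.query_layernorm", "self_attn.q_norm")
                .replace("attention.key_layernorm", "self_attn.k_norm")
                .replace("mlp.gate.expert_bias", "mlp.gate.e_score_correction_bias")
        )
        glm_sd[new_name] = tensor
    return glm_sd
```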
## License
Same as original model. See inclusionAI/Ling-mini-base-2.0 for details.