Ling-mini-base-2.0 (Glm4Moe Format)

This is a converted version of inclusionAI/Ling-mini-base-2.0, from the custom BailingMoeV2 format to the native HuggingFace Glm4Moe format.

Model Details

  • Original Model: inclusionAI/Ling-mini-base-2.0
  • Architecture: Glm4MoeForCausalLM (native HuggingFace)
  • Parameters: 16.5B total, 1.4B activated per token
  • Experts: 256 routed + 1 shared, top-8 selection
  • Context Length: 32K
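
The MoE layout above can also be read straight from the converted config. The snippet below is a minimal sketch; the field names (n_routed_experts, num_experts_per_tok, ...) are assumed from the Glm4Moe config class and may differ between transformers versions.

from transformers import AutoConfig

# Load the converted model's config; no trust_remote_code is needed
config = AutoConfig.from_pretrained("larryvrh/Ling-mini-base-2.0-glmfy")

# Field names assumed from the Glm4Moe config class
print(config.model_type)               # expected: "glm4_moe"
print(config.n_routed_experts)         # routed experts per MoE layer
print(config.num_experts_per_tok)      # top-k routing
print(config.max_position_embeddings)  # context length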

Why This Conversion?

The original Ling model requires trust_remote_code=True and a custom BailingMoeV2 implementation. This conversion enables:

  • Native HuggingFace support without trust_remote_code
  • Better compatibility with inference frameworks such as vLLM and SGLang (see the loading sketch after this list)
  • Public Megatron-LM support
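
As an example of framework compatibility, the converted checkpoint can be loaded through vLLM's offline API without any remote code. This is a minimal sketch, assuming a vLLM build whose model registry already includes the Glm4Moe architecture:

from vllm import LLM, SamplingParams

# Load the converted checkpoint; no trust_remote_code is required
llm = LLM(model="larryvrh/Ling-mini-base-2.0-glmfy", dtype="bfloat16")

# Greedy completion for a short prompt
params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["Hello,"], params)
print(outputs[0].outputs[0].text)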

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "larryvrh/Ling-mini-base-2.0-glmfy",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
model.eval()

# The converted repo reuses the original Ling tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "larryvrh/Ling-mini-base-2.0-glmfy",
)

inputs = tokenizer("Hello,", return_tensors="pt")
# Drop token_type_ids from the tokenizer output; the Glm4Moe model does not use them
inputs = {k: v.to("cuda") for k, v in inputs.items() if k != 'token_type_ids'}

output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output[0]))
# Hello, I'm trying to create a simple web application using Flask...

Conversion Accuracy

FP32 Verification (Lossless)

Prompt | Max Diff | Mean Diff | Top-5 Match
Hello  | 1.34e-05 | 1.69e-06  | True
1+1=   | 2.10e-05 | 1.51e-06  | True
今天   | 6.68e-06 | 1.01e-06  | True

An FP32 max difference of ~1e-5 is within normal floating-point error, so the conversion is effectively lossless.
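
The numbers above come from comparing next-token logits of the original and converted models on the same prompts. The snippet below is a minimal sketch of that kind of check, not the exact script behind the table; the helper name is hypothetical.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def last_logits(repo, prompt, trust_remote_code=False):
    # Hypothetical helper: FP32 next-token logits for a single prompt
    tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=trust_remote_code)
    model = AutoModelForCausalLM.from_pretrained(
        repo, torch_dtype=torch.float32, trust_remote_code=trust_remote_code
    )
    inputs = tok(prompt, return_tensors="pt")
    inputs.pop("token_type_ids", None)  # drop token_type_ids if the tokenizer emits them
    with torch.no_grad():
        return model(**inputs).logits[0, -1]

a = last_logits("inclusionAI/Ling-mini-base-2.0", "Hello", trust_remote_code=True)
b = last_logits("larryvrh/Ling-mini-base-2.0-glmfy", "Hello")

diff = (a - b).abs()
print("max diff:", diff.max().item(), "mean diff:", diff.mean().item())
print("top-5 match:", torch.equal(a.topk(5).indices, b.topk(5).indices))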

BF16 Inference

Prompt                    | Cosine Sim | Decode Match
Hello, my name is         | 0.9995     | True
The capital of France is  | 0.9980     | False*
def fibonacci(n):         | 1.0003     | False*
1 + 1 =                   | 1.0017     | True
今天天气                  | 1.0010     | True

*BF16 rounding accumulates across autoregressive decoding steps, so greedy continuations can diverge after a few tokens. This is expected behavior, not a conversion error.
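
The Decode Match column compares greedy continuations from both models loaded in bfloat16. The snippet below is a minimal sketch of such a check, not the exact evaluation script; the helper name is hypothetical and the prompt is taken from the table above.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def greedy_continuation(repo, prompt, trust_remote_code=False):
    # Hypothetical helper: greedy-decode 32 new tokens in bfloat16 on GPU
    tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=trust_remote_code)
    model = AutoModelForCausalLM.from_pretrained(
        repo, torch_dtype=torch.bfloat16, device_map="cuda",
        trust_remote_code=trust_remote_code,
    ).eval()
    inputs = {k: v.to("cuda") for k, v in tok(prompt, return_tensors="pt").items()
              if k != "token_type_ids"}
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    return tok.decode(out[0], skip_special_tokens=True)

ref = greedy_continuation("inclusionAI/Ling-mini-base-2.0",
                          "The capital of France is", trust_remote_code=True)
new = greedy_continuation("larryvrh/Ling-mini-base-2.0-glmfy",
                          "The capital of France is")
print("decode match:", ref == new)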

Weight Mapping

BailingMoeV2               | Glm4Moe
model.word_embeddings      | model.embed_tokens
attention.query_key_value  | self_attn.q_proj + k_proj + v_proj
attention.dense            | self_attn.o_proj
attention.query_layernorm  | self_attn.q_norm
attention.key_layernorm    | self_attn.k_norm
mlp.gate.expert_bias       | mlp.gate.e_score_correction_bias
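
The only structural change here is splitting the fused attention.query_key_value tensor into separate q/k/v projections. Below is a minimal sketch of that split, assuming the fused weight is laid out as [Q; K; V] along dim 0; the sizes are illustrative, not read from the real checkpoint.

import torch

# Illustrative sizes only; the real values come from the model config
hidden_size = 2048
num_q_heads, num_kv_heads, head_dim = 16, 4, 128
q_size, kv_size = num_q_heads * head_dim, num_kv_heads * head_dim

# Fused [Q; K; V] projection weight (layout assumed for illustration)
fused = torch.randn(q_size + 2 * kv_size, hidden_size)

# Split along dim 0 into the separate Glm4Moe projections
q_proj, k_proj, v_proj = torch.split(fused, [q_size, kv_size, kv_size], dim=0)
print(q_proj.shape, k_proj.shape, v_proj.shape)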

License

Same as the original model. See inclusionAI/Ling-mini-base-2.0 for details.
