What are the differences between base models and non-base models?

#2
by makisekurisu-jp - opened

I couldn’t find any differences between the two in the model card.

"The intended use cases are fine-tuning, in-context learning experiments, and other research or development purposes, not direct interaction. However, the control tokens, e.g., <|im_start|> and <|im_end|> were trained to allow efficient LoRA-style PEFT with the official chat template, mitigating the need to finetune embeddings, a significant optimization given Qwen3.5's larger vocabulary."

The regular model is the end-user release, trained to behave as an assistant, while the base model is meant as a starting point for fine-tuning or continued pretraining. Keep in mind, though, that Qwen3 and Qwen3.5 base models aren't raw pretraining snapshots: they were mid-trained on assistant-aligned data and have a strong assistant bias, which makes them a poor fit for anything other than instruction-following use cases.
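To make the control-token point from the model card concrete, here is a rough sketch of what ChatML-style formatted training text looks like. It is a simplified illustration, not the official Qwen chat template: the `render_chatml` helper and the example messages are made up for this post, and the real template may add further details. In practice you would normally let the tokenizer's own chat template do the formatting.

```python
# Simplified ChatML-style rendering. This is an illustrative sketch only,
# NOT the official Qwen chat template (which may include extra details).
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts as ChatML-style text.

    `render_chatml` is a hypothetical helper written for this example.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in the two control tokens the model card mentions.
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


example = [
    {"role": "user", "content": "What is a base model?"},
    {"role": "assistant", "content": "A pretrained checkpoint meant for further training."},
]
print(render_chatml(example))
```

Since the base model has already seen `<|im_start|>` and `<|im_end|>` in training, per the model card, a LoRA adapter on text formatted this way should not also need trainable embeddings for those tokens.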
