What are the differences between base models and non-base models?

by makisekurisu-jp - opened 4 days ago

Discussion

makisekurisu-jp

4 days ago

•

edited 4 days ago

I couldn’t find any differences between the two in the model card.

johnhampel

3 days ago

"The intended use cases are fine-tuning, in-context learning experiments, and other research or development purposes, not direct interaction. However, the control tokens, e.g., <|im_start|> and <|im_end|> were trained to allow efficient LoRA-style PEFT with the official chat template, mitigating the need to finetune embeddings, a significant optimization given Qwen3.5's larger vocabulary."
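To make the control-token point concrete, here is a minimal sketch of how a ChatML-style prompt is laid out with `<|im_start|>` and `<|im_end|>`. It assumes the common layout in which each turn is wrapped as `<|im_start|>role`, a newline, the content, then `<|im_end|>`. The helper name `render_chatml` is made up for illustration, and the official chat template shipped with the tokenizer may add details (default system prompt, reasoning tags) that this sketch leaves out.

```python
# Hypothetical helper: a simplified approximation of a ChatML-style template,
# not the official Qwen chat template.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages, add_generation_prompt=True):
    # Each turn becomes: <|im_start|>{role}\n{content}<|im_end|>\n
    parts = []
    for message in messages:
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    # Optionally open an assistant turn so the model continues from there.
    if add_generation_prompt:
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a base model?"},
]
print(render_chatml(messages))
```

Because the base model has already seen these tokens during training, a LoRA run that formats its data this way should not need to train the embedding rows for them, which is the optimization the model card describes.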

FriskyFennec

about 9 hours ago

The regular model is the end-user version, trained to behave as an assistant, while the base model is meant as the starting point for finetuning or continued pretraining. Keep in mind, however, that the Qwen3 and Qwen3.5 base models aren't raw pretraining snapshots: they were mid-trained on assistant-aligned data and have a strong assistant bias, making them unsuitable for anything but instruction-following use cases.
