Qwen3-8B-KernelBook-SFT Megatron Checkpoint

Description

This repository contains a Megatron-LM distributed checkpoint of Qwen3-8B after Supervised Fine-Tuning (SFT) on the KernelBook dataset, saved at training iteration 566.

Checkpoint Format

This is a raw Megatron-LM checkpoint, NOT a Hugging Face Transformers model. It contains:

  • *.distcp files: Distributed checkpoint shards (8 ranks × 2 model parallel = 16 files)
  • common.pt: Common parameters shared across all ranks
  • metadata.json: Checkpoint metadata
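The layout above can be sanity-checked with a short Python sketch. The shard file names used in the demo below are placeholders (actual `.distcp` names depend on the Megatron / `torch.distributed.checkpoint` version), and `inspect_checkpoint` is a helper we define here, not a Megatron API:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def inspect_checkpoint(ckpt_dir: Path) -> dict:
    """Summarize a Megatron-LM distributed checkpoint directory."""
    shards = sorted(ckpt_dir.glob("*.distcp"))
    meta_path = ckpt_dir / "metadata.json"
    metadata = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    return {
        "num_shards": len(shards),
        "has_common": (ckpt_dir / "common.pt").exists(),
        "metadata": metadata,
    }

# Demo on a mocked directory mirroring the layout described above
# (8 data-parallel ranks x 2 model-parallel = 16 shard files).
with TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "iter_0000566"
    ckpt.mkdir()
    for i in range(16):  # placeholder shard names
        (ckpt / f"__{i}_0.distcp").touch()
    (ckpt / "common.pt").touch()
    (ckpt / "metadata.json").write_text(json.dumps({"iteration": 566}))

    summary = inspect_checkpoint(ckpt)
    print(summary["num_shards"], summary["has_common"], summary["metadata"]["iteration"])
```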

Usage

Loading in Megatron-LM

```python
# Load this checkpoint in your Megatron-LM training script
checkpoint_path = "path/to/iter_0000566"

# Use Megatron's checkpoint-loading utilities (illustrative call; in
# Megatron-LM the load path is normally supplied via --load and read
# from the parsed args rather than passed directly)
load_checkpoint(model, optimizer, lr_scheduler, checkpoint_path)
```

Continuing Training (e.g., for RL)

```shell
# Example command to continue training with Megatron-LM
python train.py \
    --load-checkpoint-dir /path/to/iter_0000566 \
    --save-checkpoint-dir /path/to/new_checkpoints \
    # ... other training arguments
```

Download from Hugging Face Hub

```shell
# Clone the entire checkpoint repository
git clone https://huggingface.co/JinnP/Qwen3-8B-KernelBook-SFT-Megatron
```

Or use huggingface_hub:

```python
from huggingface_hub import snapshot_download

checkpoint_path = snapshot_download(
    repo_id="JinnP/Qwen3-8B-KernelBook-SFT-Megatron",
    repo_type="model",
)
```

Training Details

  • Base Model: Qwen3-8B
  • Training Method: Supervised Fine-Tuning (SFT)
  • Dataset: KernelBook
  • Iteration: 566
  • Framework: Megatron-LM
  • Parallelism: 8 data parallel ranks × 2 model parallel

Important Notes

โš ๏ธ This is NOT a Hugging Face Transformers model. You cannot load it directly with AutoModel.from_pretrained().

To use with Hugging Face Transformers, you would need to:

  1. Convert the checkpoint using Megatron's conversion scripts
  2. Or load it in Megatron-LM and export to HF format
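Because the two formats are easy to confuse, here is a minimal sketch that guesses a directory's format from its marker files. It assumes a converted HF Transformers model directory carries a config.json, while this checkpoint carries common.pt plus .distcp shards (as listed above); guess_format is our own helper, not part of either library:

```python
from pathlib import Path
from tempfile import TemporaryDirectory

def guess_format(path: Path) -> str:
    """Best-effort guess: HF Transformers model vs. Megatron dist checkpoint."""
    if (path / "config.json").exists():
        return "huggingface"  # layout AutoModel.from_pretrained() expects
    if (path / "common.pt").exists() and any(path.glob("*.distcp")):
        return "megatron-dist"  # needs conversion before HF tools can load it
    return "unknown"

# Demo on mocked directories (placeholder file names).
with TemporaryDirectory() as tmp:
    mega = Path(tmp) / "iter_0000566"
    mega.mkdir()
    (mega / "common.pt").touch()
    (mega / "__0_0.distcp").touch()

    hf = Path(tmp) / "converted"
    hf.mkdir()
    (hf / "config.json").write_text("{}")

    results = (guess_format(mega), guess_format(hf))
    print(results)  # ('megatron-dist', 'huggingface')
```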

Next Steps

This checkpoint is ready for:

  • Reinforcement Learning (RL) training
  • Further fine-tuning
  • Evaluation in Megatron-LM framework

License

Apache 2.0
