Qwen3-8B-Herald-SFT

A QLoRA fine-tune of Qwen/Qwen3-8B for natural-language-to-Lean-4 autoformalization (NL→FL), trained on the FrenzyMath/Herald_statements dataset as a supervised cold-start before GRPO reinforcement learning.

Given a natural-language mathematical statement, the model produces a structured Lean 4 theorem stub (with a `sorry` placeholder) in a `#### {"header": ..., "formal_statement": ...}` JSON format that is directly consumable by a downstream GRPO trainer with a Lean 4 compiler reward.

Model Details

Model Description

  • Developed by: xiaolesu / Osmosis AI
  • Model type: Causal LM (decoder-only transformer), LoRA adapter
  • Finetuned from: Qwen/Qwen3-8B
  • Language(s): English (natural language input) → Lean 4 (formal output)
  • License: Apache 2.0
  • Training dataset: FrenzyMath/Herald_statements (a 50k-example shuffled subset of the full 580k statements)

Uses

Direct Use

This model is designed as a Lean 4 autoformalization engine. Given a natural-language math problem, it translates the problem into a formal Lean 4 theorem statement using Mathlib, with `sorry` as the proof placeholder. It does not write proofs; it generates only the theorem signature.

Example use case: automatically generating a corpus of Lean 4 theorem stubs from informal mathematical textbooks, competition problems, or research papers.

Downstream Use

This model is intended as an SFT cold-start checkpoint for a GRPO reinforcement-learning phase, in which the Lean 4 compiler provides a binary reward signal (the statement type-checks or it does not). The structured `####` JSON output format is specifically designed for easy parsing in a GRPO reward function.
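As a minimal sketch of that parsing step (the function name `parse_completion` and the zero-reward-on-malformed-output convention are illustrative assumptions, not part of this model's release), a reward function might recover the statement like this before handing it to the Lean 4 compiler:

```python
import json


def parse_completion(completion: str):
    """Extract the JSON payload from a '#### {...}' model completion.

    Returns the parsed dict, or None if the output is malformed --
    a natural zero-reward case for a GRPO reward function.
    """
    marker = "####"
    idx = completion.find(marker)
    if idx == -1:
        return None
    payload = completion[idx + len(marker):].strip()
    try:
        obj = json.loads(payload)
    except json.JSONDecodeError:
        return None
    # Require both expected keys so downstream code can rely on them.
    if not isinstance(obj, dict) or not {"header", "formal_statement"} <= obj.keys():
        return None
    return obj


demo = '#### {"header": "import Mathlib", "formal_statement": "theorem t : 1 = 1 := by sorry"}'
print(parse_completion(demo)["formal_statement"])  # -> theorem t : 1 = 1 := by sorry
```

Returning `None` (rather than raising) keeps the reward function total: any completion that fails to parse can simply score zero without interrupting the RL loop.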

Out-of-Scope Use

  • Writing complete Lean 4 proofs (the model only generates stubs ending in := by sorry)
  • Informal mathematical reasoning or problem solving in natural language
  • Lean 3 or other proof assistants (Coq, Isabelle, etc.)
  • General-purpose coding tasks

How to Get Started with the Model

```python
import json

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "Qwen/Qwen3-8B"
ADAPTER = "xiaolesu/Qwen3-8B-Herald-SFT-ckpt2000"

SYSTEM_PROMPT = """
You are a Lean4 auto-formalization engine.

Task

Given a statement of a math problem, translate it into formal Lean 4 (v4.11.0) code with Mathlib. Do not write any proof steps. Use "sorry" as a placeholder for proofs. All statements must end with ":= by sorry" or ":= sorry".

Output Format

Output must start with #### followed by a JSON object:

{"header": "", "formal_statement": " := by sorry"}

Now formalize the following:"""

# Load the tokenizer and base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

# Build the chat prompt: the user turn is a JSON object with the NL statement.
nl = "Prove that for all natural numbers n, n + 0 = n"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": json.dumps({"nl_statement": nl})},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs, max_new_tokens=512, temperature=0.6, top_p=0.95, do_sample=True
    )

# Decode only the newly generated tokens.
response = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

Example output:

```
#### {"header": "import Mathlib", "formal_statement": "theorem nat_add_zero (n : ℕ) : n + 0 = n := by sorry"}
```

