Qwen3-8B-Herald-SFT

A QLoRA fine-tune of Qwen/Qwen3-8B for natural-language-to-Lean-4 autoformalization (NL→FL), trained on the FrenzyMath/Herald_statements dataset as a supervised cold-start before GRPO reinforcement learning.

Given a natural-language mathematical statement, the model produces a structured Lean 4 theorem stub (with a `sorry` placeholder) in a `#### {"header": ..., "formal_statement": ...}` JSON format that is directly consumable by a downstream GRPO trainer with a Lean 4 compiler reward.

Model Details

Model Description

  • Developed by: xiaolesu / Osmosis AI
  • Model type: Causal LM (decoder-only transformer), LoRA adapter
  • Finetuned from: Qwen/Qwen3-8B
  • Language(s): English (natural language input) → Lean 4 (formal output)
  • License: Apache 2.0
  • Training dataset: FrenzyMath/Herald_statements (a 50k-example shuffled subset of the full 580k statements)

Uses

Direct Use

This model is designed as a Lean 4 autoformalization engine. Given a natural-language math problem, it translates the problem into a formal Lean 4 theorem statement using Mathlib, with `sorry` as the proof placeholder. It does not write proofs; it generates only the theorem signature.

Example use case: automatically generating a corpus of Lean 4 theorem stubs from informal mathematical textbooks, competition problems, or research papers.

Downstream Use

This model is intended as an SFT cold-start checkpoint for a GRPO reinforcement-learning phase, in which the Lean 4 compiler provides a binary reward signal (the statement type-checks or it does not). The structured `####` JSON output format is specifically designed for easy parsing in a GRPO reward function.
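As a minimal sketch of that parsing step (the function name `parse_completion` and the zero-reward-on-malformed-output convention are illustrative assumptions, not part of this model's release), a reward function might recover the statement like this before handing it to the Lean 4 compiler:

```python
import json


def parse_completion(completion: str):
    """Extract the JSON payload from a '#### {...}' model completion.

    Returns the parsed dict, or None if the output is malformed --
    a natural zero-reward case for a GRPO reward function.
    """
    marker = "####"
    idx = completion.find(marker)
    if idx == -1:
        return None
    payload = completion[idx + len(marker):].strip()
    try:
        obj = json.loads(payload)
    except json.JSONDecodeError:
        return None
    # Require both expected keys so downstream code can rely on them.
    if not isinstance(obj, dict) or not {"header", "formal_statement"} <= obj.keys():
        return None
    return obj


demo = '#### {"header": "import Mathlib", "formal_statement": "theorem t : 1 = 1 := by sorry"}'
print(parse_completion(demo)["formal_statement"])  # -> theorem t : 1 = 1 := by sorry
```

Returning `None` (rather than raising) keeps the reward function total: any completion that fails to parse can simply score zero without interrupting the RL loop.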

Out-of-Scope Use

  • Writing complete Lean 4 proofs (the model only generates stubs ending in := by sorry)
  • Informal mathematical reasoning or problem solving in natural language
  • Lean 3 or other proof assistants (Coq, Isabelle, etc.)
  • General-purpose coding tasks

How to Get Started with the Model

```python
import json

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "Qwen/Qwen3-8B"
ADAPTER = "xiaolesu/Qwen3-8B-Herald-SFT-ckpt2000"

SYSTEM_PROMPT = """
You are a Lean4 auto-formalization engine.

Task

Given a statement of a math problem, translate it into formal Lean 4 (v4.11.0) code with Mathlib. Do not write any proof steps. Use "sorry" as a placeholder for proofs. All statements must end with ":= by sorry" or ":= sorry".

Output Format

Output must start with #### followed by a JSON object:

{"header": "", "formal_statement": " := by sorry"}

Now formalize the following:"""

# Load the tokenizer and base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

# Build the chat prompt: the user turn is a JSON object with the NL statement.
nl = "Prove that for all natural numbers n, n + 0 = n"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": json.dumps({"nl_statement": nl})},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs, max_new_tokens=512, temperature=0.6, top_p=0.95, do_sample=True
    )

# Decode only the newly generated tokens.
response = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

Example output:

```
#### {"header": "import Mathlib", "formal_statement": "theorem nat_add_zero (n : ℕ) : n + 0 = n := by sorry"}
```

