modernbert-diffusion-universal

Model Summary

A diffusion-style masked language model, fine-tuned from ModernBERT-base in universal mode with a discrete denoising objective.

Model Details

  • Model ID: philipp-zettl/modernbert-diffusion-universal
  • Base model: answerdotai/ModernBERT-base
  • Training mode: universal
  • Task type: Masked token denoising / diffusion-style infilling
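
The task type listed above, masked token denoising, can be illustrated with a minimal sketch of the corruption step that masked diffusion objectives generally rely on: pick a noise level, replace roughly that fraction of tokens with a mask symbol, and train the model to recover the originals at the masked positions. The function name `corrupt`, the `[MASK]` string, and the uniformly sampled rate are assumptions for illustration, not details taken from this model's training code.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (assumption)

def corrupt(tokens, mask_rate, rng):
    """Replace each token with MASK independently with probability mask_rate.

    Returns the corrupted sequence and the masked positions, which are
    the positions a denoising loss would be computed on.
    """
    corrupted, masked_positions = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            corrupted.append(MASK)
            masked_positions.append(i)
        else:
            corrupted.append(tok)
    return corrupted, masked_positions

rng = random.Random(0)
tokens = "def add ( a , b ) : return a + b".split()
mask_rate = rng.random()  # one noise level per example (assumed schedule)
noisy, targets = corrupt(tokens, mask_rate, rng)
print(noisy, targets)
```

A low rate leaves most of the context visible and resembles ordinary masked language modeling; a rate near 1 forces the model to reconstruct almost the whole sequence.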

Intended Use

A base model trained with a diffusion-style masked language modeling (MLM) objective. It can serve as a starting point for supervised fine-tuning (SFT) on specialized datasets.

Example

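# The repository is gated: accept its conditions on Hugging Face and log in first.
from refinebert.diffusion_engine import MaskedDiffusionEngine

# Load the model by its Hugging Face repository ID.
engine = MaskedDiffusionEngine("philipp-zettl/modernbert-diffusion-universal")

# Request 20 new tokens after the prompt, using 12 denoising steps
# and a guidance scale of 3.0.
prompt = "def fibonacci(n):"
output = engine.generate(prompt, num_new_tokens=20, steps=12, guidance_scale=3.0)
print(output)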
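
The `steps` and `guidance_scale` arguments in the example suggest iterative unmasking combined with classifier-free guidance (CFG). This card does not document how `MaskedDiffusionEngine` works internally, so the following is only a sketch of that general technique with a toy predictor in place of a model. The names `cfg_logits`, `iterative_unmask`, and `toy_predict` are hypothetical, and the reveal schedule (an even share of the remaining masks per round, most confident first) is an assumption.

```python
import math

def cfg_logits(cond, uncond, scale):
    """Classifier-free guidance: move conditional logits away from unconditional ones."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def iterative_unmask(length, steps, predict):
    """Fill `length` masked slots over at most `steps` rounds.

    `predict(seq)` is a hypothetical callback that returns one
    (token, confidence) pair per position.
    """
    seq = [None] * length  # None marks a slot that is still masked
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is None]
        if not masked:
            break
        # Reveal an even share of the remaining masks in this round.
        k = math.ceil(len(masked) / (steps - step))
        preds = predict(seq)
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in best:
            seq[i] = preds[i][0]
    return seq

def toy_predict(seq):
    # Stand-in for a model forward pass: token "t<i>", confidence rising with position.
    return [(f"t{i}", i / len(seq)) for i in range(len(seq))]

filled = iterative_unmask(length=20, steps=12, predict=toy_predict)
guided = cfg_logits(cond=[2.0, 0.0], uncond=[1.0, 0.0], scale=3.0)
print(filled)
print(softmax(guided))
```

With a scale of 1.0 the guided logits equal the conditional ones; larger values exaggerate whatever the prompt changes relative to the unconditional prediction.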

Training Data

Datasets are streamed from Hugging Face and mixed by mode.
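
As a rough illustration of mixing streamed sources, the sketch below draws each example from one of several iterators according to fixed weights. It uses plain Python generators so that it runs offline. The source names and the 70/30 split are made up, because the mix table in this card is empty, and `mix_streams` is a hypothetical helper rather than the training script's code. With the Hugging Face `datasets` library, `load_dataset(..., streaming=True)` together with `interleave_datasets(..., probabilities=...)` covers the same idea.

```python
import itertools
import random

def mix_streams(streams, weights, seed=0):
    """Yield (source_name, item) pairs, picking the source by weight each time.

    Stops as soon as the chosen source is exhausted.
    """
    rng = random.Random(seed)
    names = list(streams)
    iterators = {name: iter(streams[name]) for name in names}
    probs = [weights[name] for name in names]
    while True:
        name = rng.choices(names, weights=probs, k=1)[0]
        try:
            yield name, next(iterators[name])
        except StopIteration:
            return

# Hypothetical sources: endless counters standing in for streamed datasets.
streams = {
    "code": (f"code-{i}" for i in itertools.count()),
    "prose": (f"prose-{i}" for i in itertools.count()),
}
weights = {"code": 0.7, "prose": 0.3}  # made-up mix, not this model's

sample = list(itertools.islice(mix_streams(streams, weights), 1000))
share = sum(1 for name, _ in sample if name == "code") / len(sample)
print(f"code share: {share:.2f}")
```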

Dataset Mix

Dataset Percentage Purpose

Training Procedure

  • Steps: 300000
  • Batch size: 16
  • Sequence length: 256
  • Learning rate: 5e-05
  • CFG dropout probability: 0.1
  • Samples loaded into RAM: 100000
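
The values above imply a rough training budget, worked out below. The figures assume that every sequence fills all 256 positions, that there is no gradient accumulation, and that CFG dropout is applied per example. The card states none of these, so the results are upper bounds or expectations, not measurements.

```python
steps = 300_000
batch_size = 16
seq_len = 256
cfg_dropout = 0.1

# Sequences processed over the whole run (assumes no gradient accumulation).
sequences_seen = steps * batch_size

# Upper bound on token positions, reached only if no sequence is padded.
max_token_positions = sequences_seen * seq_len

# Expected number of examples trained without conditioning.
expected_unconditional = round(sequences_seen * cfg_dropout)

print(f"sequences seen:         {sequences_seen:,}")
print(f"max token positions:    {max_token_positions:,}")
print(f"expected unconditional: {expected_unconditional:,}")
```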

Training Time & Hardware

  • Duration:
  • Hardware:

Metrics (Training)

| Metric | Value |
| --- | --- |
| Training loss (latest) | TBD |
| Training loss (mean) | TBD |
| Training step | 300000 / 300000 |

Limitations & Considerations

  • The model is trained with a masked-token diffusion objective and may not behave like an autoregressive LM.
  • Data sources may have licensing or content constraints—review source dataset cards before deployment.
  • Performance can vary substantially by mode (universal) and prompt structure.