mxbai-edge-colbert-v0-17m: ONNX export (ColBERT, ModernBERT backbone)
This repository contains an ONNX export of mixedbread-ai/mxbai-edge-colbert-v0-17m produced with PyLate and a ColBERT-aware wrapper. It preserves the projection stack and the ColBERT markers ([Q] / [D]), and it includes a skiplist for MaxSim.
Contents
- onnx/model.onnx: FP32 export, opset 17 (cosine 1.0 vs PyTorch)
- onnx/model_quantized.onnx: Dynamic INT8 (cosine ~0.972 vs PyTorch; quality hit)
- tokenizer.json, tokenizer_config.json, special_tokens_map.json: saved from the PyLate-modified tokenizer with markers
- config.json: model config
- conversion_metadata.json: minimal export metadata
- skiplist.json: token IDs to skip during MaxSim (32 punctuation IDs for this model)
Architecture (verified)
Transformer 256 → Projection 512 → Projection 48
Special tokens
- [Q]: 50368
- [D]: 50369
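As a quick sanity check, the marker IDs listed here can be used to confirm that a tokenized input actually carries the expected marker. The sketch below is illustrative only: `marker_kind` is a hypothetical helper, not part of PyLate or this repository, and it only checks whether a marker ID is present, without assuming where in the sequence it sits.

```python
Q_MARKER_ID = 50368  # [Q], as listed in this model card
D_MARKER_ID = 50369  # [D], as listed in this model card

def marker_kind(input_ids):
    """Return 'query', 'document', or None depending on which marker ID is present.

    Hypothetical helper for illustration; raises if both markers appear.
    """
    ids = set(input_ids)
    has_q = Q_MARKER_ID in ids
    has_d = D_MARKER_ID in ids
    if has_q and has_d:
        raise ValueError("input contains both [Q] and [D] markers")
    if has_q:
        return "query"
    if has_d:
        return "document"
    return None

# Toy token ID sequences (the non-marker IDs are made up).
print(marker_kind([1, 50368, 42, 7]))
print(marker_kind([1, 50369, 42, 7]))
```

If this returns None for an input that was meant to be a query or document, the tokenizer in use is probably not the PyLate-modified one shipped in this repository.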
Quality checks
- PyTorch vs ONNX FP32: cosine 1.00000000, MSE 0.0
- PyTorch vs ONNX INT8: cosine ~0.9719 (degradation observed)
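For readers who want to run a similar parity check on their own export, here is a minimal sketch of the two metrics named above. It assumes both outputs have been flattened to plain Python lists of floats (for example with `.flatten().tolist()` on a NumPy array); how the figures in this card were actually aggregated (flattened vs. per-token) is not stated, so treat this as one plausible reading. `cosine_and_mse` is a hypothetical helper name.

```python
import math

def cosine_and_mse(a, b):
    """Return (cosine similarity, mean squared error) for two equal-length flat vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = dot / (norm_a * norm_b)
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return cosine, mse

# Toy values standing in for flattened PyTorch and ONNX outputs.
reference = [0.1, -0.2, 0.3, 0.4]
candidate = [0.1, -0.2, 0.3, 0.4]
cos, mse = cosine_and_mse(reference, candidate)
print(cos, mse)
```

Identical inputs give a cosine of 1 (up to floating-point rounding) and an MSE of 0, which matches the FP32 row above; a quantized model would show a cosine below 1 and a nonzero MSE.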
Usage (Python, onnxruntime)
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer
model_dir = "path/to/this/repo"
sess = ort.InferenceSession(f"{model_dir}/onnx/model.onnx", providers=["CPUExecutionProvider"])
# Load the tokenizer saved in this repo so the [Q] / [D] markers resolve to their special token IDs.
tok = AutoTokenizer.from_pretrained(model_dir)
# Queries are prefixed with [Q]; documents use [D].
q = "[Q] what is colbert?"
enc = tok(q, return_tensors="np", padding="max_length", max_length=128, truncation=True)
# The first output holds the per-token embeddings.
out = sess.run(None, {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]})[0]
print(out.shape)  # (batch, seq_len, 48)
Usage (Node, onnxruntime-node)
import { AutoTokenizer } from "@huggingface/transformers";
import * as ort from "onnxruntime-node";
import fs from "fs";
const modelDir = "path/to/this/repo";
const tokenizer = await AutoTokenizer.from_pretrained(modelDir);
const session = await ort.InferenceSession.create(`${modelDir}/onnx/model.onnx`);
const q = "[Q] what is colbert?";
// Transformers.js returns its own Tensor objects by default; "np" tensors are a Python-only option.
const encoded = await tokenizer(q, { padding: "max_length", max_length: 128, truncation: true });
// Re-wrap as onnxruntime tensors (int64 data is backed by a BigInt64Array).
const toOrt = (t) => new ort.Tensor("int64", t.data, t.dims);
const outputs = await session.run({ input_ids: toOrt(encoded.input_ids), attention_mask: toOrt(encoded.attention_mask) });
console.log(outputs[session.outputNames[0]].dims); // [1, 128, 48]
// Token IDs to ignore during MaxSim scoring.
const skiplist = new Set(JSON.parse(fs.readFileSync(`${modelDir}/skiplist.json`, "utf8")));
Notes
- INT8 file is provided but shows measurable drift; prefer FP32 (or export FP16 if size matters).
- Skiplist here is punctuation-only because the model did not expose additional skip words; adjust downstream if you maintain a richer skiplist.
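To make the role of the skiplist concrete, the following is a minimal pure-Python sketch of MaxSim scoring: each query token takes its best dot product over the document tokens, and the maxima are summed. Document tokens whose IDs are in the skiplist are dropped first. The function name `maxsim` and the toy IDs are hypothetical; real use would pass the `[seq_len, 48]` embeddings from the ONNX output together with the contents of skiplist.json, and would normally also drop padding positions using the attention mask.

```python
def maxsim(query_vecs, doc_vecs, doc_token_ids, skiplist):
    """Sum over query tokens of the best dot product against kept document tokens."""
    # Drop document tokens whose IDs appear in the skiplist (e.g. punctuation).
    kept = [v for v, tid in zip(doc_vecs, doc_token_ids) if tid not in skiplist]
    if not kept:
        return 0.0
    score = 0.0
    for q in query_vecs:
        score += max(sum(a * b for a, b in zip(q, d)) for d in kept)
    return score

# Toy 2-dimensional embeddings; real outputs from this model have 48 dimensions.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.0, 0.5], [0.0, 1.0]]
doc_ids = [101, 102, 999]  # hypothetical token IDs
skip = {999}               # pretend 999 is a punctuation token
score = maxsim(query, doc, doc_ids, skip)
print(score)
```

In this toy case, skipping the third document token lowers the score because the second query token loses its best match, which illustrates why a richer or sparser skiplist can change ranking downstream.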
Conversion
- Tooling: PyLate + custom ColBERT wrapper
- Opset: 17
- Date: 2025-02 (see conversion_metadata.json for details)