
mxbai-edge-colbert-v0-17m - ONNX export (ColBERT, ModernBERT backbone)

This repository contains an ONNX export of mixedbread-ai/mxbai-edge-colbert-v0-17m produced with PyLate + a ColBERT-aware wrapper. It preserves the projection stack and the ColBERT query/document markers ([Q] / [D]), and ships a skiplist of token IDs to ignore during MaxSim scoring.

Contents

  • onnx/model.onnx - FP32 export, opset 17 (✅ cosine 1.0 vs PyTorch)
  • onnx/model_quantized.onnx - dynamically quantized INT8 (⚠️ cosine ~0.972 vs PyTorch; measurable quality loss)
  • tokenizer.json, tokenizer_config.json, special_tokens_map.json - saved from the PyLate-modified tokenizer, with the [Q]/[D] markers registered
  • config.json - model config
  • conversion_metadata.json - minimal export metadata
  • skiplist.json - token IDs to skip on the document side during MaxSim (32 punctuation IDs for this model)

Architecture (verified)

Transformer (hidden size 256) → Projection (512) → Projection (48)

Special tokens

  • [Q] : 50368
  • [D] : 50369
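
A quick sanity check that the exported tokenizer maps the markers to these IDs (path placeholder as in the usage snippets below):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/this/repo")
# The markers are registered as special tokens, so each maps to a single ID
print(tok.convert_tokens_to_ids("[Q]"), tok.convert_tokens_to_ids("[D]"))  # expect: 50368 50369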

Quality checks

  • PyTorch vs ONNX FP32: cosine 1.00000000, MSE 0.0
  • PyTorch vs ONNX INT8: cosine ~0.9719 (degradation observed)
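
The reported numbers compare against PyTorch. To get a quick read on quantization drift without a PyTorch install, a sketch along these lines compares the two ONNX files directly (since FP32 ONNX matches PyTorch at cosine 1.0, this isolates the INT8 drift; not the exact script used for the numbers above):

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

model_dir = "path/to/this/repo"
tok = AutoTokenizer.from_pretrained(model_dir)
enc = tok("[Q] what is colbert?", return_tensors="np", padding="max_length", max_length=128, truncation=True)
feeds = {"input_ids": enc["input_ids"].astype(np.int64),
         "attention_mask": enc["attention_mask"].astype(np.int64)}

# Flatten both outputs and compare with a single cosine similarity
a = ort.InferenceSession(f"{model_dir}/onnx/model.onnx").run(None, feeds)[0].ravel()
b = ort.InferenceSession(f"{model_dir}/onnx/model_quantized.onnx").run(None, feeds)[0].ravel()
print(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))  # expect ~0.97 for INT8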

Usage (Python, onnxruntime)

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

model_dir = "path/to/this/repo"
sess = ort.InferenceSession(f"{model_dir}/onnx/model.onnx", providers=["CPUExecutionProvider"])
tok = AutoTokenizer.from_pretrained(model_dir)

q = "[Q] what is colbert?"
enc = tok(q, return_tensors="np", padding="max_length", max_length=128, truncation=True)
# ONNX Runtime expects int64 inputs; cast defensively (the tokenizer may return int32 on some platforms)
feeds = {
    "input_ids": enc["input_ids"].astype(np.int64),
    "attention_mask": enc["attention_mask"].astype(np.int64),
}
out = sess.run(None, feeds)[0]
print(out.shape)  # (batch, seq_len, 48) token embeddings
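
The model emits per-token embeddings, so ranking needs a late-interaction step. Continuing from the snippet above, here is a minimal MaxSim sketch: maxsim is a hypothetical helper, not part of this repo; doc_enc/doc_out are assumed to come from encoding a "[D] ..." passage the same way; and masking query padding is a simplification of classic ColBERT's mask-token query augmentation.

import json
import numpy as np

def maxsim(q_emb, q_mask, d_emb, d_mask, d_ids, skiplist):
    # Drop padding on both sides, and skiplisted token IDs on the document side
    q = q_emb[q_mask.astype(bool)]
    keep = d_mask.astype(bool) & ~np.isin(d_ids, list(skiplist))
    d = d_emb[keep]
    # L2-normalize so dot products are cosines (a no-op if the export already normalizes)
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    d = d / np.linalg.norm(d, axis=-1, keepdims=True)
    # MaxSim: each query token takes its best-matching document token; sum over query tokens
    return float((q @ d.T).max(axis=1).sum())

skiplist = set(json.load(open(f"{model_dir}/skiplist.json")))
score = maxsim(out[0], enc["attention_mask"][0],
               doc_out[0], doc_enc["attention_mask"][0],
               doc_enc["input_ids"][0], skiplist)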

Usage (Node, onnxruntime-node)

import { AutoTokenizer } from "@huggingface/transformers";
import * as ort from "onnxruntime-node";
import fs from "fs";

const modelDir = "path/to/this/repo";
const tokenizer = await AutoTokenizer.from_pretrained(modelDir);
const session = await ort.InferenceSession.create(`${modelDir}/onnx/model.onnx`);

const q = "[Q] what is colbert?";
const encoded = await tokenizer(q, { padding: "max_length", max_length: 128, truncation: true });

// transformers.js returns its own Tensor objects; rewrap them as ort.Tensor for onnxruntime-node
const feeds = {
  input_ids: new ort.Tensor("int64", encoded.input_ids.data, encoded.input_ids.dims),
  attention_mask: new ort.Tensor("int64", encoded.attention_mask.data, encoded.attention_mask.dims),
};
const outputs = await session.run(feeds);
console.log(outputs[session.outputNames[0]].dims); // [1, 128, 48]

// Token IDs to drop on the document side when computing MaxSim
const skiplist = new Set(JSON.parse(fs.readFileSync(`${modelDir}/skiplist.json`, "utf8")));

Notes

  • The INT8 file is provided but shows measurable drift; prefer FP32, or convert to FP16 if size matters (a possible conversion sketch follows after this list).
  • The skiplist here is punctuation-only because the model does not expose additional skip words; adjust downstream if you maintain a richer skiplist.
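
No FP16 file ships in this repo; one possible route is onnxconverter-common (not part of this repo's tooling; verify parity against FP32 afterwards):

import onnx
from onnxconverter_common import float16

model = onnx.load("onnx/model.onnx")
# keep_io_types=True leaves graph inputs/outputs in their original types
# (here, the float32 output embeddings), so callers need no dtype changes
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
onnx.save(model_fp16, "onnx/model_fp16.onnx")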

Conversion

  • Tooling: PyLate + custom ColBERT wrapper
  • Opset: 17
  • Date: 2025-02 (see conversion_metadata.json for details)