# OGBert-110M-Base

A 110M-parameter ModernBERT-based masked language model trained on glossary and domain-specific text.
Related models:
- mjbommar/ogbert-110m-sentence - Sentence embedding version with mean pooling + L2 normalization
## Model Details
| Property | Value |
|---|---|
| Architecture | ModernBERT |
| Parameters | 110M |
| Hidden size | 768 |
| Layers | 12 |
| Attention heads | 12 |
| Vocab size | 32,768 |
| Max sequence | 1,024 tokens |
## Training
- Task: Masked Language Modeling (MLM)
- Dataset: mjbommar/ogbert-v1-mlm - derived from OpenGloss, a synthetic encyclopedic dictionary with 537K senses across 150K lexemes
- Masking: Standard 15% token masking (see the collator sketch after this list)
- Training steps: 8,000 (selected for optimal downstream performance)
- Tokens processed: ~4.5B
- Batch size: 1,024
- Peak learning rate: 3e-4
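The exact training pipeline is not published in this card, but the 15% masking objective corresponds to the standard Hugging Face MLM collator. A minimal, illustrative sketch of that setup:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained('mjbommar/ogbert-110m-base')

# Standard MLM collator: dynamically masks 15% of tokens per sequence,
# matching the masking rate listed above.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,
)
```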
## Performance

### Word Similarity (SimLex-999)
SimLex-999 measures the Spearman correlation between model cosine similarities and human judgments on 999 word pairs; higher values indicate closer alignment with human perception of word similarity.
| Model | Params | SimLex-999 (ρ) |
|---|---|---|
| OGBert-110M-Base | 110M | 0.345 |
| BERT-base | 110M | 0.070 |
| RoBERTa-base | 125M | -0.061 |
OGBert-110M-Base achieves roughly 5x the SimLex-999 correlation of BERT-base (0.345 vs. 0.070) at the same parameter count.
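The evaluation protocol is not detailed in this card; the sketch below shows one common way such a score is computed, assuming mean-pooled last-hidden-layer states as word vectors and cosine similarity (both are illustrative choices, not the documented setup):

```python
import torch
from scipy.stats import spearmanr
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('mjbommar/ogbert-110m-base')
model = AutoModel.from_pretrained('mjbommar/ogbert-110m-base')

def word_vector(word):
    # Illustrative choice: mean-pool the last hidden layer over the word's tokens.
    inputs = tokenizer(word, return_tensors='pt')
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)

def simlex_spearman(pairs):
    # pairs: iterable of (word1, word2, human_score) tuples from SimLex-999.
    model_scores, human_scores = [], []
    for w1, w2, gold in pairs:
        sim = torch.cosine_similarity(word_vector(w1), word_vector(w2), dim=0).item()
        model_scores.append(sim)
        human_scores.append(gold)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho
```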
### Document Clustering
Evaluated on 80 domain-specific documents across 10 categories using KMeans.
| Model | Params | ARI | Cluster Acc |
|---|---|---|---|
| OGBert-110M-Base | 110M | 0.941 | 0.975 |
| BERT-base | 110M | 0.896 | 0.950 |
| RoBERTa-base | 125M | 0.941 | 0.975 |
OGBert-110M-Base matches RoBERTa-base and exceeds BERT-base on this clustering benchmark.
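The embedding and scoring details are not published here; the sketch below shows one standard way to compute ARI and cluster accuracy from document embeddings (the `embeddings` and `labels` arrays are placeholders you would supply):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def clustering_scores(embeddings, labels, n_clusters=10):
    # Cluster the document embeddings and compare against true category labels.
    preds = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    ari = adjusted_rand_score(labels, preds)

    # Cluster accuracy: find the best one-to-one cluster-to-label mapping
    # (Hungarian assignment), then score the fraction of correctly placed docs.
    cm = np.zeros((n_clusters, n_clusters), dtype=int)
    for p, t in zip(preds, labels):
        cm[p, t] += 1
    rows, cols = linear_sum_assignment(-cm)
    acc = cm[rows, cols].sum() / len(labels)
    return ari, acc
```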
## Usage

### Fill-Mask Pipeline
```python
from transformers import pipeline

fill_mask = pipeline('fill-mask', model='mjbommar/ogbert-110m-base')

# The model's mask token is <|mask|>.
result = fill_mask('The financial <|mask|> was approved.')
print(result)
```
### Direct Model Usage
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('mjbommar/ogbert-110m-base')
model = AutoModelForMaskedLM.from_pretrained('mjbommar/ogbert-110m-base')

inputs = tokenizer('The <|mask|> definition is clear.', return_tensors='pt')
outputs = model(**inputs)  # outputs.logits holds per-token vocabulary scores
```
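To turn the raw logits into predictions, read off the highest-scoring tokens at the masked position. This sketch assumes the tokenizer exposes the `<|mask|>` token via `mask_token_id`:

```python
import torch

# Locate the masked position and decode the top-5 candidate tokens (illustrative).
mask_positions = (inputs['input_ids'] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
logits = outputs.logits[0, mask_positions]
top_ids = torch.topk(logits, k=5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids))
```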
### For Sentence Embeddings

Use mjbommar/ogbert-110m-sentence instead; it adds mean pooling and L2 normalization for similarity search (see the sketch below).
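A minimal sketch, assuming the sentence variant is packaged for the `sentence-transformers` library (if not, apply mean pooling and L2 normalization over the base model's outputs yourself):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('mjbommar/ogbert-110m-sentence')
embeddings = model.encode(
    ['The financial statement was approved.', 'The glossary definition is clear.'],
    normalize_embeddings=True,  # L2-normalize so dot product equals cosine similarity
)
print(embeddings.shape)
```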
## Citation
If you use this model, please cite the OpenGloss dataset:
```bibtex
@article{bommarito2025opengloss,
  title={OpenGloss: A Synthetic Encyclopedic Dictionary and Semantic Knowledge Graph},
  author={Bommarito II, Michael J.},
  journal={arXiv preprint arXiv:2511.18622},
  year={2025}
}
```
## License
Apache 2.0