🩵 Q4-K-M Quantized SSE: Stable Static Embedding for Retrieval MRL 🩵
A lightweight, fast, and powerful embedding model
🩵 Performance Snapshot 🩵
Our SSE model achieves NDCG@10 = 0.5110 on NanoBEIR — slightly outperforming the popular static-retrieval-mrl-en-v1 (0.5032) while using half the dimensions (512 vs 1024)! 💫 Plus, we're ~2× faster in retrieval thanks to our compact 512D embeddings and Separable Dynamic Tanh.
This model outperforms comparable lightweight models while using a 15.8× smaller weight file (7.9MB vs. 125MB).
The weight data size is just under 8MB!
| Model | NanoBEIR NDCG@10 | Dimensions | Parameters | Data size | Speed Advantage | License |
|---|---|---|---|---|---|---|
| SSE Retrieval MRL | 0.5124 | 512 | ~16M | 62.5MB | ~2x faster retrieval (ultra-efficient!) | Apache 2.0 |
| Quantized SSE Retrieval MRL | 0.5110 ✨ | 512 | ~16M 🪽 | 7.9MB 🪽 | ~2x faster retrieval (ultra-efficient!) | Apache 2.0 |
| static-retrieval-mrl-en-v1 | 0.5032 | 1024 | ~33M | 125MB | baseline | Apache 2.0 |
🩵 Why Choose SSE Retrieval MRL? 🩵
✅ Higher NDCG@10 than all comparable small models (<35M params)
✅ Only ~16M parameters — 27% smaller than MiniLM-L6 (22M) and 52% smaller than BGE-small (33M)
✅ 512D native output — half the size of static-retrieval-mrl-en-v1's 1024D embeddings, with higher NanoBEIR scores
✅ Matryoshka-ready — smoothly truncate to 256D/128D/64D/32D with graceful degradation
✅ Apache 2.0 licensed — free for commercial & personal use
✅ CPU-optimized — runs faster on edge devices & modest hardware
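Matryoshka truncation amounts to keeping the first *k* dimensions and re-normalizing before cosine search. A minimal NumPy sketch of that idea (toy data; the function name is ours, not part of the library API):

```python
import numpy as np

def truncate_and_renormalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` Matryoshka dimensions and re-normalize to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

# toy 512D unit vectors standing in for SSE outputs
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 512))
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_and_renormalize(full, 128)
print(small.shape)                    # (4, 128)
print(np.linalg.norm(small, axis=1))  # all ~1.0
```

With `sentence-transformers`, the same effect can be had by passing `truncate_dim` when loading the model.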
🩵 Model Details 🩵
| Property | Value |
|---|---|
| Model Type | Sentence Transformer (SSE architecture) |
| Max Sequence Length | Unlimited (static embeddings; no attention window) |
| Output Dimension | 512 (with Matryoshka truncation down to 32D!) |
| Similarity Function | Cosine Similarity |
| Language | English |
| License | Apache 2.0 |
```
SentenceTransformer(
  (0): SSE(
    (embedding): EmbeddingBag(30522, 512, mode='mean')
    (dyt): SeparableDyT()
  )
)
```
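The `EmbeddingBag(..., mode='mean')` layer averages the token embeddings of a sentence into a single vector, with no attention involved. A toy-sized sketch (vocabulary 100 and dimension 8 standing in for 30522 × 512):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# toy vocabulary/dimension standing in for EmbeddingBag(30522, 512, mode='mean')
bag = nn.EmbeddingBag(100, 8, mode="mean")

token_ids = torch.tensor([3, 17, 42, 7])     # one "sentence" of token ids
sentence_vec = bag(token_ids.unsqueeze(0))   # shape (1, 8)

# equivalent: mean of the individual token embedding rows
manual = bag.weight[token_ids].mean(dim=0, keepdim=True)
print(torch.allclose(sentence_vec, manual, atol=1e-6))  # True
```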
🩵 Mathematical formulations 🩵
Dynamic Tanh Normalization (DyT) enables magnitude-adaptive gradient flow for static embeddings. For each input dimension $x$, DyT computes, with learnable parameters $\alpha$, $\gamma$, and $\beta$:

$$\mathrm{DyT}(x) = \gamma \tanh(\alpha x) + \beta$$

The gradient with respect to $x$ is:

$$\frac{\partial\,\mathrm{DyT}(x)}{\partial x} = \gamma\alpha\left(1 - \tanh^2(\alpha x)\right)$$

For saturated dimensions ($|\alpha x| \gg 1$), $1 - \tanh^2(\alpha x) \approx 4e^{-2|\alpha x|}$, so gradients decay exponentially. For non-saturated dimensions ($|\alpha x| \ll 1$), the gradient remains near the constant $\gamma\alpha$. This magnitude-dependent gating attenuates learning signals from noisy, large-magnitude dimensions while maintaining full gradient flow for stable, informative dimensions, providing implicit regularization that enhances generalization without explicit hyperparameters.
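A quick numeric check of this gating: assuming the DyT form $\gamma\tanh(\alpha x)+\beta$ with $\alpha=\gamma=1$, its input gradient is $\gamma\alpha(1-\tanh^2(\alpha x))$, which stays near 1 for small inputs and collapses for saturated ones.

```python
import numpy as np

def dyt_grad(x, alpha=1.0, gamma=1.0):
    """d/dx [gamma * tanh(alpha * x) + beta] = gamma * alpha * (1 - tanh(alpha*x)**2)."""
    return gamma * alpha * (1.0 - np.tanh(alpha * x) ** 2)

print(dyt_grad(0.0))  # 1.0    -> full gradient flow near zero
print(dyt_grad(0.1))  # ~0.99  -> non-saturated, nearly constant
print(dyt_grad(3.0))  # ~0.0099 -> saturated dimension, gradient suppressed
```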
🩵 Evaluation Results (NanoBEIR) 🩵
| Dataset | NDCG@10 | MRR@10 | MAP@100 |
|---|---|---|---|
| NanoBEIR Mean | 0.5110 | 0.5645 | 0.4312 |
| NanoClimateFEVER | 0.3127 | 0.3822 | 0.2439 |
| NanoDBPedia | 0.5472 | 0.7440 | 0.4252 |
| NanoFEVER | 0.6870 | 0.6402 | 0.6191 |
| NanoFiQA2018 | 0.3750 | 0.4155 | 0.3129 |
| NanoHotpotQA | 0.6927 | 0.7572 | 0.6205 |
| NanoMSMARCO | 0.4105 | 0.3504 | 0.3694 |
| NanoNFCorpus | 0.3063 | 0.4989 | 0.1148 |
| NanoNQ | 0.4523 | 0.3884 | 0.3941 |
| NanoQuoraRetrieval | 0.9147 | 0.9222 | 0.8944 |
| NanoSCIDOCS | 0.3345 | 0.5562 | 0.2622 |
| NanoArguAna | 0.4154 | 0.3151 | 0.3257 |
| NanoSciFact | 0.5972 | 0.5774 | 0.5703 |
| NanoTouche2020 | 0.5979 | 0.7910 | 0.4526 |
🩵 How to use? 🩵
```python
import torch
from sentence_transformers import SentenceTransformer

# load (remote code enabled)
model = SentenceTransformer(
    "RikkaBotan/quantized-stable-static-embedding-fast-retrieval-mrl-en",
    trust_remote_code=True,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# inference
sentences = [
    "Stable Static embedding is interesting.",
    "SSE works without attention.",
]
with torch.no_grad():
    embeddings = model.encode(
        sentences,
        convert_to_tensor=True,
        normalize_embeddings=True,
        batch_size=32,
    )

# cosine similarity
# cosine_sim = embeddings[0] @ embeddings[1].T
cosine_sim = model.similarity(embeddings, embeddings)
print("embeddings shape:", embeddings.shape)
print("cosine similarity matrix:")
print(cosine_sim)
```
🩵 Retrieval usage 🩵
```python
import torch
from sentence_transformers import SentenceTransformer

# load (remote code enabled)
model = SentenceTransformer(
    "RikkaBotan/quantized-stable-static-embedding-fast-retrieval-mrl-en",
    trust_remote_code=True,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# inference
query = "What is Stable Static Embedding?"
sentences = [
    "SSE: Stable Static embedding works without attention.",
    "Stable Static Embedding is a fast embedding method designed for retrieval tasks.",
    "Static embeddings are often compared with transformer-based sentence encoders.",
    "I cooked pasta last night while listening to jazz music.",
    "Large language models are commonly trained using next-token prediction objectives.",
    "Instruction tuning improves the ability of LLMs to follow human-written prompts.",
]
with torch.no_grad():
    embeddings = model.encode(
        [query] + sentences,
        convert_to_tensor=True,
        normalize_embeddings=True,
        batch_size=32,
    )
print("embeddings shape:", embeddings.shape)

# cosine similarity between the query and each sentence
similarities = model.similarity(embeddings[0], embeddings[1:])
for i, similarity in enumerate(similarities[0].tolist()):
    print(f"{similarity:.05f}: {sentences[i]}")
```
🩵 Training Hyperparameters 🩵
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 512
- gradient_accumulation_steps: 8
- learning_rate: 0.1
- adam_beta2: 0.9999
- adam_epsilon: 1e-10
- num_train_epochs: 1
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- dataloader_num_workers: 4
- batch_sampler: no_duplicates
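For reference, these values map onto the sentence-transformers v3 trainer configuration roughly as follows (a sketch, assuming the v3 `SentenceTransformerTrainingArguments` API; `output_dir` is a placeholder):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="sse-retrieval-mrl",      # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=512,     # effective batch = 512 * 8 = 4096
    gradient_accumulation_steps=8,
    learning_rate=0.1,
    adam_beta2=0.9999,
    adam_epsilon=1e-10,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    dataloader_num_workers=4,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```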
🩵 Training Datasets 🩵
We trained on 16 datasets:

| Dataset |
|---|
| squad |
| trivia_qa |
| allnli |
| pubmedqa |
| hotpotqa |
| miracl |
| mr_tydi |
| msmarco |
| msmarco_10m |
| msmarco_hard |
| mldr |
| s2orc |
| swim_ir |
| paq |
| nq |
| scidocs |
All trained with MatryoshkaLoss — learning representations at multiple scales like Russian nesting dolls!
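MatryoshkaLoss wraps a base contrastive loss and applies it at several truncated dimensionalities so that every prefix of the embedding stays useful. A self-contained toy sketch of that idea, using an InfoNCE-style in-batch-negatives loss (our simplification, not the library implementation):

```python
import torch
import torch.nn.functional as F

def matryoshka_infonce(q, d, dims=(512, 256, 128, 64, 32), scale=20.0):
    """Average an InfoNCE-style loss over truncated embedding prefixes (toy sketch)."""
    total = 0.0
    for dim in dims:
        qd = F.normalize(q[:, :dim], dim=-1)   # truncate, then re-normalize
        dd = F.normalize(d[:, :dim], dim=-1)
        logits = scale * qd @ dd.T             # in-batch negatives on the off-diagonal
        labels = torch.arange(q.size(0))       # matching pair sits on the diagonal
        total = total + F.cross_entropy(logits, labels)
    return total / len(dims)

torch.manual_seed(0)
q = torch.randn(8, 512)
loss = matryoshka_infonce(q, q + 0.01 * torch.randn(8, 512))
print(float(loss))  # small, since each query's matched document dominates
```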
🩵 Training results 🩵
🩵 About me 🩵
A Japanese independent researcher with a shy and pampered personality. Twin-tail hair is my charm point. Interested in NLP. I usually use Python and C.
X(Twitter): https://twitter.com/peony__snow
🩵 Acknowledgements 🩵
The author acknowledges the support of Saldra, Witness, and Lumina Logic Minds for providing the computational resources used in this work.
I thank the developers of sentence-transformers, Python, and PyTorch.
I thank all the researchers for their efforts to date.
I thank Japan's high standard of education.
And most of all, thank you for your interest in this repository.
🩵 Citation 🩵
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```