🩵 Q4-K-M Quantized SSE: Stable Static Embedding for Retrieval MRL 🩵
A lightweight, fast, and powerful embedding model
🩵 Performance Snapshot 🩵
Our SSE model achieves NDCG@10 = 0.5110 on NanoBEIR — slightly outperforming the popular static-retrieval-mrl-en-v1 (0.5032) while using half the dimensions (512 vs 1024)! 💫 Plus, we're ~2× faster in retrieval thanks to our compact 512D embeddings and Separable Dynamic Tanh.
This model outperforms comparable lightweight models while using a 15.8× smaller weight file (7.9MB vs. 125MB).
The weight data size is just under 8MB!
| Model | NanoBEIR NDCG@10 | Dimensions | Parameters | Data size | Speed Advantage | License |
|---|---|---|---|---|---|---|
| SSE Retrieval MRL | 0.5124 | 512 | ~16M | 62.5MB | ~2x faster retrieval (ultra-efficient!) | Apache 2.0 |
| Quantized SSE Retrieval MRL | 0.5110 ✨ | 512 | ~16M 🪽 | 7.9MB 🪽 | ~2x faster retrieval (ultra-efficient!) | Apache 2.0 |
| static-retrieval-mrl-en-v1 | 0.5032 | 1024 | ~33M | 125MB | baseline | Apache 2.0 |
🩵 Why Choose SSE Retrieval MRL? 🩵
✅ Higher NDCG@10 than all comparable small models (<35M params)
✅ Only ~16M parameters — 27% smaller than MiniLM-L6 (22M) and 52% smaller than BGE-small (33M)
✅ 512D native output — half the size of static-retrieval-mrl-en-v1's 1024D embeddings, with higher NanoBEIR scores
✅ Matryoshka-ready — smoothly truncate to 256D/128D/64D/32D with graceful degradation
✅ Apache 2.0 licensed — free for commercial & personal use
✅ CPU-optimized — runs faster on edge devices & modest hardware
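Matryoshka truncation amounts to keeping the first *k* dimensions and re-normalizing before cosine search. A minimal NumPy sketch of that idea (toy data; the function name is ours, not part of the library API):

```python
import numpy as np

def truncate_and_renormalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` Matryoshka dimensions and re-normalize to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

# toy 512D unit vectors standing in for SSE outputs
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 512))
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_and_renormalize(full, 128)
print(small.shape)                    # (4, 128)
print(np.linalg.norm(small, axis=1))  # all ~1.0
```

With `sentence-transformers`, the same effect can be had by passing `truncate_dim` when loading the model.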
🩵 Model Details 🩵
| Property | Value |
|---|---|
| Model Type | Sentence Transformer (SSE architecture) |
| Max Sequence Length | Unlimited (static embeddings; no attention window) |
| Output Dimension | 512 (with Matryoshka truncation down to 32D!) |
| Similarity Function | Cosine Similarity |
| Language | English |
| License | Apache 2.0 |
```
SentenceTransformer(
  (0): SSE(
    (embedding): EmbeddingBag(30522, 512, mode='mean')
    (dyt): SeparableDyT()
  )
)
```
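The `EmbeddingBag(..., mode='mean')` layer averages the token embeddings of a sentence into a single vector, with no attention involved. A toy-sized sketch (vocabulary 100 and dimension 8 standing in for 30522 × 512):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# toy vocabulary/dimension standing in for EmbeddingBag(30522, 512, mode='mean')
bag = nn.EmbeddingBag(100, 8, mode="mean")

token_ids = torch.tensor([3, 17, 42, 7])     # one "sentence" of token ids
sentence_vec = bag(token_ids.unsqueeze(0))   # shape (1, 8)

# equivalent: mean of the individual token embedding rows
manual = bag.weight[token_ids].mean(dim=0, keepdim=True)
print(torch.allclose(sentence_vec, manual, atol=1e-6))  # True
```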
🩵 Mathematical formulations 🩵
Dynamic Tanh Normalization (DyT) enables magnitude-adaptive gradient flow for static embeddings. For each input dimension $x$, DyT computes, with learnable parameters $\alpha$, $\gamma$, and $\beta$:

$$\mathrm{DyT}(x) = \gamma \tanh(\alpha x) + \beta$$

The gradient with respect to $x$ is:

$$\frac{\partial\,\mathrm{DyT}(x)}{\partial x} = \gamma\alpha\left(1 - \tanh^2(\alpha x)\right)$$

For saturated dimensions ($|\alpha x| \gg 1$), $1 - \tanh^2(\alpha x) \approx 4e^{-2|\alpha x|}$, so gradients decay exponentially. For non-saturated dimensions ($|\alpha x| \ll 1$), the gradient remains near the constant $\gamma\alpha$. This magnitude-dependent gating attenuates learning signals from noisy, large-magnitude dimensions while maintaining full gradient flow for stable, informative dimensions, providing implicit regularization that enhances generalization without explicit hyperparameters.
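A quick numeric check of this gating: assuming the DyT form $\gamma\tanh(\alpha x)+\beta$ with $\alpha=\gamma=1$, its input gradient is $\gamma\alpha(1-\tanh^2(\alpha x))$, which stays near 1 for small inputs and collapses for saturated ones.

```python
import numpy as np

def dyt_grad(x, alpha=1.0, gamma=1.0):
    """d/dx [gamma * tanh(alpha * x) + beta] = gamma * alpha * (1 - tanh(alpha*x)**2)."""
    return gamma * alpha * (1.0 - np.tanh(alpha * x) ** 2)

print(dyt_grad(0.0))  # 1.0    -> full gradient flow near zero
print(dyt_grad(0.1))  # ~0.99  -> non-saturated, nearly constant
print(dyt_grad(3.0))  # ~0.0099 -> saturated dimension, gradient suppressed
```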
🩵 Evaluation Results (NanoBEIR) 🩵
| Dataset | NDCG@10 | MRR@10 | MAP@100 |
|---|---|---|---|
| NanoBEIR Mean | 0.5110 | 0.5645 | 0.4312 |
| NanoClimateFEVER | 0.3127 | 0.3822 | 0.2439 |
| NanoDBPedia | 0.5472 | 0.7440 | 0.4252 |
| NanoFEVER | 0.6870 | 0.6402 | 0.6191 |
| NanoFiQA2018 | 0.3750 | 0.4155 | 0.3129 |
| NanoHotpotQA | 0.6927 | 0.7572 | 0.6205 |
| NanoMSMARCO | 0.4105 | 0.3504 | 0.3694 |
| NanoNFCorpus | 0.3063 | 0.4989 | 0.1148 |
| NanoNQ | 0.4523 | 0.3884 | 0.3941 |
| NanoQuoraRetrieval | 0.9147 | 0.9222 | 0.8944 |
| NanoSCIDOCS | 0.3345 | 0.5562 | 0.2622 |
| NanoArguAna | 0.4154 | 0.3151 | 0.3257 |
| NanoSciFact | 0.5972 | 0.5774 | 0.5703 |
| NanoTouche2020 | 0.5979 | 0.7910 | 0.4526 |
🩵 How to use? 🩵
```python
import torch
from sentence_transformers import SentenceTransformer

# load (remote code enabled)
model = SentenceTransformer(
    "RikkaBotan/quantized-stable-static-embedding-fast-retrieval-mrl-en",
    trust_remote_code=True,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# inference
sentences = [
    "Stable Static embedding is interesting.",
    "SSE works without attention.",
]
with torch.no_grad():
    embeddings = model.encode(
        sentences,
        convert_to_tensor=True,
        normalize_embeddings=True,
        batch_size=32,
    )

# cosine similarity
# cosine_sim = embeddings[0] @ embeddings[1].T
cosine_sim = model.similarity(embeddings, embeddings)
print("embeddings shape:", embeddings.shape)
print("cosine similarity matrix:")
print(cosine_sim)
```
🩵 Retrieval usage 🩵
```python
import torch
from sentence_transformers import SentenceTransformer

# load (remote code enabled)
model = SentenceTransformer(
    "RikkaBotan/quantized-stable-static-embedding-fast-retrieval-mrl-en",
    trust_remote_code=True,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# inference
query = "What is Stable Static Embedding?"
sentences = [
    "SSE: Stable Static embedding works without attention.",
    "Stable Static Embedding is a fast embedding method designed for retrieval tasks.",
    "Static embeddings are often compared with transformer-based sentence encoders.",
    "I cooked pasta last night while listening to jazz music.",
    "Large language models are commonly trained using next-token prediction objectives.",
    "Instruction tuning improves the ability of LLMs to follow human-written prompts.",
]
with torch.no_grad():
    embeddings = model.encode(
        [query] + sentences,
        convert_to_tensor=True,
        normalize_embeddings=True,
        batch_size=32,
    )
print("embeddings shape:", embeddings.shape)

# cosine similarity between the query and each sentence
similarities = model.similarity(embeddings[0], embeddings[1:])
for i, similarity in enumerate(similarities[0].tolist()):
    print(f"{similarity:.05f}: {sentences[i]}")
```
🩵 Training Hyperparameters 🩵
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 512
- gradient_accumulation_steps: 8
- learning_rate: 0.1
- adam_beta2: 0.9999
- adam_epsilon: 1e-10
- num_train_epochs: 1
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- dataloader_num_workers: 4
- batch_sampler: no_duplicates
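For reference, these values map onto the sentence-transformers v3 trainer configuration roughly as follows (a sketch, assuming the v3 `SentenceTransformerTrainingArguments` API; `output_dir` is a placeholder):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="sse-retrieval-mrl",      # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=512,     # effective batch = 512 * 8 = 4096
    gradient_accumulation_steps=8,
    learning_rate=0.1,
    adam_beta2=0.9999,
    adam_epsilon=1e-10,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    dataloader_num_workers=4,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```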
🩵 Training Datasets 🩵
We trained on 16 datasets:

| Dataset |
|---|
| squad |
| trivia_qa |
| allnli |
| pubmedqa |
| hotpotqa |
| miracl |
| mr_tydi |
| msmarco |
| msmarco_10m |
| msmarco_hard |
| mldr |
| s2orc |
| swim_ir |
| paq |
| nq |
| scidocs |
All trained with MatryoshkaLoss — learning representations at multiple scales like Russian nesting dolls!
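MatryoshkaLoss wraps a base contrastive loss and applies it at several truncated dimensionalities so that every prefix of the embedding stays useful. A self-contained toy sketch of that idea, using an InfoNCE-style in-batch-negatives loss (our simplification, not the library implementation):

```python
import torch
import torch.nn.functional as F

def matryoshka_infonce(q, d, dims=(512, 256, 128, 64, 32), scale=20.0):
    """Average an InfoNCE-style loss over truncated embedding prefixes (toy sketch)."""
    total = 0.0
    for dim in dims:
        qd = F.normalize(q[:, :dim], dim=-1)   # truncate, then re-normalize
        dd = F.normalize(d[:, :dim], dim=-1)
        logits = scale * qd @ dd.T             # in-batch negatives on the off-diagonal
        labels = torch.arange(q.size(0))       # matching pair sits on the diagonal
        total = total + F.cross_entropy(logits, labels)
    return total / len(dims)

torch.manual_seed(0)
q = torch.randn(8, 512)
loss = matryoshka_infonce(q, q + 0.01 * torch.randn(8, 512))
print(float(loss))  # small, since each query's matched document dominates
```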
🩵 Training results 🩵
🩵 About me 🩵
A Japanese independent researcher with a shy and pampered personality. Twin-tail hair is my charm point. Interested in NLP. I usually use Python and C.
X(Twitter): https://twitter.com/peony__snow
🩵 Acknowledgements 🩵
The author acknowledges the support of Saldra, Witness, and Lumina Logic Minds for providing the computational resources used in this work.
I thank the developers of sentence-transformers, Python, and PyTorch.
I thank all the researchers for their efforts to date.
I thank Japan's high standard of education.
And most of all, thank you for your interest in this repository.
🩵 Citation 🩵
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```