AuroraX: A Fast Cross-Lingual Reranker Bridging English and Chinese
AuroraX is a lightweight yet powerful cross-lingual reranker built on the mmBERT-base architecture. It is designed to bridge Traditional Chinese, Simplified Chinese, and English, enabling high-quality semantic ranking across languages at a small computational cost.
Despite having only 110M non-embedding parameters (about 300M in total), AuroraX performs comparably to state-of-the-art rerankers roughly twice its size, such as bge-reranker-v2-m3 (600M parameters). Its design emphasizes both speed and language adaptability, making it well suited to real-world multilingual retrieval and reranking applications.
Key Features:
- Cross-Lingual Understanding: trained to handle English, Traditional Chinese, and Simplified Chinese seamlessly.
- Lightweight & Fast: only 110M non-embedding parameters (about 300M in total), optimized for latency-sensitive pipelines.
- SOTA-Level Accuracy: the highest average MRR among the compared rerankers on the Chinese and English benchmarks, including the larger bge-reranker-v2-m3, although not the top score on every individual dataset.
Evaluation
Monolingual Benchmarks
| Model | Metric | CMedQAv2-reranking (ZH) | T2Reranking (ZH) | ZH AVG | AskUbuntuDupQuestions (EN) | HUMENews21InstructionReranking (EN) | HUMEWikipediaRerankingMultilingual (EN) | SciDocsRR (EN) | EN AVG | Total AVG (all 6 datasets) |
|---|---|---|---|---|---|---|---|---|---|---|
| AuroraX-Reranker-Base-v1.0 (Ours; 300M total, 110M non-embedding) | mrr@10 | 0.8201 | 0.8554 | 0.8378 | 0.7936 | 1.0000 | 0.9778 | 0.9305 | 0.9255 | 0.8962 |
| | mrr@5 | 0.8145 | 0.8514 | 0.8329 | 0.7841 | 1.0000 | 0.9778 | 0.9289 | 0.9227 | 0.8928 |
| bge-reranker-v2-m3 (600M params) | mrr@10 | 0.8598 | 0.8004 | 0.8301 | 0.7635 | 0.9839 | 0.8750 | 0.9211 | 0.8859 | 0.8673 |
| | mrr@5 | 0.8569 | 0.7954 | 0.8262 | 0.7532 | 0.9839 | 0.8750 | 0.9191 | 0.8828 | 0.8639 |
| jina-reranker-v2-base-multilingual (300M params) | mrr@10 | 0.2828 | 0.7577 | 0.5203 | 0.7420 | 1.0000 | 0.8761 | 0.9478 | 0.8915 | 0.7677 |
| | mrr@5 | 0.2759 | 0.7512 | 0.5136 | 0.7299 | 1.0000 | 0.8761 | 0.9467 | 0.8882 | 0.7633 |
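The averages in the table appear to be unweighted means: ZH AVG over the two Chinese datasets, EN AVG over the four English datasets, and Total AVG over all six datasets rather than the mean of ZH AVG and EN AVG. The sketch below recomputes the AuroraX mrr@10 row to illustrate this reading; it is an arithmetic check on the published numbers, not the authors' evaluation code.

```python
from statistics import mean

# AuroraX mrr@10 scores copied from the table above
zh = {"CMedQAv2-reranking": 0.8201, "T2Reranking": 0.8554}
en = {
    "AskUbuntuDupQuestions": 0.7936,
    "HUMENews21InstructionReranking": 1.0000,
    "HUMEWikipediaRerankingMultilingual": 0.9778,
    "SciDocsRR": 0.9305,
}

zh_avg = mean(zh.values())                      # mean over the 2 Chinese datasets
en_avg = mean(en.values())                      # mean over the 4 English datasets
total_avg = mean([*zh.values(), *en.values()])  # mean over all 6 datasets

print(f"ZH AVG={zh_avg:.4f}  EN AVG={en_avg:.4f}  Total AVG={total_avg:.4f}")

# Averaging the two language averages gives a different (lower) number,
# because the four English datasets are weighted the same as the two Chinese ones
print(f"mean(ZH AVG, EN AVG)={mean([zh_avg, en_avg]):.4f}")
```

Under this reading, the English datasets contribute two thirds of the Total AVG.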
Cross-Lingual (ZH ↔ EN) Results
| Model | inhouse-en2zh (HitRate@5) | inhouse-zh2en (HitRate@5) |
|---|---|---|
| AuroraX-Reranker-Base-v1.0 (Ours; 300M total, 110M non-embedding) | 0.8459 | 0.9427 |
| bge-reranker-v2-m3 (600M params) | 0.8179 | 0.9160 |
| jina-reranker-v2-base-multilingual (300M params) | 0.7815 | 0.8855 |
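For reference, the sketch below implements the standard definitions of the two metrics used in these tables: MRR@k averages, over queries, the reciprocal rank of the first relevant document within the top k (0 if none appears), and HitRate@k is the fraction of queries with at least one relevant document in the top k. The evaluation harness behind the reported numbers is not described here, so this is an illustration on hypothetical toy data, not the exact scoring code.

```python
from typing import Sequence, Set


def mrr_at_k(rankings: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int) -> float:
    """Mean reciprocal rank of the first relevant doc within the top k (0 if none)."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


def hit_rate_at_k(rankings: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant doc in the top k."""
    hits = sum(any(d in rel for d in ranked[:k]) for ranked, rel in zip(rankings, relevant))
    return hits / len(rankings)


# Hypothetical toy data: reranked doc IDs per query and the gold relevant IDs
rankings = [["d3", "d1", "d2"], ["d5", "d4", "d6"], ["d7", "d8", "d9"]]
relevant = [{"d1"}, {"d5"}, {"d0"}]

print(mrr_at_k(rankings, relevant, k=10))      # (1/2 + 1 + 0) / 3 = 0.5
print(hit_rate_at_k(rankings, relevant, k=5))  # 2 of 3 queries have a hit
```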
Usage
Sentence-Transformers
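```python
from sentence_transformers import CrossEncoder

# Load the reranker from the Hugging Face Hub
model = CrossEncoder("aqweteddy/AuroraX-Reranker-Base-v1.0")

# Score (query, passage) pairs; a higher score means a more relevant passage
scores = model.predict([("What is Deep Learning?", "Deep learning is a subfield of ML...")])
print(scores)
```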
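To rerank several candidate passages for one query, score every (query, passage) pair and sort by score. The `rerank` helper below is hypothetical (it is not part of the model or of sentence-transformers); it accepts any callable that maps a list of pairs to one score per pair, such as `model.predict` above, so it can be exercised offline with a toy scorer standing in for the model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    passages: Sequence[str],
    score_pairs: Callable[[List[Pair]], Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score every (query, passage) pair, best first."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)  # e.g. CrossEncoder.predict; one score per pair
    ranked = sorted(
        zip(passages, (float(s) for s in scores)),
        key=lambda item: item[1],
        reverse=True,  # higher score = more relevant
    )
    return ranked[:top_k]  # slicing with None keeps every passage


# With the real model (downloads the weights from the Hub):
#   model = CrossEncoder("aqweteddy/AuroraX-Reranker-Base-v1.0")
#   rerank("What is Deep Learning?", passages, model.predict, top_k=3)

# Offline illustration: a toy word-overlap scorer standing in for the model
def overlap_scorer(pairs: List[Pair]) -> List[float]:
    return [float(len(set(q.lower().split()) & set(p.lower().split()))) for q, p in pairs]


passages = ["How to cook pasta", "Deep learning is a subfield of ML"]
print(rerank("What is deep learning", passages, overlap_scorer))
```

Keeping the scorer pluggable separates the ranking logic from model loading, which also makes it easy to batch or cache model calls.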
Text Embeddings Inference (API)
- Install Text Embeddings Inference, then launch the router on port 8080 (the port used by the request below):

  ```shell
  text-embeddings-router --model-id aqweteddy/AuroraX-Reranker-Base-v1.0 --port 8080
  ```

- Send a rerank request via the REST API:

  ```shell
  curl 127.0.0.1:8080/rerank \
      -X POST \
      -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
      -H 'Content-Type: application/json'
  ```
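The same request can be sent from Python. This is a minimal standard-library sketch, assuming a Text Embeddings Inference server is serving the model at `http://127.0.0.1:8080` (the address used in the curl example) and that `/rerank` responds with a JSON list of `{"index": ..., "score": ...}` objects. The helper names are hypothetical, and the response body used in the offline check is made up for illustration, not a real model output.

```python
import json
import urllib.request
from typing import Dict, List, Sequence, Tuple


def build_rerank_request(query: str, texts: Sequence[str]) -> bytes:
    """Encode the JSON body for the /rerank endpoint (same shape as the curl example)."""
    return json.dumps({"query": query, "texts": list(texts)}).encode("utf-8")


def order_texts(results: List[Dict], texts: Sequence[str]) -> List[Tuple[str, float]]:
    """Map each {"index", "score"} result back to its text, highest score first."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(texts[r["index"]], r["score"]) for r in ranked]


def rerank_via_tei(
    query: str, texts: Sequence[str], url: str = "http://127.0.0.1:8080/rerank"
) -> List[Tuple[str, float]]:
    """POST to a running server and return (text, score) pairs, best first."""
    req = urllib.request.Request(
        url,
        data=build_rerank_request(query, texts),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return order_texts(json.load(resp), texts)


# With a running server:
#   rerank_via_tei("What is Deep Learning?", ["Deep Learning is not...", "Deep learning is..."])

# Offline check of the response handling, using a made-up response body
texts = ["Deep Learning is not...", "Deep learning is..."]
fake_response = [{"index": 1, "score": 0.98}, {"index": 0, "score": 0.02}]
print(order_texts(fake_response, texts))
```

Sorting on the client side keeps the output correct even if the server returns results in input order.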
Citation
```bibtex
@misc{aurorax2025,
  title = {AuroraX: A Fast Cross-Lingual Reranker Bridging English and Chinese},
  author = {aqweteddy},
  year = {2025},
  howpublished = {\url{https://huggingface.co/aqweteddy/AuroraX-Reranker-Base-v1.0}},
  note = {Lightweight and powerful reranker for English, Traditional Chinese, and Simplified Chinese}
}
```
Base model: jhu-clsp/mmBERT-base