AuroraX: A Fast Cross-Lingual Reranker Bridging English and Chinese

AuroraX is a lightweight yet powerful cross-lingual reranker built on the mmBERT-base architecture. It is designed to bridge Traditional Chinese, Simplified Chinese, and English, enabling high-quality semantic ranking across languages with remarkable efficiency.

Despite having only 110M non-embedding parameters, AuroraX achieves performance comparable to state-of-the-art rerankers roughly twice its size. Its design emphasizes both speed and language adaptability, making it well suited to latency-sensitive, real-world multilingual retrieval and re-ranking applications.

✨ Key Features:

  • 🌏 Cross-Lingual Understanding: Trained to handle English, Traditional Chinese, and Simplified Chinese seamlessly.
  • ⚡ Lightweight & Fast: Only 110M parameters (non-embedding), optimized for latency-sensitive pipelines.
  • 🎯 SOTA-Level Accuracy: Comparable or superior to larger rerankers on Chinese and English benchmarks.

Evaluation

Monolingual Benchmarks

| Model | Metric | CMedQAv2-reranking (ZH) | T2Reranking (ZH) | ZH AVG | AskUbuntuDupQuestions (EN) | HUMENews21InstructionReranking (EN) | HUMEWikipediaRerankingMultilingual (EN) | SciDocsRR (EN) | EN AVG | Total AVG |
|---|---|---|---|---|---|---|---|---|---|---|
| AuroraX-Reranker-Base-v1.0 (Ours, 300M with 100M non-embed params) | mrr@10 | 0.8201 | 0.8554 | 0.8378 | 0.7936 | 1.0000 | 0.9778 | 0.9305 | 0.9255 | 0.8962 |
| | mrr@5 | 0.8145 | 0.8514 | 0.8329 | 0.7841 | 1.0000 | 0.9778 | 0.9289 | 0.9227 | 0.8928 |
| bge-reranker-v2-m3 (600M params) | mrr@10 | 0.8598 | 0.8004 | 0.8301 | 0.7635 | 0.9839 | 0.8750 | 0.9211 | 0.8859 | 0.8673 |
| | mrr@5 | 0.8569 | 0.7954 | 0.8262 | 0.7532 | 0.9839 | 0.8750 | 0.9191 | 0.8828 | 0.8639 |
| jina-reranker-v2-base-multilingual (300M params) | mrr@10 | 0.2828 | 0.7577 | 0.5203 | 0.7420 | 1.0000 | 0.8761 | 0.9478 | 0.8915 | 0.7677 |
| | mrr@5 | 0.2759 | 0.7512 | 0.5136 | 0.7299 | 1.0000 | 0.8761 | 0.9467 | 0.8882 | 0.7633 |
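For readers unfamiliar with the metric, mrr@k averages, over all queries, the reciprocal rank of the first relevant result within the top k. A minimal illustrative sketch with binary relevance labels (not the actual evaluation harness used above):

```python
def mrr_at_k(ranked_relevance, k):
    """Mean Reciprocal Rank at cutoff k.

    ranked_relevance: one list per query of 0/1 labels, in ranked order.
    Each query contributes 1/rank of its first relevant result within
    the top k, or 0 if no relevant result appears there.
    """
    total = 0.0
    for labels in ranked_relevance:
        for rank, relevant in enumerate(labels[:k], start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_relevance)

# Two queries: first relevant hits at rank 2 and rank 1 -> (0.5 + 1.0) / 2
print(mrr_at_k([[0, 1, 0], [1, 0, 0]], k=10))  # 0.75
```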

Cross-Lingual (ZH ↔ EN) Results

| Model | inhouse-en2zh (HitRate@5) | inhouse-zh2en (HitRate@5) |
|---|---|---|
| AuroraX-Reranker-Base-v1.0 (Ours, 300M with 100M non-embed params) | 0.8459 | 0.9427 |
| bge-reranker-v2-m3 (600M params) | 0.8179 | 0.9160 |
| jina-reranker-v2-base-multilingual (300M params) | 0.7815 | 0.8855 |
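HitRate@5 here is the fraction of queries for which at least one relevant passage lands in the top 5 after reranking. A minimal sketch, again assuming binary labels rather than the in-house harness:

```python
def hit_rate_at_k(ranked_relevance, k):
    """Fraction of queries with at least one relevant result in the top k."""
    hits = sum(1 for labels in ranked_relevance if any(labels[:k]))
    return hits / len(ranked_relevance)

# Three queries; two have a relevant passage within the top 5.
print(hit_rate_at_k([[0, 0, 1, 0, 0], [0, 0, 0, 0, 0], [1, 0]], k=5))
```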

Usage

Sentence-Transformers

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("aqweteddy/AuroraX-Reranker-Base-v1.0")
score = model.predict([("What is Deep Learning?", "Deep learning is a subfield of ML...")])
print(score)
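Beyond scoring a single pair, the common pattern is to score one query against many candidate passages and sort by score. The `rerank` helper below is an illustrative sketch, not part of the model's API; the query is English and the candidates are Chinese to show cross-lingual use, and the model is downloaded from the Hub on first call:

```python
def rerank(model, query, documents):
    """Score (query, document) pairs with a CrossEncoder-style model and
    return documents sorted from most to least relevant."""
    scores = model.predict([(query, doc) for doc in documents])
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)

def demo():
    # Downloads the model from the Hugging Face Hub on first use.
    from sentence_transformers import CrossEncoder
    model = CrossEncoder("aqweteddy/AuroraX-Reranker-Base-v1.0")
    # English query scored against Chinese passages (cross-lingual use).
    ranked = rerank(model, "What is deep learning?",
                    ["深度學習是機器學習的一個分支。", "今天天氣很好。"])
    for doc, score in ranked:
        print(f"{score:.4f}\t{doc}")

# demo()  # uncomment to run; requires network access for the first download
```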

Text Embeddings Inference (API)

  1. Install text-embeddings-router and launch it with the model:

```shell
text-embeddings-router --model-id aqweteddy/AuroraX-Reranker-Base-v1.0
```

  2. Call the rerank endpoint via REST:

```shell
curl 127.0.0.1:8080/rerank \
  -X POST \
  -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
  -H 'Content-Type: application/json'
```
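The same endpoint can also be called from Python with only the standard library. The response shape assumed below, a JSON list of objects with `index` and `score` fields, follows the TEI rerank API; adjust if your TEI version returns a different shape:

```python
import json
import urllib.request

def order_by_rerank(texts, results):
    """Sort texts using TEI-style rerank results: [{"index": i, "score": s}, ...]."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [texts[r["index"]] for r in ranked]

def rerank_via_tei(query, texts, url="http://127.0.0.1:8080/rerank"):
    # Requires a running text-embeddings-router serving the model (step 1).
    payload = json.dumps({"query": query, "texts": texts}).encode("utf-8")
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        results = json.load(resp)
    return order_by_rerank(texts, results)

# Example (needs the server from step 1 to be running):
# print(rerank_via_tei("What is Deep Learning?",
#                      ["Deep Learning is not...", "Deep learning is..."]))
```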

Citation

```bibtex
@misc{aurorax2025,
  title         = {AuroraX: A Fast Cross-Lingual Reranker Bridging English and Chinese},
  author        = {aqweteddy},
  year          = {2025},
  howpublished  = {\url{https://huggingface.co/aqweteddy/AuroraX-Reranker-Base-v1.0}},
  note          = {Lightweight and powerful reranker for English, Traditional Chinese, and Simplified Chinese}
}
```