AuroraX: A Fast Cross-Lingual Reranker Bridging English and Chinese

AuroraX is a lightweight yet powerful cross-lingual reranker built on the mmBERT-base architecture. It is designed to bridge Traditional Chinese, Simplified Chinese, and English, enabling high-quality semantic ranking across languages with remarkable efficiency.

Despite having only 110M non-embedding parameters, AuroraX achieves performance comparable to state-of-the-art rerankers roughly twice its size. Its design emphasizes both speed and language adaptability, making it well suited to latency-sensitive, real-world multilingual retrieval and re-ranking applications.

✨ Key Features:

  • 🌏 Cross-Lingual Understanding: Trained to handle English, Traditional Chinese, and Simplified Chinese seamlessly.
  • ⚡ Lightweight & Fast: Only 110M parameters (non-embedding), optimized for latency-sensitive pipelines.
  • 🎯 SOTA-Level Accuracy: Comparable or superior to larger rerankers on Chinese and English benchmarks.

Evaluation

Monolingual Benchmarks

| Model | Metric | CMedQAv2-reranking (ZH) | T2Reranking (ZH) | ZH AVG | AskUbuntuDupQuestions (EN) | HUMENews21InstructionReranking (EN) | HUMEWikipediaRerankingMultilingual (EN) | SciDocsRR (EN) | EN AVG | Total AVG |
|---|---|---|---|---|---|---|---|---|---|---|
| AuroraX-Reranker-Base-v1.0 (Ours, 300M with 100M non-embed params) | mrr@10 | 0.8201 | 0.8554 | 0.8378 | 0.7936 | 1.0000 | 0.9778 | 0.9305 | 0.9255 | 0.8962 |
| | mrr@5 | 0.8145 | 0.8514 | 0.8329 | 0.7841 | 1.0000 | 0.9778 | 0.9289 | 0.9227 | 0.8928 |
| bge-reranker-v2-m3 (600M params) | mrr@10 | 0.8598 | 0.8004 | 0.8301 | 0.7635 | 0.9839 | 0.8750 | 0.9211 | 0.8859 | 0.8673 |
| | mrr@5 | 0.8569 | 0.7954 | 0.8262 | 0.7532 | 0.9839 | 0.8750 | 0.9191 | 0.8828 | 0.8639 |
| jina-reranker-v2-base-multilingual (300M params) | mrr@10 | 0.2828 | 0.7577 | 0.5203 | 0.7420 | 1.0000 | 0.8761 | 0.9478 | 0.8915 | 0.7677 |
| | mrr@5 | 0.2759 | 0.7512 | 0.5136 | 0.7299 | 1.0000 | 0.8761 | 0.9467 | 0.8882 | 0.7633 |
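For readers unfamiliar with the metric, mrr@k averages, over all queries, the reciprocal rank of the first relevant result within the top k. A minimal illustrative sketch with binary relevance labels (not the actual evaluation harness used above):

```python
def mrr_at_k(ranked_relevance, k):
    """Mean Reciprocal Rank at cutoff k.

    ranked_relevance: one list per query of 0/1 labels, in ranked order.
    Each query contributes 1/rank of its first relevant result within
    the top k, or 0 if no relevant result appears there.
    """
    total = 0.0
    for labels in ranked_relevance:
        for rank, relevant in enumerate(labels[:k], start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_relevance)

# Two queries: first relevant hits at rank 2 and rank 1 -> (0.5 + 1.0) / 2
print(mrr_at_k([[0, 1, 0], [1, 0, 0]], k=10))  # 0.75
```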

Cross-Lingual (ZH ↔ EN) Results

| Model | inhouse-en2zh (HitRate@5) | inhouse-zh2en (HitRate@5) |
|---|---|---|
| AuroraX-Reranker-Base-v1.0 (Ours, 300M with 100M non-embed params) | 0.8459 | 0.9427 |
| bge-reranker-v2-m3 (600M params) | 0.8179 | 0.9160 |
| jina-reranker-v2-base-multilingual (300M params) | 0.7815 | 0.8855 |
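HitRate@5 here is the fraction of queries for which at least one relevant passage lands in the top 5 after reranking. A minimal sketch, again assuming binary labels rather than the in-house harness:

```python
def hit_rate_at_k(ranked_relevance, k):
    """Fraction of queries with at least one relevant result in the top k."""
    hits = sum(1 for labels in ranked_relevance if any(labels[:k]))
    return hits / len(ranked_relevance)

# Three queries; two have a relevant passage within the top 5.
print(hit_rate_at_k([[0, 0, 1, 0, 0], [0, 0, 0, 0, 0], [1, 0]], k=5))
```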

Usage

Sentence-Transformers

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("aqweteddy/AuroraX-Reranker-Base-v1.0")
score = model.predict([("What is Deep Learning?", "Deep learning is a subfield of ML...")])
print(score)
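Beyond scoring a single pair, the common pattern is to score one query against many candidate passages and sort by score. The `rerank` helper below is an illustrative sketch, not part of the model's API; the query is English and the candidates are Chinese to show cross-lingual use, and the model is downloaded from the Hub on first call:

```python
def rerank(model, query, documents):
    """Score (query, document) pairs with a CrossEncoder-style model and
    return documents sorted from most to least relevant."""
    scores = model.predict([(query, doc) for doc in documents])
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)

def demo():
    # Downloads the model from the Hugging Face Hub on first use.
    from sentence_transformers import CrossEncoder
    model = CrossEncoder("aqweteddy/AuroraX-Reranker-Base-v1.0")
    # English query scored against Chinese passages (cross-lingual use).
    ranked = rerank(model, "What is deep learning?",
                    ["深度學習是機器學習的一個分支。", "今天天氣很好。"])
    for doc, score in ranked:
        print(f"{score:.4f}\t{doc}")

# demo()  # uncomment to run; requires network access for the first download
```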

Text Embeddings Inference (API)

  1. Install text-embeddings-router and launch it with the model:

```shell
text-embeddings-router --model-id aqweteddy/AuroraX-Reranker-Base-v1.0
```

  2. Call the rerank endpoint via REST:

```shell
curl 127.0.0.1:8080/rerank \
  -X POST \
  -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
  -H 'Content-Type: application/json'
```
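The same endpoint can also be called from Python with only the standard library. The response shape assumed below, a JSON list of objects with `index` and `score` fields, follows the TEI rerank API; adjust if your TEI version returns a different shape:

```python
import json
import urllib.request

def order_by_rerank(texts, results):
    """Sort texts using TEI-style rerank results: [{"index": i, "score": s}, ...]."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [texts[r["index"]] for r in ranked]

def rerank_via_tei(query, texts, url="http://127.0.0.1:8080/rerank"):
    # Requires a running text-embeddings-router serving the model (step 1).
    payload = json.dumps({"query": query, "texts": texts}).encode("utf-8")
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        results = json.load(resp)
    return order_by_rerank(texts, results)

# Example (needs the server from step 1 to be running):
# print(rerank_via_tei("What is Deep Learning?",
#                      ["Deep Learning is not...", "Deep learning is..."]))
```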

Citation

```bibtex
@misc{aurorax2025,
  title         = {AuroraX: A Fast Cross-Lingual Reranker Bridging English and Chinese},
  author        = {aqweteddy},
  year          = {2025},
  howpublished  = {\url{https://huggingface.co/aqweteddy/AuroraX-Reranker-Base-v1.0}},
  note          = {Lightweight and powerful reranker for English, Traditional Chinese, and Simplified Chinese}
}
```