---
license: apple-amlr
language:
- en
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- asr
- mixture-of-experts
- speech
- streaming
---

# Model Card for Omni-router Transformer ASR

The Omni-router Transformer is a new Mixture-of-Experts (MoE) architecture that explicitly couples routing across layers by sharing a single router, encouraging the model to learn strong and specialized experts. Omni-router's routing decisions form consistent temporal segments and structured expert usage across model depth, suggesting meaningful coordination between layers. Please refer to the [paper](https://arxiv.org/abs/2507.05724) for details. An illustrative sketch of the shared-router idea is given at the end of this card.

## Model Details

### Model Description

This ASR model is a 4-expert MoE model with 613M total parameters, of which 200M are active. The model is streaming: it transcribes speech conditioned only on past and current audio.

- **Developed by:** Apple Machine Learning Research
- **Model type:** ASR
- **Language(s):** English
- **License:** apple-amlr

## Uses

This model is intended for English automatic speech recognition.

## How to Get Started with the Model

Please refer to the [GitHub](https://github.com/apple/ml-omni-router-moe-asr) repository for detailed usage instructions.

## Training Details

### Training Data

The model is trained on SpeechCrawl, a large-scale conversational audio dataset collected from publicly accessible sources. Please refer to the [paper](https://arxiv.org/abs/2507.05724) for details.

## Citation

If you find this work useful, please cite our paper:

```
@article{gu2025omnirouter,
  title={Omni-router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition},
  author={Gu, Zijin and Likhomanenko, Tatiana and Jaitly, Navdeep},
  journal={arXiv preprint arXiv:2507.05724},
  year={2025}
}
```

## Model Card Contact

Contact zijin@apple.com for any issues.
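
## Illustrative Sketch: Sharing One Router Across Layers

To make the shared-router idea concrete, below is a minimal PyTorch sketch of a stack of MoE feed-forward blocks that all consult the same router, in contrast to standard MoE models where each layer owns its own router. This is not the released implementation: it assumes a simple top-1 softmax router applied to each layer's input, and class names such as `SharedRouter` and `OmniRouterStack`, as well as all hyperparameters, are illustrative only. See the paper and the GitHub repository for the actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedRouter(nn.Module):
    """One routing projection whose weights are reused by every MoE layer."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.proj = nn.Linear(d_model, num_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model) -> per-frame expert probabilities
        return F.softmax(self.proj(x), dim=-1)


class MoEFeedForward(nn.Module):
    """Top-1 MoE feed-forward block; routing weights come from a router
    owned by the whole model rather than by this layer."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor, gates: torch.Tensor) -> torch.Tensor:
        # gates: (batch, time, num_experts) produced by the shared router
        top1 = gates.argmax(dim=-1)                       # hard expert choice per frame
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (top1 == e).unsqueeze(-1).to(x.dtype)  # frames routed to expert e
            out = out + mask * gates[..., e:e + 1] * expert(x)
        return out


class OmniRouterStack(nn.Module):
    """Stack of MoE blocks that all consult the same router instance,
    so routing decisions are coupled across depth."""

    def __init__(self, d_model: int = 512, d_ff: int = 2048,
                 num_experts: int = 4, num_layers: int = 2):
        super().__init__()
        self.router = SharedRouter(d_model, num_experts)
        self.layers = nn.ModuleList(
            [MoEFeedForward(d_model, d_ff, num_experts) for _ in range(num_layers)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            gates = self.router(x)    # same router parameters at every layer
            x = x + layer(x, gates)
        return x


if __name__ == "__main__":
    frames = torch.randn(2, 50, 512)          # (batch, time, feature) acoustic frames
    print(OmniRouterStack()(frames).shape)    # torch.Size([2, 50, 512])
```

Because every layer uses the same router parameters, expert assignments tend to stay consistent across depth, which is the coordination effect described above.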