lelegu committed on
Commit 0acbdf4 · verified · 1 Parent(s): 8da717e

Update README.md

Files changed (1)
  1. README.md +62 -62
README.md CHANGED
@@ -1,63 +1,63 @@
- ---
- license: apple-amlr
- language:
- - en
- metrics:
- - wer
- pipeline_tag: automatic-speech-recognition
- tags:
- - asr
- - mixture-of-experts
- - speech
- - streaming
- ---
- # Model Card for Model ID
-
- Omni-router Transformer is a new Mixture-of-Experts (MoE) architecture that explicitly couples routing across layers using a shared router to learn strong and specialized experts. Omni-router's routing decisions appear to form consistent temporal segments and strutured usage across model depth, suggesting meaningful coordination between layers.
-
-
- ## Model Details
-
- ### Model Description
-
- This ASR model is a 4-expert MoE model (total 613M with 200M activate parameters). The model is streaming which transcribes speech conditioned only on the previous speech.
-
- - **Developed by:** Apple Machine Learning Research
- - **Model type:** ASR
- - **Language(s):** English
- - **License:** apple-amlr
-
- ## Uses
-
- This model is a speech recognition model.
-
- ## How to Get Started with the Model
-
- Please refer to the github pages for detailed usage.
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- It is trained on the 1M English subset of Apple's internal conversational dataset named SpeechCrawl.
-
-
- ## Citation
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- If you find this work useful, please cite our paper:
- ```
- @article{gu2025omnirouter,
- title={Omni-router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition},
- author={Gu, Zijin and Likhomanenko, Tatiana and Jaitly, Navdeep},
- journal={arXiv preprint arXiv:2507.05724},
- year={2025}
- }
- ```
-
- ## Model Card Contact
-
  Contact zijin@apple.com for any issues.
 
+ ---
+ license: apple-amlr
+ language:
+ - en
+ metrics:
+ - wer
+ pipeline_tag: automatic-speech-recognition
+ tags:
+ - asr
+ - mixture-of-experts
+ - speech
+ - streaming
+ ---
+ # Model Card for Model ID
+
+ Omni-router Transformer is a new Mixture-of-Experts (MoE) architecture that explicitly couples routing across layers using a shared router to learn strong and specialized experts. Omni-router's routing decisions appear to form consistent temporal segments and strutured usage across model depth, suggesting meaningful coordination between layers.
+ Please refer to the [paper](https://arxiv.org/abs/2507.05724) for details.
+
+ ## Model Details
+
+ ### Model Description
+
+ This ASR model is a 4-expert MoE model (total 613M with 200M activate parameters). The model is streaming which transcribes speech conditioned only on past and current speech.
+
+ - **Developed by:** Apple Machine Learning Research
+ - **Model type:** ASR
+ - **Language(s):** English
+ - **License:** apple-amlr
+
+ ## Uses
+
+ This model is a speech recognition model.
+
+ ## How to Get Started with the Model
+
+ Please refer to the [github](https://github.com/apple/ml-omni-router-moe-asr) page for detailed usage.
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ The training data is a large-scale conversational audio dataset collected from publicly accessible sources, named SpeechCrawl. Please refer to the [paper](https://arxiv.org/abs/2507.05724) for details.
+
+
+ ## Citation
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ If you find this work useful, please cite our paper:
+ ```
+ @article{gu2025omnirouter,
+ title={Omni-router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition},
+ author={Gu, Zijin and Likhomanenko, Tatiana and Jaitly, Navdeep},
+ journal={arXiv preprint arXiv:2507.05724},
+ year={2025}
+ }
+ ```
+
+ ## Model Card Contact
+
  Contact zijin@apple.com for any issues.
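
Note on the `wer` metric listed in the card metadata: the README itself does not show how word error rate is computed, so the following is only a minimal sketch of the standard definition (word-level edit distance divided by the number of reference words). The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. This is not the authors' evaluation script, which may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    # Assumption: plain whitespace tokenization, no text normalization.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.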