nielsr HF Staff committed on
Commit 61b7667 · verified · 1 Parent(s): 8cf70f4

Update license, paper title, and code repository link


This PR improves the model card for `UsefulSensors/moonshine-tiny-uk` by:

- Updating the `license` metadata from `other` to `apache-2.0`, reflecting the permissive open-source license often associated with the Moonshine project.
- Adding the full paper title "Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices" for better discoverability.
- Updating the code repository link from a specific README file to the root of the GitHub repository (`https://github.com/usefulsensors/moonshine`), which is a more direct and canonical link to the model's code, and changing the link label from `[[Installation]]` to `[[Code]]`.

These changes enhance the accuracy, clarity, and completeness of the model card.
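
As a rough way to check the metadata changes listed above, the sketch below reads the front matter of a model card and inspects a few fields. It is a minimal illustration, not part of this PR: `parse_front_matter` and `CARD` are hypothetical names, and the parser assumes the front matter contains only flat `key: value` pairs and `- item` lists, as in this README. Anything more complex would need a real YAML parser.

```python
# Minimal sketch (assumption: flat `key: value` pairs and `- item` lists only).
# `parse_front_matter` and `CARD` are hypothetical names used for illustration.

def parse_front_matter(text):
    """Return the fields between the first two '---' lines as a dict."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the front matter
            break
        if not stripped:
            continue
        if stripped.startswith("- ") and isinstance(meta.get(current_key), list):
            # list item belonging to the most recent key without an inline value
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            # split on the first colon only, so URL values stay intact
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            meta[current_key] = value if value else []
    return meta


# Front matter as it appears on the new side of the diff.
CARD = """---
language:
- uk
library_name: transformers
license: apache-2.0
pipeline_tag: automatic-speech-recognition
arxiv: https://arxiv.org/abs/2509.02523
---

# Moonshine
"""

meta = parse_front_matter(CARD)
print(meta["license"])
print(meta["language"])
```

Splitting on the first colon only keeps the `arxiv` URL whole, which a naive `split(":")` would break apart.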

Files changed (1)
  1. README.md +13 -10
README.md CHANGED
@@ -1,14 +1,17 @@
  ---
- license: other
  language:
- - uk
+ - uk
  library_name: transformers
+ license: apache-2.0
  pipeline_tag: automatic-speech-recognition
- arxiv: https://arxiv.org/abs/2509.02523
+ arxiv: https://arxiv.org/abs/2509.02523
  ---
+
  # Moonshine

- [[Paper]](https://arxiv.org/abs/2509.02523) [[Installation]](https://github.com/usefulsensors/moonshine/blob/main/README.md)
+ ## Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices
+
+ [[Paper]](https://arxiv.org/abs/2509.02523) [[Code]](https://github.com/usefulsensors/moonshine)

  This is the model card for running the automatic speech recognition (ASR) models (Moonshine models) trained and released by Moonshine AI (f.k.a Useful Sensors.)

@@ -56,13 +59,13 @@ print(processor.decode(generated_ids[0], skip_special_tokens=True))

  ## Model Details

- This Moonshine model is trained for the speech recognition task, capable of transcribing Ukranian speech audio into Ukrainian text. Moonshine AI developed the models to support their business direction of developing real time speech transcription products based on low cost hardware. The following table shows comparisons of common ASR evaluations sets. For more information about evaluation, please refer to the paper.
+ This Moonshine model is trained for the speech recognition task, capable of transcribing Ukranian speech audio into Ukrainian text. Moonshine AI developed the models to support their business direction of developing real time speech transcription products based on low cost hardware. The following table shows comparisons of common ASR evaluations sets. For more information about evaluation, please refer to the paper.

  | Size | Parameters | Fleurs (WER) ↓ | Common Voice 17 (WER) ↓ |
  |:----:|:----------:|:------------------:|:------------------:|
- | whisper tiny | 39 M | 63.83 | 67.07 |
- | whisper medium | 769 M | 11.62 | 20.9 |
- | moonshine tiny | 27 M | 18.25 | 26.11 |
+ | whisper tiny | 39 M | 63.83 | 67.07 |
+ | whisper medium | 769 M | 11.62 | 20.9 |
+ | moonshine tiny | 27 M | 18.25 | 26.11 |

  ### Release date

@@ -92,7 +95,7 @@ Our evaluations show that, the models exhibit greater accuracy on standard datas

  However, like any machine learning model, the predictions may include texts that are not actually spoken in the audio input (i.e. hallucination). We hypothesize that this happens because, given their general knowledge of language, the models combine trying to predict the next word in audio with trying to transcribe the audio itself.

- In addition, the sequence-to-sequence architecture of the model makes it prone to generating repetitive texts, which can be mitigated to some degree by beam search and temperature scheduling but not perfectly. It is likely that this behavior and hallucinations may be worse for short audio segments, or segments where parts of words are cut off at the beginning or the end of the segment.
+ In addition, the sequence-to-sequence architecture of the model makes it prone to generating repetitive texts, which can be mitigated to some degree by beam search and temperature scheduling but not perfectly. It is likely that this behavior and hallucinations may be worse for short audio segments, or segments where parts of words are cut off at the beginning or at the end of the segment.

  ## Broader Implications

@@ -113,4 +116,4 @@ If you benefit from our work, please cite us:
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.02523},
  }
- ```
+ ```