zackli4ai committed on
Commit
5c1eb5b
·
verified ·
1 Parent(s): 4469fe8

Create README.md

Files changed (1)
  1. README.md +88 -0
README.md ADDED
@@ -0,0 +1,88 @@
1
+ # Parakeet-TDT-0.6B v3 (ANE)
2
+
3
+ ## Model Description
4
+
5
+ **parakeet-tdt-0.6b-v3** is a 600M-parameter multilingual automatic speech recognition (ASR) model from **NVIDIA**.
6
+ It extends **parakeet-tdt-0.6b-v2** by moving beyond English-only to support **25 European languages** with automatic language detection.
7
+
8
+ The model was primarily trained on the **Granary multilingual corpus** \[1,2] and is optimized for both research exploration and production deployment.
9
+
10
+ This build is integrated with **nexaSDK** and optimized for modern **NPUs**, including **Apple’s Neural Engine (ANE)**, for efficient on-device inference.
11
+
12
+
13
+ ## Features
14
+
15
+ * **Multilingual ASR**: 25 European languages with built-in language detection.
16
+ * **Text formatting**: Outputs text with **punctuation and capitalization**.
17
+ * **Timestamps**: Provides both word-level and segment-level timestamps.
18
+ * **Long audio transcription**:
19
+
20
+   * Up to **24 minutes** of audio with full attention (on an A100 80GB GPU).
21
+   * Up to **3 hours** with local attention.
22
+ * **Optimized for NPUs**: Runs efficiently on **Apple ANE**, Qualcomm Hexagon, and other dedicated accelerators.
23
+ * **Commercial-friendly**: Released under the **CC-BY-4.0** license.
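
The long-audio limits listed above suggest that recordings longer than the supported window may need to be split before transcription. The following is a minimal sketch of that arithmetic, not part of nexaSDK or the model itself: `plan_chunks` and its parameters are hypothetical names, the 24-minute default is the full-attention figure quoted above for an A100 80GB (the practical limit on an NPU may differ), and the 16 kHz sample rate is taken from the input specification in this README.

```python
SAMPLE_RATE_HZ = 16_000  # input sample rate stated in this README


def plan_chunks(total_samples, max_minutes=24.0, overlap_seconds=0.0):
    """Return (start, end) sample ranges that cover a recording.

    Hypothetical helper: each range is at most `max_minutes` long, and
    consecutive ranges share `overlap_seconds` of audio.
    """
    max_len = int(max_minutes * 60 * SAMPLE_RATE_HZ)
    overlap = int(overlap_seconds * SAMPLE_RATE_HZ)
    if max_len <= 0:
        raise ValueError("max_minutes must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and shorter than a chunk")

    chunks = []
    start = 0
    while start < total_samples:
        end = min(start + max_len, total_samples)
        chunks.append((start, end))
        if end == total_samples:
            break
        # Step back by the overlap so a word cut at a boundary is heard again.
        start = end - overlap
    return chunks


if __name__ == "__main__":
    one_hour = 60 * 60 * SAMPLE_RATE_HZ
    for start, end in plan_chunks(one_hour):
        print(f"{start / SAMPLE_RATE_HZ:8.1f}s .. {end / SAMPLE_RATE_HZ:8.1f}s")
```

A small overlap between chunks is a common way to avoid losing words at chunk boundaries, at the cost of having to merge the duplicated text afterwards.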
24
+
25
+
26
+ ## Apple Neural Engine (ANE)
27
+
28
+ The **Apple Neural Engine (ANE)** is a specialized NPU in Apple silicon designed to accelerate AI and ML workloads \[3].
29
+ By offloading heavy ASR computations to the ANE, **parakeet-tdt-0.6b-v3** achieves:
30
+
31
+ * **Lower latency** speech transcription on iPhone, iPad, and Mac.
32
+ * **Energy-efficient inference**, extending battery life during real-time ASR tasks.
33
+ * **On-device privacy**, keeping voice data local while maintaining production-grade accuracy.
34
+
35
+
36
+ ## Supported Languages
37
+
38
+ Bulgarian (bg), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et),
39
+ Finnish (fi), French (fr), German (de), Greek (el), Hungarian (hu), Italian (it), Latvian (lv),
40
+ Lithuanian (lt), Maltese (mt), Polish (pl), Portuguese (pt), Romanian (ro), Slovak (sk),
41
+ Slovenian (sl), Spanish (es), Swedish (sv), Russian (ru), Ukrainian (uk)
42
+
43
+
44
+ ## Use Cases
45
+
46
+ * Conversational AI and multilingual chatbots
47
+ * Voice assistants and smart devices
48
+ * Real-time transcription services
49
+ * Subtitles and caption generation
50
+ * Voice analytics platforms
51
+ * Research in speech technology
52
+
53
+
54
+ ## Inputs and Outputs
55
+
56
+ **Input**
57
+
58
+ * **Type**: 16 kHz audio
59
+ * **Formats**: `.wav`, `.mp3`
60
+ * **Shape**: 1D mono audio
61
+
62
+ **Output**
63
+
64
+ * **Type**: Text string
65
+ * **Properties**: Punctuation + capitalization included
66
+
67
+
68
+ ## Limitations & Responsible Use
69
+
70
+ The model may produce transcription errors, particularly with code-switching or noisy input.
71
+ Evaluate thoroughly before deploying in sensitive domains (e.g., healthcare, finance, or legal).
72
+
73
+
74
+ ## License
75
+
76
+ * Licensed under the original Parakeet license terms.
77
+ * See: [Parakeet Model License](https://huggingface.co/NexaAI/parakeet-tdt-0.6b-v3-NPU/blob/main/LICENSE)
78
+
79
+
80
+ ## References
81
+
82
+ * [Parakeet Project](https://huggingface.co/models?search=parakeet)
83
+ * [nexaSDK](https://sdk.nexa.ai)
84
+ * [Apple Neural Engine (ANE)](https://apple.fandom.com/wiki/Neural_Engine)
85
+
86
+
87
+ ## Support
88
+ * For Nexa SDK: [sdk.nexa.ai](https://sdk.nexa.ai)