test #17
by ravivarmai - opened

README.md CHANGED
@@ -1,5 +1,4 @@
 ---
-library_name: mistral-common
 language:
 - en
 - fr

@@ -10,18 +9,20 @@ language:
 - nl
 - hi
 license: apache-2.0
+library_name: vllm
 inference: false
 extra_gated_description: >-
   If you want to learn more about how we process your personal data, please read
   our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
+pipeline_tag: audio-text-to-text
 tags:
--
+- transformers
 ---
 # Voxtral Mini 1.0 (3B) - 2507
 
 Voxtral Mini is an enhancement of [Ministral 3B](https://mistral.ai/news/ministraux), incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding.
 
-Learn more about Voxtral in our blog post [here](https://mistral.ai/news/voxtral)
+Learn more about Voxtral in our blog post [here](https://mistral.ai/news/voxtral).
 
 ## Key Features
 

@@ -63,10 +64,10 @@ We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).
 
 #### Installation
 
-Make sure to install vllm
+Make sure to install vllm from "main", we recommend using `uv`:
 
 ```
-uv pip install -U "vllm[audio]" --
+uv pip install -U "vllm[audio]" --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly
 ```
 
 Doing so should automatically install [`mistral_common >= 1.8.1`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.1).

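
The install hunk above expects the vLLM audio extra to pull in `mistral_common >= 1.8.1` automatically. As a quick sanity check (a sketch, not part of the model card), the installed versions can be read back from package metadata:

```python
# Sketch, not from the model card: confirm the packages the hunk above installs are present.
from importlib.metadata import version

print("vllm:", version("vllm"))

mc = version("mistral-common")
print("mistral-common:", mc)

# Crude comparison against the 1.8.1 floor referenced in the README (release versions are plain x.y.z).
assert tuple(int(p) for p in mc.split(".")[:3]) >= (1, 8, 1), f"mistral-common {mc} is older than 1.8.1"
```
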
@@ -241,11 +242,11 @@ print(response)
 
 ### Transformers 🤗
 
-
+Voxtral is supported in Transformers natively!
 
-Install Transformers:
+Install Transformers from source:
 ```bash
-pip install
+pip install git+https://github.com/huggingface/transformers
 ```
 
 Make sure to have `mistral-common >= 1.8.1` installed with audio dependencies:

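
The last context line above asks for `mistral-common >= 1.8.1` with audio dependencies; the exact install command sits outside this diff. A small hedged check (assuming, as in recent `mistral_common` releases, that the audio helpers live under `mistral_common.audio` and only import when the audio extras are present) could look like:

```python
# Sketch under the assumption stated above; not a line from the model card.
try:
    from mistral_common.audio import Audio  # noqa: F401  # importable only with the audio extras installed
    print("mistral-common audio support: OK")
except ImportError as exc:
    print(f"audio dependencies missing; reinstall mistral-common with its audio extra: {exc}")
```
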
@@ -511,7 +512,7 @@ repo_id = "mistralai/Voxtral-Mini-3B-2507"
 processor = AutoProcessor.from_pretrained(repo_id)
 model = VoxtralForConditionalGeneration.from_pretrained(repo_id, torch_dtype=torch.bfloat16, device_map=device)
 
-inputs = processor.
+inputs = processor.apply_transcrition_request(language="en", audio="https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama.mp3", model_id=repo_id)
 inputs = inputs.to(device, dtype=torch.bfloat16)
 
 outputs = model.generate(**inputs, max_new_tokens=500)

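
For context, the snippet touched by the last hunk assembles into roughly the following end-to-end transcription script. The imports, device selection, and the final decode step are filled in here as assumptions from standard Transformers usage; only the middle lines appear in the diff above.

```python
# Assembled sketch around the diffed lines; imports, device choice and decoding are assumptions.
import torch
from transformers import AutoProcessor, VoxtralForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
repo_id = "mistralai/Voxtral-Mini-3B-2507"

processor = AutoProcessor.from_pretrained(repo_id)
model = VoxtralForConditionalGeneration.from_pretrained(repo_id, torch_dtype=torch.bfloat16, device_map=device)

# apply_transcrition_request is the processor method name exactly as it appears in the diff.
inputs = processor.apply_transcrition_request(
    language="en",
    audio="https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama.mp3",
    model_id=repo_id,
)
inputs = inputs.to(device, dtype=torch.bfloat16)

outputs = model.generate(**inputs, max_new_tokens=500)

# Keep only the newly generated tokens (everything after the prompt) and decode them to text.
decoded = processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(decoded[0])
```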