test #17
by ravivarmai - opened

README.md CHANGED
@@ -1,5 +1,4 @@
 ---
-library_name: mistral-common
 language:
 - en
 - fr

@@ -10,18 +9,20 @@ language:
 - nl
 - hi
 license: apache-2.0
+library_name: vllm
 inference: false
 extra_gated_description: >-
   If you want to learn more about how we process your personal data, please read
   our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
+pipeline_tag: audio-text-to-text
 tags:
--
+- transformers
 ---
 # Voxtral Mini 1.0 (3B) - 2507
 
 Voxtral Mini is an enhancement of [Ministral 3B](https://mistral.ai/news/ministraux), incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding.
 
-Learn more about Voxtral in our blog post [here](https://mistral.ai/news/voxtral)
+Learn more about Voxtral in our blog post [here](https://mistral.ai/news/voxtral).
 
 ## Key Features
 

@@ -63,10 +64,10 @@ We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).
 
 #### Installation
 
-Make sure to install vllm
+Make sure to install vllm from "main", we recommend using `uv`:
 
 ```
-uv pip install -U "vllm[audio]" --
+uv pip install -U "vllm[audio]" --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly
 ```
 
 Doing so should automatically install [`mistral_common >= 1.8.1`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.1).

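
The install hunk above expects the vLLM audio extra to pull in `mistral_common >= 1.8.1` automatically. As a quick sanity check (a sketch, not part of the model card), the installed versions can be read back from package metadata:

```python
# Sketch, not from the model card: confirm the packages the hunk above installs are present.
from importlib.metadata import version

print("vllm:", version("vllm"))

mc = version("mistral-common")
print("mistral-common:", mc)

# Crude comparison against the 1.8.1 floor referenced in the README (release versions are plain x.y.z).
assert tuple(int(p) for p in mc.split(".")[:3]) >= (1, 8, 1), f"mistral-common {mc} is older than 1.8.1"
```
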
@@ -241,11 +242,11 @@ print(response)
 
 ### Transformers 🤗
 
-
+Voxtral is supported in Transformers natively!
 
-Install Transformers:
+Install Transformers from source:
 ```bash
-pip install
+pip install git+https://github.com/huggingface/transformers
 ```
 
 Make sure to have `mistral-common >= 1.8.1` installed with audio dependencies:

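
The last context line above asks for `mistral-common >= 1.8.1` with audio dependencies; the exact install command sits outside this diff. A small hedged check (assuming, as in recent `mistral_common` releases, that the audio helpers live under `mistral_common.audio` and only import when the audio extras are present) could look like:

```python
# Sketch under the assumption stated above; not a line from the model card.
try:
    from mistral_common.audio import Audio  # noqa: F401  # importable only with the audio extras installed
    print("mistral-common audio support: OK")
except ImportError as exc:
    print(f"audio dependencies missing; reinstall mistral-common with its audio extra: {exc}")
```
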
@@ -511,7 +512,7 @@ repo_id = "mistralai/Voxtral-Mini-3B-2507"
 processor = AutoProcessor.from_pretrained(repo_id)
 model = VoxtralForConditionalGeneration.from_pretrained(repo_id, torch_dtype=torch.bfloat16, device_map=device)
 
-inputs = processor.
+inputs = processor.apply_transcrition_request(language="en", audio="https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama.mp3", model_id=repo_id)
 inputs = inputs.to(device, dtype=torch.bfloat16)
 
 outputs = model.generate(**inputs, max_new_tokens=500)

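
For context, the snippet touched by the last hunk assembles into roughly the following end-to-end transcription script. The imports, device selection, and the final decode step are filled in here as assumptions from standard Transformers usage; only the middle lines appear in the diff above.

```python
# Assembled sketch around the diffed lines; imports, device choice and decoding are assumptions.
import torch
from transformers import AutoProcessor, VoxtralForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
repo_id = "mistralai/Voxtral-Mini-3B-2507"

processor = AutoProcessor.from_pretrained(repo_id)
model = VoxtralForConditionalGeneration.from_pretrained(repo_id, torch_dtype=torch.bfloat16, device_map=device)

# apply_transcrition_request is the processor method name exactly as it appears in the diff.
inputs = processor.apply_transcrition_request(
    language="en",
    audio="https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama.mp3",
    model_id=repo_id,
)
inputs = inputs.to(device, dtype=torch.bfloat16)

outputs = model.generate(**inputs, max_new_tokens=500)

# Keep only the newly generated tokens (everything after the prompt) and decode them to text.
decoded = processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(decoded[0])
```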