Cnam-LMSSC/EBEN_noisy_throat_microphone

@@ -9,21 +9,21 @@ tags:
 datasets:
 - Cnam-LMSSC/vibravox
 model-index:
-- name: EBEN(M=?,P=?,Q=?)
   results:
   - task:
       type: speech-enhancement
       name: Bandwidth Extension
     dataset:
-      name: Vibravox["YOUR_MIC"]
       type: Cnam-LMSSC/vibravox
       args: fr
     metrics:
     - type: stoi
-      value: ???
       name: Test STOI, in-domain training
     - type: n-mos
-      value: ???
       name: Test Noresqa-MOS, in-domain training
 ---
@@ -34,10 +34,10 @@ model-index:
 # Model Card
 - **Developed by:** [Cnam-LMSSC](https://huggingface.co/Cnam-LMSSC)
-- **Model:** [EBEN(M=?,P=?,Q=?)](https://github.com/jhauret/vibravox/blob/main/vibravox/torch_modules/dnn/eben_generator.py) (see [publication in IEEE TASLP](https://ieeexplore.ieee.org/document/10244161) - [arXiv link](https://arxiv.org/abs/2303.10008))
 - **Language:** French
 - **License:** MIT
-- **Training dataset:** `speech_clean` subset of [Cnam-LMSSC/vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox)
 - **Samplerate for usage:** 16kHz
 ## Overview
@@ -62,12 +62,12 @@ import torch, torchaudio
 from vibravox.torch_modules.dnn.eben_generator import EBENGenerator
 from datasets import load_dataset
-model = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_YOUR_MIC")
-test_dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean", split="test", streaming=True)
-audio_48kHz = torch.Tensor(next(iter(test_dataset))["audio.YOUR_MIC"]["array"])
 audio_16kHz = torchaudio.functional.resample(audio_48kHz, orig_freq=48_000, new_freq=16_000)
 cut_audio_16kHz = model.cut_to_valid_length(audio_16kHz[None, None, :])
-enhanced_audio_16kHz, enhanced_audio_decomposed_4kHz = model(cut_audio_16kHz)
 ```

@@ -9,21 +9,21 @@ tags:
 datasets:
 - Cnam-LMSSC/vibravox
 model-index:
+- name: EBEN(M=4,P=2,Q=4)
   results:
   - task:
       type: speech-enhancement
       name: Bandwidth Extension
     dataset:
+      name: Vibravox["throat_microphone"]
       type: Cnam-LMSSC/vibravox
       args: fr
     metrics:
     - type: stoi
+      value: 0.819
       name: Test STOI, in-domain training
     - type: n-mos
+      value: 3.64
       name: Test Noresqa-MOS, in-domain training
 ---
@@ -34,10 +34,10 @@ model-index:
 # Model Card
 - **Developed by:** [Cnam-LMSSC](https://huggingface.co/Cnam-LMSSC)
+- **Model:** [EBEN(M=4,P=2,Q=4)](https://github.com/jhauret/vibravox/blob/main/vibravox/torch_modules/dnn/eben_generator.py) (see [publication in IEEE TASLP](https://ieeexplore.ieee.org/document/10244161) - [arXiv link](https://arxiv.org/abs/2303.10008))
 - **Language:** French
 - **License:** MIT
+- **Training dataset:** `speech_noisy` subset of [Cnam-LMSSC/vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox) (see [Vibravox paper on arXiv](https://arxiv.org/abs/2407.11828))
 - **Samplerate for usage:** 16kHz
 ## Overview
@@ -62,12 +62,12 @@ import torch, torchaudio
 from vibravox.torch_modules.dnn.eben_generator import EBENGenerator
 from datasets import load_dataset
+model = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_noisy_throat_microphone")
+test_dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_noisy", split="test", streaming=True)
+audio_48kHz = torch.Tensor(next(iter(test_dataset))["audio.throat_microphone"]["array"])
 audio_16kHz = torchaudio.functional.resample(audio_48kHz, orig_freq=48_000, new_freq=16_000)
 cut_audio_16kHz = model.cut_to_valid_length(audio_16kHz[None, None, :])
+enhanced_audio_16kHz, enhanced_speech_decomposed = model(cut_audio_16kHz)
 ```
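
The resampling, indexing, and trimming steps in the snippet above change the tensor shape at each stage. The pure-Python sketch below traces the expected sample counts without downloading the model or the dataset. It rests on two assumptions: that resampling from 48 kHz to 16 kHz returns the sample count rounded up, and that `cut_to_valid_length` drops trailing samples so the length is a multiple of a model-specific factor. The factor `32` and the helper names `resampled_length`, `valid_length`, and `trace_shapes` are hypothetical and not taken from the vibravox code.

```python
import math


def resampled_length(num_samples: int, orig_freq: int, new_freq: int) -> int:
    """Sample count after rate conversion (assumption: rounded up)."""
    g = math.gcd(orig_freq, new_freq)
    orig, new = orig_freq // g, new_freq // g
    # Integer ceiling of num_samples * new / orig, avoiding float rounding.
    return (num_samples * new + orig - 1) // orig


def valid_length(num_samples: int, multiple: int) -> int:
    """Largest length <= num_samples that is a multiple of `multiple`.

    Hypothetical stand-in for what `cut_to_valid_length` is assumed to do.
    """
    return num_samples - num_samples % multiple


def trace_shapes(num_samples_48k: int, multiple: int = 32) -> dict:
    """Follow one recording through the steps of the usage snippet."""
    n_16k = resampled_length(num_samples_48k, 48_000, 16_000)
    n_cut = valid_length(n_16k, multiple)  # `multiple` is an assumed value
    return {
        "audio_48kHz": (num_samples_48k,),
        "audio_16kHz": (n_16k,),
        # audio_16kHz[None, None, :] adds batch and channel axes.
        "model_input": (1, 1, n_cut),
        "duration_s": n_cut / 16_000,
    }


if __name__ == "__main__":
    for name, value in trace_shapes(100_000).items():
        print(f"{name}: {value}")
```

To check the real trimming behaviour, compare `cut_audio_16kHz.shape` from the snippet above with the `model_input` entry; any mismatch means the assumed factor differs from the one the model uses.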