nielsr HF Staff committed on
Commit
753955b
·
verified ·
1 Parent(s): cab9fa6

Improve model card: Add paper abstract for context


This PR improves the model card by adding the paper's abstract to the model description section. This provides more comprehensive context regarding the Voxlect benchmark and the model's role within it. The existing arXiv paper link and GitHub repository link are retained as they are already accurate. No project page link is added as none was provided in the context.
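As a rough illustration of the metadata block this commit touches, the Python sketch below parses a flat YAML front matter and checks the `library_name` field. The helper `parse_front_matter` and the `SAMPLE_CARD` text are hypothetical and exist only for this example; the field values are copied from the diff, and a real workflow would more likely rely on a full YAML parser.

```python
# Hypothetical sample: a shortened model card using field values from the diff.
SAMPLE_CARD = """---
base_model:
- facebook/mms-lid-256
language:
- fr
library_name: transformers
license: openrail
metrics:
- accuracy
---
# MMS-LID-256 for French Dialect Classification
"""


def parse_front_matter(text):
    """Parse a flat front-matter block (scalars and simple lists only).

    This is an assumption-laden sketch, not a general YAML parser:
    nested mappings, quoting, and multi-line values are not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter present
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of the front-matter block
        if not stripped:
            continue
        is_item = stripped.startswith("- ")
        if is_item and isinstance(meta.get(current_key), list):
            # List item belonging to the most recent key.
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value means a list follows on the next lines.
            meta[current_key] = value if value else []
    return meta


meta = parse_front_matter(SAMPLE_CARD)
print(meta["library_name"])
```

A check like this could confirm that a metadata key survives being moved within the block, since key order does not affect the parsed result.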

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -3,6 +3,7 @@ base_model:
 - facebook/mms-lid-256
 language:
 - fr
+library_name: transformers
 license: openrail
 metrics:
 - accuracy
@@ -11,17 +12,18 @@ tags:
 - model_hub_mixin
 - pytorch_model_hub_mixin
 - speaker_dialect_classification
-library_name: transformers
 ---
 
 # MMS-LID-256 for French Dialect Classification
 
 # Model Description
-This model includes the implementation of French dialect classification described in <a href="https://arxiv.org/abs/2508.01691"><strong>**Voxlect: A Speech Foundation Model Benchmark for Modeling Dialect and Regional Languages Around the Globe**</strong></a>
+This model includes the implementation of French dialect classification described in [**Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe**](https://arxiv.org/abs/2508.01691).
+
+**Abstract:** We present Voxlect, a novel benchmark for modeling dialects and regional languages worldwide using speech foundation models. Specifically, we report comprehensive benchmark evaluations on dialects and regional language varieties in English, Arabic, Mandarin and Cantonese, Tibetan, Indic languages, Thai, Spanish, French, German, Brazilian Portuguese, and Italian. Our study used over 2 million training utterances from 30 publicly available speech corpora that are provided with dialectal information. We evaluate the performance of several widely used speech foundation models in classifying speech dialects. We assess the robustness of the dialectal models under noisy conditions and present an error analysis that highlights modeling results aligned with geographic continuity. In addition to benchmarking dialect classification, we demonstrate several downstream applications enabled by Voxlect. Specifically, we show that Voxlect can be applied to augment existing speech recognition datasets with dialect information, enabling a more detailed analysis of ASR performance across dialectal variations. Voxlect is also used as a tool to evaluate the performance of speech generation systems. Voxlect is publicly available with the license of the RAIL family at: this https URL .
 
 Github repository: https://github.com/tiantiaf0627/voxlect
 
-The included French dialects are with speakers from:
+The included French dialects are with speakers from:
 ```
 [
 "Africa",
@@ -35,7 +37,7 @@ The included French dialects are with speakers from:
 
 ## Download repo
 ```bash
-git clone git@github.com:tiantiaf0627/voxvoxlect
+git clone git@github.com:tiantiaf0627/voxlect
 ```
 ## Install the package
 ```bash