m2m100_418M-pruned-fra-bre-32768

Pruned version of facebook/m2m100_418M for:

  • fra_src: 16384 tokens
  • bre_tgt: 16384 tokens

Total vocabulary: 32768 tokens

Usage

from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("bourdoiscatie/m2m100_418M-pruned-fra-bre-32768")
tokenizer = M2M100Tokenizer.from_pretrained("bourdoiscatie/m2m100_418M-pruned-fra-bre-32768")

# Example: French → Breton (the language pair this pruned model retains)
text = "Bonjour, comment allez-vous ?"
tokenizer.src_lang = "fr"
encoded = tokenizer(text, return_tensors="pt")
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("br"))
translation = tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

Model Details

  • Base: facebook/m2m100_418M
  • Size: 386.3M params (79.8% of original)
  • Pruning: Vocabulary trimming
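
Vocabulary trimming works by keeping only the embedding rows for the retained token ids and remapping them to a new, contiguous id space. The sketch below is a minimal illustration of that idea on a toy embedding matrix; it is not the script used to prune this model, and the function name trim_vocabulary is hypothetical.

import numpy as np

def trim_vocabulary(embedding, keep_ids):
    # Select only the rows (token embeddings) we want to keep;
    # their order in keep_ids defines the new token ids.
    new_embedding = embedding[keep_ids]
    # Map each old token id to its new, contiguous id.
    old_to_new = {old: new for new, old in enumerate(keep_ids)}
    return new_embedding, old_to_new

# Toy example: a 10-token vocabulary trimmed to 4 kept tokens.
full = np.arange(10 * 3, dtype=np.float32).reshape(10, 3)
kept = [0, 2, 5, 9]
trimmed, mapping = trim_vocabulary(full, kept)
print(trimmed.shape)  # (4, 3)
print(mapping[5])     # old id 5 becomes new id 2

For this model, the same operation was applied to the shared input/output embeddings, keeping 16384 French-source and 16384 Breton-target tokens (32768 total), which is what shrinks the model from 484M to 386.3M parameters.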