Special character issues

#7
by mathdons - opened

It looks like there is an issue with this model: some characters (Ç and È, for instance) are generated as the <unk> token, so they disappear from the decoded translation.

from transformers import pipeline
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
print(pipe("That hurts!")) # [{'translation_text': 'Ça fait mal !'}]
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-fr")
print(pipe("That hurts!")) # [{'translation_text': 'a fait mal !'}]  <- the 'Ç' is missing
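To narrow this down, one option is to check which characters the target-side tokenizer cannot represent at all. The sketch below is a hypothetical diagnostic, not part of either model or of `transformers`: `find_unk_chars` encodes each distinct character separately and reports the ones that come back as the unknown-token id. The two helpers wrap standard `transformers` calls (`AutoTokenizer.from_pretrained`, `text_target=`, the pipeline's `return_tensors=True` option), but their names are made up for illustration, and they import `transformers` lazily so the core function runs without it.

```python
from typing import Callable, List, Sequence, Tuple


def find_unk_chars(text: str, encode: Callable[[str], Sequence[int]], unk_id: int) -> List[str]:
    """Return the distinct non-space characters of `text` that `encode` maps to `unk_id`."""
    missing = []
    for ch in dict.fromkeys(text):  # distinct characters, in first-seen order
        if ch.isspace():
            continue
        if unk_id in encode(ch):  # the tokenizer had no piece for this character
            missing.append(ch)
    return missing


def hf_target_encoder(model_name: str) -> Tuple[Callable[[str], Sequence[int]], int]:
    """Hypothetical helper: build a target-side encoder from a Hugging Face tokenizer."""
    from transformers import AutoTokenizer  # needs transformers + sentencepiece

    tok = AutoTokenizer.from_pretrained(model_name)

    def encode(s: str) -> Sequence[int]:
        # text_target= switches Marian tokenizers to their target-language vocabulary
        return tok(text_target=s, add_special_tokens=False)["input_ids"]

    return encode, tok.unk_token_id


def raw_translation(model_name: str, text: str) -> str:
    """Hypothetical helper: decode the generated ids WITHOUT stripping special tokens."""
    from transformers import pipeline

    pipe = pipeline("translation", model=model_name)
    ids = pipe(text, return_tensors=True)[0]["translation_token_ids"]
    return pipe.tokenizer.decode(ids, skip_special_tokens=False)
```

For example, `encode, unk = hf_target_encoder("Helsinki-NLP/opus-mt-tc-big-en-fr")` followed by `find_unk_chars("Ç È É À", encode, unk)` should list any accented capitals missing from the target vocabulary, and `raw_translation("Helsinki-NLP/opus-mt-tc-big-en-fr", "That hurts!")` should make a stray `<unk>` visible instead of silently dropped. Running the same checks against `Helsinki-NLP/opus-mt-en-fr` gives a baseline for comparison.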
