Special character issues
Discussion #7, opened by mathdons
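There seems to be an issue with this model: some characters (Ç and È, for instance) are output as <unk> tokens. The smaller Helsinki-NLP/opus-mt-en-fr model handles them correctly, while Helsinki-NLP/opus-mt-tc-big-en-fr drops them: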
from transformers import pipeline

# Baseline model: the "Ç" is preserved
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
print(pipe("That hurts!"))  # [{'translation_text': 'Ça fait mal !'}]

# Big model: the leading "Ç" is missing from the output
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-fr")
print(pipe("That hurts!"))  # [{'translation_text': 'a fait mal !'}]
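One way to narrow this down is to check which characters the tokenizer maps to the unknown token. The sketch below feeds each character through an `encode` callable and flags those whose ids contain the unknown-token id. `find_unk_chars` and the toy vocabulary are hypothetical names used only for illustration; the helper assumes nothing about the tokenizer except that `encode` returns a sequence of token ids. The character is missing entirely rather than shown as `<unk>`, which is presumably because the translation pipeline decodes with special tokens skipped. That explanation is an assumption, not something confirmed in this thread.

```python
from typing import Callable, List, Sequence


def find_unk_chars(
    text: str,
    encode: Callable[[str], Sequence[int]],
    unk_id: int,
) -> List[str]:
    """Return the distinct non-space characters of `text` whose encoding contains `unk_id`.

    `encode` is any function mapping a string to token ids (for example a
    wrapper around a Hugging Face tokenizer). Each character is probed on its own.
    """
    missing: List[str] = []
    for ch in dict.fromkeys(text):  # distinct characters, in first-seen order
        if ch.isspace():
            continue
        if unk_id in encode(ch):
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # Hypothetical toy tokenizer: knows lowercase ASCII letters and "!",
    # maps everything else (such as "Ç" or "È") to the unknown id.
    UNK = 1
    vocab = {c: i + 2 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz!")}

    def toy_encode(s: str) -> List[int]:
        return [vocab.get(c, UNK) for c in s]

    print(find_unk_chars("Ça fait mal !", toy_encode, UNK))  # ['Ç']
```

With the real model, the same helper could be pointed at the target-side tokenizer. This assumes a transformers version whose tokenizers accept the `text_target` argument:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-tc-big-en-fr")
encode_target = lambda s: tok(text_target=s)["input_ids"]
print(find_unk_chars("Ça fait mal ! Ève", encode_target, tok.unk_token_id))
```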