sinhala-tokenizer / tokenizer_config.json
Thisen Ekanayake
tokenizer class updated
e337c70
{
  "tokenizer_class": "SentencePieceTokenizer",
  "model_type": "unigram",
  "unk_token": "<unk>",
  "pad_token": "<pad>",
  "bos_token": "<s>",
  "eos_token": "</s>"
}
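
A minimal sketch of how a config like the one above might be sanity-checked before use. It relies only on the Python standard library and embeds a copy of the JSON so it runs on its own. The helper name `special_tokens` and the rule that the four special tokens must be present and distinct are assumptions made for illustration, not something this repository states.

```python
import json

# Copy of the config shown above, embedded so the sketch is self-contained.
CONFIG_TEXT = """
{
  "tokenizer_class": "SentencePieceTokenizer",
  "model_type": "unigram",
  "unk_token": "<unk>",
  "pad_token": "<pad>",
  "bos_token": "<s>",
  "eos_token": "</s>"
}
"""

# The four special-token keys that appear in the config.
SPECIAL_TOKEN_KEYS = ("unk_token", "pad_token", "bos_token", "eos_token")


def special_tokens(config: dict) -> dict:
    """Hypothetical helper: return the special tokens, or raise if any is missing or repeated."""
    missing = [key for key in SPECIAL_TOKEN_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing special tokens: {missing}")
    tokens = {key: config[key] for key in SPECIAL_TOKEN_KEYS}
    # Assumed rule: two roles sharing one token string is treated as an error.
    if len(set(tokens.values())) != len(tokens):
        raise ValueError("special tokens must be distinct")
    return tokens


config = json.loads(CONFIG_TEXT)
tokens = special_tokens(config)
print(config["model_type"], tokens)
```

A check like this only confirms that the file is valid JSON and that the token strings are consistent with each other. Whether those strings match the pieces in the trained SentencePiece model would need a separate check against the model file itself.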