Really cool post! In particular, this was eye-opening to me:

"However, I would consider both Unicode and UTF-8 to be tokenizers."
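To make that concrete for myself, here's a rough Python sketch. It's my own illustration, not something from the post. It treats "tokenizer" loosely as "a reversible map from text to a sequence of integer IDs." Under that assumption, Unicode gives you one ID per code point, and UTF-8 gives you one ID per byte, with a fixed vocabulary of 256. The function names are made up for the example.

```python
# Rough sketch, assuming "tokenizer" just means a reversible text -> list[int] map.
# Function names are hypothetical, chosen only for this illustration.

def unicode_tokenize(text: str) -> list[int]:
    # One token per Unicode code point; vocabulary is the code point space.
    return [ord(ch) for ch in text]

def unicode_detokenize(ids: list[int]) -> str:
    return "".join(chr(i) for i in ids)

def utf8_tokenize(text: str) -> list[int]:
    # One token per UTF-8 byte; vocabulary is only the 256 values 0-255.
    return list(text.encode("utf-8"))

def utf8_detokenize(ids: list[int]) -> str:
    return bytes(ids).decode("utf-8")

text = "h\u00e9llo"  # "héllo" with a precomposed é (U+00E9)
print(unicode_tokenize(text))  # [104, 233, 108, 108, 111]
print(utf8_tokenize(text))     # [104, 195, 169, 108, 108, 111]
```

The non-ASCII "é" is a single token under the code point view but two tokens (0xC3 0xA9) under the UTF-8 view. That's the same sequence-length vs. vocabulary-size trade-off you get with learned tokenizers.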