Mask and You Shall Receive: Optimizing Masked Language Modeling For Pretraining BabyLMs Paper • 2510.20475 • Published 30 days ago
EXECUTE: A Multilingual Benchmark for LLM Token Understanding Paper • 2505.17784 • Published May 23
Subword-Delimited Downsampling for Better Character-Level Translation Paper • 2212.01304 • Published Dec 2, 2022
German4All Collection A collection of datasets and models for paraphrasing German texts to different complexity levels. • 4 items • Updated Aug 29
RoD-TAL: A Benchmark for Answering Questions in Romanian Driving License Exams Paper • 2507.19666 • Published Jul 25
Exploring Cross-lingual Textual Style Transfer with Large Multilingual Language Models Paper • 2206.02252 • Published Jun 5, 2022
Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification Paper • 2311.13937 • Published Nov 23, 2023
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages Paper • 2502.11926 • Published Feb 17
EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian Paper • 2505.23297 • Published May 29