Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models? (arXiv:2502.11895, published Feb 17, 2025)
What makes a language easy to deep-learn? Deep neural networks and humans similarly benefit from compositional structure (arXiv:2302.12239, published Feb 23, 2023)
Dynaword: From One-shot to Continuously Developed Datasets (arXiv:2508.02271, published Aug 4, 2025)
GenCodeSearchNet: A Benchmark Test Suite for Evaluating Generalization in Programming Language Understanding (arXiv:2311.09707, published Nov 16, 2023)