Scaling Omni LLMs to Personalized Long-Horizon Speech
Multimodal Instruction-based Editing and Generation