Generate speech using reference audio and text
Convert audio to text with context and language options
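
The two operations above can be pictured as a pair of request shapes. The following is only a sketch: the class names, field names, and validation rules are hypothetical and are not taken from any real API. It also assumes that "text" in the first operation means the text to be spoken, and that context and language are optional in the second.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechRequest:
    """Hypothetical inputs for generating speech (all names are assumptions)."""
    text: str             # assumed: the text to be spoken
    reference_audio: str  # assumed: path to the reference audio clip

    def validate(self) -> None:
        # Both inputs are treated as required in this sketch.
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if not self.reference_audio:
            raise ValueError("reference_audio is required")


@dataclass
class TranscriptionRequest:
    """Hypothetical inputs for converting audio to text (all names are assumptions)."""
    audio: str                      # assumed: path to the audio to transcribe
    context: Optional[str] = None   # assumed: optional context hint
    language: Optional[str] = None  # assumed: optional language option

    def validate(self) -> None:
        # Only the audio is treated as required in this sketch.
        if not self.audio:
            raise ValueError("audio is required")
```

Keeping context and language optional reflects the word "options" in the description; whether a real implementation requires them is not stated in the source.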