Language (Text)
Language of the input text.
Language (Output)
Target language or accent of the output.
Task
Text Delimiter
How to split the text into utterances.
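As a rough illustration of what a text delimiter setting might do, the sketch below splits input text into utterances either by line or by sentence. The function name `split_utterances` and the option names `"lines"` and `"sentences"` are assumptions made for this example, not the tool's actual options.

```python
import re


def split_utterances(text: str, delimiter: str = "lines") -> list[str]:
    """Split text into utterances.

    The option names "lines" and "sentences" are hypothetical.
    """
    if delimiter == "lines":
        # One utterance per line of input.
        parts = text.splitlines()
    elif delimiter == "sentences":
        # Break after sentence-ending punctuation followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", text)
    else:
        raise ValueError(f"unknown delimiter: {delimiter!r}")
    # Drop empty pieces and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]
```

Splitting by line keeps the user in control of utterance boundaries, while splitting by sentence is more convenient for pasted paragraphs.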
Use the raw text as the input prompt instead of phonemizing it.
Automatically play the audio once it is generated (using sounddevice).