Generate speech with CSM 1B (Conversational Speech Model). The code is available on GitHub at SesameAILabs/csm, and the checkpoint is hosted on Hugging Face.
Try out our interactive demo at sesame.com/voicedemo, which uses a fine-tuned variant of CSM.
The model has some capacity for non-English languages due to data contamination in the training data, but it is unlikely to perform well on them.
Each line is an utterance in the conversation to generate. Speakers alternate between A and B, starting with speaker A.
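As a rough illustration of the input format described above, the sketch below pairs each line of a conversation with its alternating speaker label. The function name `assign_speakers` is hypothetical and not part of the CSM codebase, and skipping blank lines is an assumption; the demo itself may treat them differently.

```python
def assign_speakers(conversation: str) -> list[tuple[str, str]]:
    """Pair each utterance with a speaker label, alternating A, B, A, ...

    Assumption: blank lines are ignored and do not consume a speaker turn.
    """
    utterances = [line.strip() for line in conversation.splitlines() if line.strip()]
    # Even-indexed lines belong to speaker A, odd-indexed lines to speaker B.
    return [("A" if i % 2 == 0 else "B", text) for i, text in enumerate(utterances)]


script = """Hey, how are you doing?
Pretty good, thanks for asking.
Glad to hear it."""

for speaker, text in assign_speakers(script):
    print(f"{speaker}: {text}")
```

With this reading, the first and third lines of the example go to speaker A and the second to speaker B.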
GPU time is limited to 3 minutes. For longer usage, duplicate the Space.