Dispatch from the uncanny valley

Sesame Research has released a demo of their new conversational speech model. It is indeed uncanny.

They identify the challenge with voice AI interfaces:

"To create AI companions that feel genuinely interactive [...] it must understand and adapt to context in real time."

And go on to propose a fix:

"To address this, we introduce the Conversational Speech Model (CSM), which frames the problem as an end-to-end multimodal learning task using transformers. It leverages the history of the conversation to produce more natural and coherent speech."

The result is an AI voice that you can almost have a real conversation with.