Better TTS for snips home

Lots of categories here, but none about TTS.

And you know what? Apart from ourselves, who spend time installing, developing, debugging and talking to Snips, all the other household (business?) members mainly know Snips for her voice :roll_eyes:

And actually, it’s nice to have the picotts voice, because it’s free… But it is seriously lacking.

The voice is really monotonous, if not robotic. In French, a lot of the English words we use every day are just unusable… Try sending “Bon week-end” and you will avoid it from then on…
For example, I listen to music on a Kodi device at home. I use Snips to ask her to play jazz, pop, etc. I have the code to ask her who is playing which song, but with a lot of English artists / song titles, I have simply disabled this skill… :cry:

I think that to stay reliable and fast, we have to stay cloudless (fully local TTS). I don’t know exactly which other solutions exist (I tried a few online; most were better).

But just so you know, I’m ready to pay for a really usable, nice voice for my home. :wink:

I use TTS quite a lot, for things not to forget at home, and in the morning / evening connected to a Netatmo Welcome, which recognizes us and can talk to us personally. A good voice would open up a lot more uses for TTS.

I know the Snips team is aware of this and is looking for a solution; I just wanted to put a big vote on this side of Snips :grin:


Hi @KiboOst, as you know, we are looking into this :slight_smile:


I’m currently developing an offline-online TTS solution using AWS Polly. The aim is to split the text into its words and numbers and cache the samples. Whenever there is a cache hit, I use the offline wav file, and whenever there is a cache miss, I request the sample from Polly. The more complicated part is the timing of when to start playing which file. Maybe this approach could help you out?
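To make the cache-hit / cache-miss idea concrete, here is a minimal sketch of how it might look. It is not my actual code: `CachedTTS`, `split_into_tokens` and the `synthesize` callback are names made up for illustration, and `synthesize` stands in for whatever cloud request (for example to Polly) returns the audio bytes for one word.

```python
import hashlib
import re
from pathlib import Path
from typing import Callable, List


def split_into_tokens(text: str) -> List[str]:
    """Split a sentence into lowercase words and numbers, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


class CachedTTS:
    """Per-word sample cache: local file on a hit, backend request on a miss."""

    def __init__(self, cache_dir: str, synthesize: Callable[[str], bytes]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical backend: takes one token, returns the audio bytes for it.
        self.synthesize = synthesize
        self.hits = 0
        self.misses = 0

    def _path_for(self, token: str) -> Path:
        # Hash the token so any word maps to a safe file name.
        digest = hashlib.sha1(token.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.wav"

    def wav_for(self, token: str) -> Path:
        path = self._path_for(token)
        if path.exists():
            self.hits += 1  # cache hit: fully offline
        else:
            self.misses += 1  # cache miss: ask the backend once, then keep the file
            path.write_bytes(self.synthesize(token))
        return path

    def wavs_for_sentence(self, text: str) -> List[Path]:
        """Return the sample files for a sentence, in playback order."""
        return [self.wav_for(token) for token in split_into_tokens(text)]
```

The sketch only returns the files in order; the hard part mentioned above (starting each file at the right moment so the words do not sound chopped) is deliberately left out.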


FWIW, I have a guide here for setting up TTS using amazon polly and also using home assistant’s TTS (various providers).

I realize this is not offline. It’s very unfortunate that picotts supports better voices, but only on the Android platform.

Yeah, that’s what I do as well.
I mostly have the same answers, so I just need to generate each one once with AWS Polly and then use the locally cached sample from then on.

Personally, I would avoid all cloud-based stuff for everyday home use… :stuck_out_tongue_winking_eye:

I am absolutely with you; I kind of hate using Amazon. I currently accept it since I only need to generate each sample once and then it is offline (and picotts just sounds awful), but the moment there is something acceptable available for offline use, I would absolutely throw Amazon out of my setup :sweat_smile:

Offline WaveNet TTS!!
Example at the end :flushed:


Thanks for sharing, I had not seen that yet, and it’s definitely a much better choice than picotts while not being online :heart_eyes:

I doubt offline WaveNet is available yet. And on a Raspberry Pi, even less so.

Yeah, but it’s good to see that they are working on it. I do not need it tomorrow, I just like to know that there will be something :wink:


Perhaps SnipsSuperTTS is what you’re looking for?


Hey Guys,

Did you consider using MaryTTS?

They support multiple languages and have multiple voices per language. Some of them are quite good in my opinion (quite natural-sounding), others less so. (I tested only English and German though.)

Here you can choose and test their voices with simple texts:
Put in your text and choose voice and language under “voices”.

You can run their TTS server locally without needing any internet connection. I used it a while ago, when I was trying out the Jasper voice recognition platform.

Haven’t tested it with Snips yet, but I will try to set up a MaryTTS Docker container and then try to connect Snips to it (haven’t looked into how to do that yet, though).
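For anyone who wants to try the local server before the Snips side is figured out, here is a rough sketch of requesting a wav file over HTTP. The port (59125) and the `/process` query parameters are what I remember from the MaryTTS 5.x HTTP interface, so treat them as assumptions and check them against the server you actually run. `marytts_url` and `synthesize_to_file` are just illustrative helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def marytts_url(text, locale="en_US", host="localhost", port=59125):
    """Build the request URL for a locally running MaryTTS server.

    The port and parameter names are assumptions based on the MaryTTS 5.x
    HTTP interface; adjust them if your server differs.
    """
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    return f"http://{host}:{port}/process?{urlencode(params)}"


def synthesize_to_file(text, out_path, **kwargs):
    """Fetch the synthesized audio from the local server and save it to disk."""
    with urlopen(marytts_url(text, **kwargs), timeout=10) as response:
        audio = response.read()
    with open(out_path, "wb") as wav_file:
        wav_file.write(audio)
    return out_path
```

With the server running, something like `synthesize_to_file("Guten Morgen", "morgen.wav", locale="de")` should produce a file you can play with any wav player. How to hand that file to Snips is still the open part.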




I’ve been playing around with the voices from Cepstral, as they have voices for Raspberry Pi that cost only ~$30-35, and they sound decent. The purchase process is a little weird, as you have to “apply” for an account to the store before you can buy, and that application needs to be manually approved, though at least for me it was reasonably fast.

They also have Linux voices, but be aware that by default the license for the Linux (i.e. non-RPi) version doesn’t allow you to save files to disk; you have to request/buy that separately.

Couldn’t a local version of Tacotron be used? Mycroft was working on “Mimic 2”, which was based on Tacotron, for example.

It does take a second or two to generate the response, but with the new Raspberry Pi 4B that will probably be cut down to half that.

I’ve installed this version of Tacotron on a Raspberry Pi 4B with 4 GB of RAM.

It comes with a webserver script where you can enter sentences, and then the resulting .wav file will be played in the browser.

The voice it generates sounds good, but not perfect. Even then, the current reality is that it takes at least 10 seconds for the audio response to be generated. That’s a lot longer than I had hoped.

I suspect there’s a lot of opportunity for optimization, but I don’t know how to do that. So for now the answer to my own question is: it seems it’s still too processing-intensive.

@KiboOst have a look at NanoTTS. It’s supposed to be an improved version of PicoTTS.

What a pity. I am also following this, and looking for better TTS. Unfortunately, I also need German TTS, so the possibilities are even more limited.

I’ve heard examples of mimic2/tacotron, and they sounded good. Yet it won’t be of any use if it cannot be run on a Raspberry Pi :frowning: