Use ASR to transcribe speech


I sometimes use the Snips NLU for intent parsing on its own, without a session, and it works like a charm. I have been trying to post a WAV file to the ASR to use it for simple audio-to-text transcription. The ASR gets the data, but reports an empty string (“”) every time. Is this possible at all, or am I just wasting my time? @Val please, any response will be greatly appreciated.

I’m encountering this problem as well. The wake word is working and the WAV is submitted to the ASR, but it’s not producing any text.

Per this GitHub issue, it sounds like support for Google ASR is not currently reliable.

Edit: Per this forum thread, last updated 24 days ago, Google ASR is broken on the latest version and not a priority for the team: “[SOLVED] Google ASR does not work properly after update to 0.61.1”

Kind of makes sense. I’d rather the on-device recognition be improved than need to send everything I say to Google’s servers.

The difference here is that you send what you choose, it’s not always listening…


Thanks for the response. But in my case, I don’t want to use the wake word. Snips works normally when talking via a site’s audio server with an attached microphone, but I want to be able to send a stored WAV file and have the speech in it transcribed to text, so I can use it for other things.

@Psycho glad to see you on this. Please, have you attempted anything like this? I believe I am sending the file over MQTT, but nothing seems to happen. I even toggle the ASR to start listening (it responds that it has started) and, after a while (depending on the duration of the file), toggle it off.

Initially, when I only toggled it on and off without taking the duration of the file into account, it didn’t give me anything at all. After I started waiting for the duration, it began to give me “”.
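For anyone trying to reproduce this, here is a minimal sketch of the sequence described above: start listening, stream the file, stop listening. It rests on several assumptions that are not confirmed anywhere in this thread and should be checked against the Hermes protocol documentation: that the ASR expects audio as small, complete WAV chunks on `hermes/audioServer/<siteId>/audioFrame`, that the toggle topics are `hermes/asr/startListening` and `hermes/asr/stopListening`, and that frames should arrive at roughly real-time pace. The names `wav_to_frames`, `stream_wav` and the `publish` callable are hypothetical helpers made up for illustration.

```python
import io
import json
import time
import wave


def wav_to_frames(source, samples_per_frame=256):
    """Yield (wav_bytes, seconds) pairs, each a complete small WAV file.

    `source` is a path or a binary file object. The chunk size of 256
    samples is an assumption, not a documented requirement.
    """
    with wave.open(source, "rb") as reader:
        channels = reader.getnchannels()
        width = reader.getsampwidth()
        rate = reader.getframerate()
        while True:
            pcm = reader.readframes(samples_per_frame)
            if not pcm:
                break
            # Wrap the raw samples in their own WAV header.
            buf = io.BytesIO()
            with wave.open(buf, "wb") as writer:
                writer.setnchannels(channels)
                writer.setsampwidth(width)
                writer.setframerate(rate)
                writer.writeframes(pcm)
            yield buf.getvalue(), len(pcm) / (channels * width * rate)


def stream_wav(source, publish, site_id="default", pace=True):
    """Toggle the ASR on, send the file frame by frame, toggle it off.

    `publish(topic, payload)` is any callable that sends an MQTT message.
    Topic names and payload shape are assumptions (see the text above).
    """
    toggle = json.dumps({"siteId": site_id})
    publish("hermes/asr/startListening", toggle)
    for chunk, seconds in wav_to_frames(source):
        publish("hermes/audioServer/%s/audioFrame" % site_id, chunk)
        if pace:
            # Sleep for the chunk's duration so audio arrives in real time.
            time.sleep(seconds)
    publish("hermes/asr/stopListening", toggle)
```

With paho-mqtt, the `publish` method of a connected client could be passed as `publish`, and the result would presumably arrive on `hermes/asr/textCaptured` (also an assumption). It may also be worth checking that the file is 16 kHz, 16-bit mono, since a format mismatch is one possible cause of an empty transcription.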

Any help would be nice, even if it is just to tell me I am wasting my time, so I can focus on another possible solution.

Kind regards

@Psycho Excellent point. I was just reading another thread about someone’s beer recognition app and that was the perfect use case for using a remote ASR - recognize almost everything on device, and send the beer name itself up to the cloud for parsing. I’m much too new to this community to be making statements of opinion like the above!

@odia as far as I can tell from my reading, Google ASR has problems right now with the latest versions of Snips. I’ll leave it to more experienced folks to help further; maybe an older version would work. Per Psycho’s comment in the thread I linked, though, it looks like there’s no way to downgrade anymore.

Maybe the resources previously working on niche voice assistant features like recognizing speech have shifted focus to mainline engineering tasks like cryptocurrency mining.

Thanks for this @mustacheemperor, but I still need to use the local ASR for this.

@Val or @fredszaq please, can anyone at least let me know what is possible? Or tell me if it is not possible, so I can rest my case and focus on something else.

@fredszaq I had looked into possibly using this, and I think I noticed you commented on its incompatibility with Snips. Please, I just need someone to tell me whether it’s possible or not.

Kind regards

I have the same problem. And now to convert speech to text I use this program here