Record a voice clip?

Is there a way to create a Snips command to record a voice clip?
“Ok Snips, record a voice clip”
then have it create a WAV file up until there is a period of silence?

Maybe this requires managing the MQTT commands to turn the components on and off?

What you can do:

  • Create an intent to start recording
  • Create a personal hotword, saying “Stop the recording”
  • When the intent is detected, start recording
  • When the personal hotword is detected, stop recording

You can even subscribe to the audio frames on Snips if you wish, which is even easier. It’s just a matter of packing them into an array to create a WAV file (see the sketch below).
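A minimal sketch of that approach, assuming paho-mqtt (1.x API) and the default Hermes topics. The intent name recordVoiceClip, the hotword id stopRecording and the site id "default" are placeholders for whatever exists in your own assistant:

```python
# Sketch: start collecting audio frames when an intent is detected,
# stop and write a WAV when a personal hotword is detected.
import io
import wave
import paho.mqtt.client as mqtt

SITE_ID = "default"                                        # placeholder site id
INTENT_TOPIC = "hermes/intent/recordVoiceClip"             # placeholder intent name
HOTWORD_TOPIC = "hermes/hotword/stopRecording/detected"    # placeholder hotword id
FRAME_TOPIC = "hermes/audioServer/{}/audioFrame".format(SITE_ID)

recording = False
pcm_chunks = []    # the "array" of raw samples
params = None      # wave parameters taken from the first captured frame

def on_connect(client, userdata, flags, rc):
    client.subscribe([(INTENT_TOPIC, 0), (HOTWORD_TOPIC, 0), (FRAME_TOPIC, 0)])

def on_message(client, userdata, msg):
    global recording, params
    if msg.topic == INTENT_TOPIC:
        recording = True
        pcm_chunks.clear()
    elif msg.topic == HOTWORD_TOPIC and recording:
        recording = False
        write_wav("clip.wav")
    elif msg.topic == FRAME_TOPIC and recording:
        # each audioFrame payload is itself a tiny WAV file; let the wave
        # module walk its chunks and hand back only the PCM samples
        with wave.open(io.BytesIO(msg.payload), "rb") as frame:
            if params is None:
                params = frame.getparams()
            pcm_chunks.append(frame.readframes(frame.getnframes()))

def write_wav(path):
    if params is None or not pcm_chunks:
        return
    with wave.open(path, "wb") as out:
        out.setparams(params)
        out.writeframes(b"".join(pcm_chunks))

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883)
client.loop_forever()
```

Letting the wave module parse each frame avoids hard-coding header offsets, which turns out to matter later in this thread.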

Thanks. I’ll look into how to manage the MQTT messages so I can start recording when the intent is recognized, hold everything off until done, and then respond to the intent. I may be back :)

Hi,
I pretty much did this last Friday. Here is the code for recording from MQTT:
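The code itself isn’t reproduced here, but going by the byte offsets discussed in the replies below, a rough sketch of that style of recorder, assuming the canonical 44-byte WAV header on each frame (the “data” chunk id at bytes 36-39), the default site id, and 16 kHz / 16-bit / mono audio, would look something like this:

```python
# Not the original code from this post; just a sketch of the general approach.
import wave
import paho.mqtt.client as mqtt

samples = bytearray()

def on_message(client, userdata, msg):
    frame = msg.payload
    if frame[36:40] == b"data":     # canonical 44-byte header layout only
        samples.extend(frame[44:])  # everything after the header is raw PCM

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("hermes/audioServer/default/audioFrame")  # assumes site id "default"

try:
    client.loop_forever()
except KeyboardInterrupt:
    with wave.open("clip.wav", "wb") as out:
        out.setnchannels(1)      # assumed mono
        out.setsampwidth(2)      # assumed 16-bit samples
        out.setframerate(16000)  # assumed 16 kHz; adjust if your setup differs
        out.writeframes(bytes(samples))
```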

Hi Basneu - Thanks for posting the code sample to capture audio from the Snips.ai audio frames and write it to a .wav file. I have it working reasonably well, using ASR startListening and stopListening messages to control recording, but the .wav captured is pretty noisy in areas that should be silent. Can you please point me at the documentation you used to understand the audio frame format? I can’t find any anywhere and would like to better understand what these frames contain. - Thanks - BobH
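For anyone following along, a small sketch of that kind of gating, assuming the messages meant here map to the hermes/asr/startListening and hermes/asr/stopListening topics of the Hermes protocol:

```python
# Sketch: only collect audio frames while the ASR is listening.
import paho.mqtt.client as mqtt

recording = False
frames = []  # raw audioFrame payloads captured while the ASR is listening

def on_connect(client, userdata, flags, rc):
    client.subscribe([("hermes/asr/startListening", 0),
                      ("hermes/asr/stopListening", 0),
                      ("hermes/audioServer/default/audioFrame", 0)])

def on_message(client, userdata, msg):
    global recording
    if msg.topic == "hermes/asr/startListening":
        recording = True
        frames.clear()
    elif msg.topic == "hermes/asr/stopListening":
        recording = False
        # extract the PCM from each collected frame and write the WAV here
    elif recording and msg.topic.endswith("/audioFrame"):
        frames.append(msg.payload)

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883)
client.loop_forever()
```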

Hi Bob,

I can have a look at it on Monday. But I think I googled for information about wavelets and used that to build up the files.

Hey Basneu - The word “wavelets” was a great clue and I just tracked down this document which looks to explain much of what I need. The Snips.ai MQTT audio frames I’m processing appear to have the characters “time” not “data” in bytes 36-39. Ignoring that, and processing the sound part anyway, produces a noisy but playable/understandable output file. I’ll hex-dump the audio frame and investigate further. Thanks - BobH

Hey Basneu - OK, I have it working now. Inspecting the Snips audio frames revealed that the “data” subchunk is at byte offset 52 in my frames, not 36 as in your code example. (At byte offset 36 the word “time” appears, followed by 12 bytes of something!) If I test for “data” at byte offset 52 and process accordingly I get a really clear output recording. Thanks for the steer on this, it saved me a lot of time. Thanks - BobH
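A way to handle both layouts is to walk the RIFF chunks instead of hard-coding offset 36 or 52. A small sketch of that, given the layout described above (an extra “time” chunk sitting between “fmt ” and “data”):

```python
# Sketch: locate the "data" chunk wherever it sits in an audioFrame payload.
import struct

def pcm_from_frame(frame: bytes) -> bytes:
    """Return the raw PCM samples from one audioFrame payload."""
    if frame[:4] != b"RIFF" or frame[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE frame")
    pos = 12
    while pos + 8 <= len(frame):
        chunk_id = frame[pos:pos + 4]
        chunk_size = struct.unpack("<I", frame[pos + 4:pos + 8])[0]
        if chunk_id == b"data":
            return frame[pos + 8:pos + 8 + chunk_size]
        # skip this chunk ("fmt ", "time", ...); chunks are word-aligned
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no 'data' chunk found in frame")
```

Python’s wave module does the same chunk walk internally, so wave.open on an io.BytesIO of the frame also copes with the extra “time” chunk.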

So it might have been changed since my code. Thanks for letting me know.