Porcupine wakeword detection and Snips platform on same Raspberry Pi

Hi everyone,

I’m currently trying to use the Porcupine wakeword detection engine together with the Snips platform. I want both of them to run on the same Raspberry Pi, using the same microphone as audio input (ReSpeaker Mic Array v2). However, the audio server component of Snips seems to block the microphone input stream, such that Porcupine cannot access it simultaneously.

Is that an issue that can be solved with a proper ALSA configuration (e.g. using dsnoop)? Or is the problem a deeper one?

Any idea on how I can make this work?

Thank you and all the best,
Lukas


I just wrote my own hotword engine using Porcupine. It listens on MQTT to the audio of all devices and checks it against all the hotword .ppn files I've created.

Here is a sample (this is for macOS, so the Porcupine library and the platform-specific .ppn keyword files would need to be changed to match your OS):

#!/usr/bin/env python2
# -*- coding:utf-8 -*-

### **************************************************************************** ###
# 
# Project: Hotword Engine for Snips
# Created Date: Friday, August 17th 2018, 6:59:52 pm
# Author: Greg
# -----
# Last Modified: Sun Feb 10 2019
# Modified By: Greg
# -----
# Copyright (c) 2018 Greg
# 
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to
# deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
# sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# 
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
# FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
# COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
# AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
# WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# 
### **************************************************************************** ###




import os
import sys
import time
from porcupine import Porcupine
import paho.mqtt.client as mqtt
import json
import struct

mqtt_client = mqtt.Client()
# siteIds for which hotword detection is currently toggled off
clientList = []


MQTT_ADDRESS = '10.0.1.100'
MQTT_PORT = '1883'

# Porcupine library, model and keyword files (macOS build; swap them for your OS)
library_path = os.path.join(os.path.dirname(__file__), 'libpv_porcupine.dylib')
model_file_path = os.path.join(os.path.dirname(__file__), 'porcupine_params.pv')
keyword_file_paths = [
    os.path.join(os.path.dirname(__file__), 'hey_janet_mac.ppn'),  # index 0: wake word
    os.path.join(os.path.dirname(__file__), 'tv_play.ppn'),        # index 1: play
    os.path.join(os.path.dirname(__file__), 'tv_pause.ppn'),       # index 2: pause
]
sensitivities = [0.3, 0.3, 0.3]

porcupine = Porcupine(
    library_path=library_path,
    model_file_path=model_file_path,
    keyword_file_paths=keyword_file_paths,
    sensitivities=sensitivities)


def on_connect(client, userdata, flags, rc):
    # Subscribe to every site's audio stream plus the hotword on/off toggles
    client.subscribe('hermes/audioServer/+/audioFrame')
    client.subscribe('hermes/hotword/toggleOff')
    client.subscribe('hermes/hotword/toggleOn')


def on_message(client, userdata, msg):

    if msg.topic == 'hermes/hotword/toggleOff':
        # Snips turns the hotword off on a site while a session is running there
        msgJSON = json.loads(msg.payload)
        if msgJSON['siteId'] not in clientList:
            #print('{} wakeword disabled'.format(msgJSON['siteId']))
            clientList.append(msgJSON['siteId'])
    elif msg.topic == 'hermes/hotword/toggleOn':
        msgJSON = json.loads(msg.payload)
        if msgJSON['siteId'] in clientList:
            #print('{} wakeword enabled'.format(msgJSON['siteId']))
            clientList.remove(msgJSON['siteId'])
    else:
        # Topic is hermes/audioServer/<siteId>/audioFrame
        siteId = msg.topic.split('/')[2]

        # Skip the 60-byte header, then read frame_length 16-bit samples
        pcm = struct.unpack_from("h" * porcupine.frame_length, msg.payload[60:])

        # With several keyword files, process() returns the index of the
        # detected keyword (order of keyword_file_paths), or -1 for none
        result = porcupine.process(pcm)

        if (result == 0) and (siteId not in clientList):
            milli_sec = int(round(time.time() * 1000) - 100)
            #print('{} detected keyword at time {}'.format(siteId, milli_sec))

            mqtt_client.publish('hermes/hotword/default/detected', json.dumps({
                'siteId': siteId,
                'modelId': 'hey_janet',
                'modelType': 'universal',
                'currentSensitivity': 0.3,
                'detectionSignalMs': milli_sec,
                'endSignalMs': milli_sec
            }))
        elif result > 0:
            # 1 = play, 2 = pause
            mqtt_client.publish('chromecast/function/{}'.format(result))


if __name__ == '__main__':
    mqtt_client.on_connect = on_connect
    mqtt_client.on_message = on_message
    mqtt_client.connect(MQTT_ADDRESS, int(MQTT_PORT))
    mqtt_client.loop_forever()
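One way to make the branching at the end of `on_message` testable without a broker is to pull the decision into a plain function that returns what to publish. This is a minimal Python 3 sketch, not the original author's code: `decide_action` and `WAKE_WORD_INDEX` are hypothetical names, and the payload fields are copied from the script above.

```python
import json
import time

WAKE_WORD_INDEX = 0  # position of hey_janet_mac.ppn in keyword_file_paths


def decide_action(result, site_id, muted_sites):
    """Turn a Porcupine keyword index into an (mqtt_topic, payload) pair.

    Mirrors the branching of on_message(): index 0 is the wake word,
    higher indexes are the play/pause commands, -1 means no keyword.
    Returns None when nothing should be published.
    """
    if result == WAKE_WORD_INDEX:
        if site_id in muted_sites:
            return None  # hotword currently toggled off for this site
        # The 100 ms offset is copied from the script above
        milli_sec = int(round(time.time() * 1000) - 100)
        payload = json.dumps({
            'siteId': site_id,
            'modelId': 'hey_janet',
            'modelType': 'universal',
            'currentSensitivity': 0.3,
            'detectionSignalMs': milli_sec,
            'endSignalMs': milli_sec,
        })
        return 'hermes/hotword/default/detected', payload
    if result > 0:
        # 1 = play, 2 = pause; published without a payload, as in the original
        return 'chromecast/function/{}'.format(result), None
    return None
```

Inside `on_message`, the result could then be sent with `mqtt_client.publish(*action)` whenever `action` is not `None`.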

Thank you for your answer, that looks pretty awesome! I can probably adapt that for my purposes.

I’m not really familiar with processing audio signals and how the Snips audio server works in detail, so just to be sure that I understand everything correctly:

That looks like the audio input signal of a site is broadcast via MQTT by the Snips audio server, is that correct? And now you feed each audio frame that you receive into Porcupine. Is Porcupine only evaluating this single frame or does it also consider past frames?

Yes, correct.
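A minimal Python 3 sketch of the decoding step, assuming each `audioFrame` payload is a small self-contained WAV file holding 16-bit mono PCM (the fixed 60-byte offset in the script above presumably skips that file's header). Letting the standard `wave` module parse the header avoids hard-coding its size. `frame_to_pcm` is a hypothetical helper name.

```python
import io
import struct
import wave


def frame_to_pcm(payload):
    """Decode one WAV-wrapped audio frame into a tuple of 16-bit samples.

    Assumes the payload is a complete WAV file containing mono,
    16-bit little-endian PCM; anything else raises ValueError.
    """
    with wave.open(io.BytesIO(payload), 'rb') as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError('expected 16-bit mono PCM')
        n_samples = wav.getnframes()
        raw = wav.readframes(n_samples)
    # '<' forces little-endian byte order, as used by WAV files
    return struct.unpack('<%dh' % n_samples, raw)
```

Porcupine still expects exactly `porcupine.frame_length` samples per `process()` call, so the length of the returned tuple should be checked before passing it on.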

That code only works for one site device; it's old code.
I've since created a threaded class that keeps one Porcupine object per satellite MQTT stream and passes the audio samples to the matching object depending on which topic the frame arrives on.
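A minimal single-threaded sketch of the one-detector-per-satellite idea (not the threaded class described above, which isn't shown). A separate instance per site matters because a wake word spans many 512-sample frames (32 ms each at 16 kHz), so Porcupine presumably carries state from one frame to the next; feeding two sites' frames into one instance would mix that state. `PerSiteHotwordRouter` and `make_detector` are hypothetical names; in practice the factory would construct a `Porcupine` object the way the script above does.

```python
class PerSiteHotwordRouter(object):
    """Route audio frames to one detector per satellite (siteId)."""

    def __init__(self, make_detector):
        # make_detector: zero-argument factory returning an object with
        # a process(pcm) method, e.g. a new Porcupine instance
        self._make_detector = make_detector
        self.detectors = {}

    def process(self, topic, pcm):
        """Feed pcm to the detector of the site named in the topic.

        Returns (site_id, detector_result).
        """
        # Topic layout: hermes/audioServer/<siteId>/audioFrame
        site_id = topic.split('/')[2]
        detector = self.detectors.get(site_id)
        if detector is None:
            # First frame from this site: give it its own detector
            detector = self._make_detector()
            self.detectors[site_id] = detector
        return site_id, detector.process(pcm)
```

In the script above, the `else:` branch of `on_message` would call `router.process(msg.topic, pcm)` instead of `porcupine.process(pcm)` and use the returned `site_id`.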

Hi,

Why is this wake word engine better than the built-in Snips one?

Thanks

That code only works for one site device…

Ah okay, now it makes perfect sense. In my case one site is enough anyway. Thank you for sharing!

Why is this wake word engine better than the built-in Snips one?

I think it depends on what you are looking for. Porcupine allows you to choose an (almost) arbitrary global wake word and the recognition rate is great. Also, you don’t have to do any recording.

I didn’t want to use one of the built-in global wake-words of the Snips platform because they are not suitable for my use case. I also tried to record a custom wake word, but that didn’t work too well. I also had a lot of trouble with false positives with both global and custom wake words. Porcupine worked much better in that regard.

They improved the Hey Snips wake word in the new release, so maybe that’s interesting for you. I haven’t evaluated it yet though.


@l_b

One thing: since Porcupine uses a frame length of 512 samples and Snips uses 256 by default, don't forget to change the snips.toml file to this:

[snips-audio-server]
frame = 512

Everything will be happy then :)
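If changing `snips.toml` isn't possible, one alternative (a sketch, not something tested in this thread) is to buffer the incoming samples and hand Porcupine fixed 512-sample frames. `FrameRechunker` is a hypothetical helper; with several satellites you would keep one per siteId.

```python
class FrameRechunker(object):
    """Collect variable-size runs of samples and emit fixed-size frames."""

    def __init__(self, frame_length=512):
        self.frame_length = frame_length
        self._buffer = []

    def feed(self, samples):
        """Append samples; return a list of complete frames (possibly empty).

        Leftover samples stay buffered until the next call.
        """
        self._buffer.extend(samples)
        frames = []
        while len(self._buffer) >= self.frame_length:
            frames.append(tuple(self._buffer[:self.frame_length]))
            del self._buffer[:self.frame_length]
        return frames
```

With 256-sample input, only every second incoming frame completes a chunk, so detection can lag by up to one extra 256-sample frame (16 ms at 16 kHz). Setting `frame = 512` avoids that buffering entirely.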


For various reasons I ended up implementing it in C#. For anyone interested, here is the code:

private void OnAudioFrameReceived(byte[] audioFrame) {
    if (!_hotwordOn) {
        return;
    }

    // Skip the 60-byte header and copy at most 512 16-bit samples
    short[] pcm = new short[512];
    int byteCount = Math.Min(audioFrame.Length - 60, pcm.Length * sizeof(short));
    Buffer.BlockCopy(audioFrame, 60, pcm, 0, byteCount);

    bool result;

    PicoVoiceStatus status = _porcupine.Process(pcm, out result);

    if (status != PicoVoiceStatus.SUCCESS) {
        Util.Log("Hotword Engine", "Error while processing audio frame.");
        return;
    }

    if (result) {
        Util.Log("Hotword Engine", "Hotword detected.", ConsoleColor.Magenta);
        Messenger.instance().publish("hermes/hotword/default/detected", "{\"siteId\": \"default\", \"modelId\": \"default\"}");
    }
}

In general this works really well and is exactly what I was looking for. However, sometimes it mysteriously stops working, even though there is no apparent reason for it. Frames are still being received from the audio server. Any ideas?

Did you set this too?

[snips-audio-server]
frame = 512

Otherwise the audio frames will only contain 256 samples instead of a full 512-sample buffer.

Sure did. As I said, it works perfectly for a while, then it doesn't. Maybe there is an issue with Porcupine running on Windows. I'll do some more testing and report back.

Are you using WiFi? And my MQTT code?

There was a discussion a little while ago about Snips satellite devices stopping; it turned out to be WiFi disconnecting, which made users think Snips had stopped. Just thinking out loud, but this might be something similar.

I'm using a wired connection, so it can't be a WiFi problem. However, I tried to make it more thread-safe by using a lock (mutex), and I haven't had any problems since:

    static readonly object _locker = new object();
    private void OnAudioFrameReceived(byte[] audioFrame) {
        if (!_hotwordOn) {
            return;
        }

        // Skip the 60-byte header and copy at most 512 16-bit samples
        short[] pcm = new short[512];
        int byteCount = Math.Min(audioFrame.Length - 60, pcm.Length * sizeof(short));
        Buffer.BlockCopy(audioFrame, 60, pcm, 0, byteCount);

        bool result;
        PicoVoiceStatus status;

        // Only one thread at a time may feed frames to Porcupine
        lock (_locker) {
            status = _porcupine.Process(pcm, out result);
        }

        if (status != PicoVoiceStatus.SUCCESS) {
            Util.Log("Hotword Engine", "Error while processing audio frame.");
            return;
        }

        if (result) {
            Util.Log("Hotword Engine", "Hotword detected.", ConsoleColor.Magenta);
            Messenger.instance().publish("hermes/hotword/default/detected", "{\"siteId\": \"default\", \"modelId\": \"default\"}");
        }
    }

This seems to prevent audio frames from getting mixed up. I’m not sure though if this is the best way to do it…
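For comparison, a Python 3 sketch of a different way to get the same guarantee: instead of locking around each call, hand every frame to a single worker thread through a `queue.Queue`, so `process()` only ever runs on that thread and frames are handled in arrival order. `SerializedDetector` and `on_result` are hypothetical names; `detector` can be anything with a `process(pcm)` method, such as a Porcupine instance. Whether any of this is needed depends on whether your MQTT client delivers messages on more than one thread.

```python
import queue
import threading


class SerializedDetector(object):
    """Run every detector.process() call on one dedicated worker thread."""

    def __init__(self, detector, on_result):
        self._detector = detector
        self._on_result = on_result  # called with each process() result
        self._frames = queue.Queue()
        self._worker = threading.Thread(target=self._run)
        self._worker.daemon = True  # don't block interpreter exit
        self._worker.start()

    def submit(self, pcm):
        # Safe to call from any thread, e.g. the MQTT callback thread
        self._frames.put(pcm)

    def close(self):
        # None is a sentinel telling the worker to finish and exit
        self._frames.put(None)
        self._worker.join()

    def _run(self):
        while True:
            pcm = self._frames.get()
            if pcm is None:
                break
            self._on_result(self._detector.process(pcm))
```

Compared with a lock, the queue keeps the MQTT callback fast (it only enqueues a frame), at the cost of one extra thread and a small processing delay.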