To Snipsters: Is there a difference between higher (more female) and lower (more male) voices in recognition?

#1

Hi there,

I have a functional assistant. Snips has no trouble recognizing what I want. But when my daughters (4 and 6) speak to Snips, it either recognizes nothing at all (!) or only part of the sentence, which leads to an error. The behaviour is the same on my Pi 3B+ with a PS3 Eye and in the console, so it does not seem to be a hardware problem. We are talking to Snips in German.

Could it be that the dataset used to train the model contains mostly (more) male voices?
What can I do to improve that behaviour?

Thanks in advance!
freddy

#2

You can let your daughters record a personal wakeword, which is tuned to their voice.

#3

Hi @koan,

the wakeword is not the problem. That part works. But what we say after waking it up with “Hey Snips” is not recognized very well. We speak exactly the same sentences, with nearly the same rhythm and intonation. Snips recognizes me very well, but not my daughters.

Bye
freddy

Edit: And there is no difference between a hardware system (RPI) and software (Console on MacBook with browser).

#4

Oh right, I misunderstood. That’s weird. Well then I don’t think you can do much about it. It could be that Snips ASR is trained more on a dataset of adult voices.

#5

Children’s voices are a well-known and well-documented problem: ASR systems have trouble understanding them.

Just have a quick google for “children asr” and see.

#6

Hey @ozie,

thanks for your response! I see the problem.

The question remains: can I do anything to improve recognition? Another question is whether Snips is aware of this problem.

I know the comparison is not fair (because online/offline), but other assistants like Alexa have no problem with that. Half of the residents in our house are kids. :smile: Because I am trying to get rid of all remote controls, it would be cool if our kids were able to switch on the TV.

Happy Easter!
freddy

#7

See how cool it is when you are watching something, they want to watch something else, and they can just ask for it.

#8

That is why we also need person identification through voice, so that you can work with ACLs. I have no problem with my kids turning the lights in their rooms on and off, just not the lights in my bedroom.

#9

:rofl: Yes, that is another point.

#10

That would be great!

#11

There is already personal identification with the wake word, and there has been for some time now.

#12

As long as wake words react to sneezes from any given family member, I will not see that as a viable way to run ACLs, especially since after the wake word it is no longer possible to detect who is speaking.

#13

Ping. Nobody from SNIPS? @valf @fredszaq

#14

Hi freddy,

As @koan pointed out, the data we use to train the acoustic model mainly consists of adult voices. As a result, the models we develop are probably much better for adult voices. As @ozie mentioned, children ASR is a problem in itself that we haven’t yet specifically addressed.

I am afraid there is not much you can do to improve the behaviour for now. Thanks for pointing it out though, it is nice to have that kind of feedback to improve our systems.

Best,
Théodore

#15

I have experienced similar behavior with Snips.

A while back I read online that there was a startup that had created a voice assistant specifically designed for children. That underlines that Snips would have to address this issue specifically.
