Sorry for the late reply, I’ve finally had time to debug your case.
I don’t know if you’re familiar with the NLU parsing flow, but basically the parsing happens in 2 steps:
- we first run a DeterministicIntentParser on your query; this parser is regex based and is supposed to parse all utterances that match a pattern seen in your input dataset (with the current implementation it rather parses “most utterances”)
- if this parser fails, we run a second parser that leverages machine learning to generalize to unseen queries
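The cascade above can be sketched as follows (a minimal illustration of the flow, not the actual Snips NLU code; the parser internals are reduced to plain callables):

```python
# Sketch of the two-step parsing cascade: try the deterministic
# (regex-based) parser first, fall back to the probabilistic one.

def parse(query, deterministic_parser, probabilistic_parser):
    # Step 1: regex-based parser, covers patterns seen in the dataset
    result = deterministic_parser(query)
    if result is not None:
        return result
    # Step 2: ML-based parser, generalizes to unseen queries
    return probabilistic_parser(query)
```

In your case both steps return a wrong result, which is why the query fails end to end.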
In your case we have a double failure:
- the query you typed matches a pattern known from the dataset: "joue moi de la musique de <snips/musicArtist>", but the deterministic intent parser fails
- the probabilistic intent parser also fails
Why does the deterministic parser fail?
With the current implementation, at training time we see the sentence pattern "joue moi de la musique de <snips/musicArtist>", which we extract from your labelling in the console, and we create a regex to match this pattern.
At inference time we get "joue moi de la musique de david bowie". The first thing the parser does is extract the entities in the utterance and replace them with placeholders, in order to match the patterns. However, in our case "moi" matches an artist, "la musique" matches an artist named "La musique populaire" (we allow partial matches), and "david bowie" matches "David Bowie". So we end up with the following pattern: "joue <snips/musicArtist> de <snips/musicArtist> de <snips/musicArtist>", which was not created at train time (at train time we followed your labelling and created "joue moi de la musique de <snips/musicArtist>").
So we fail here. That’s embarrassing, but it should not be a problem since we get a second chance with the ML-powered parser!
Now, why does the probabilistic parser fail?
Looking at the logs, I saw that the algorithm learnt the following weights for the classification of the sentence:
"fastjack:onMusicPlayArtist" -> (ngram:builtinentityfeaturesnipsmusicartist, 4.61)
"fastjack:onTest" -> (ngram:builtinentityfeaturesnipsmusicartist, -4.50)
"fastjack:onMusicPlay" -> (ngram:builtinentityfeaturesnipsmusicartist, -4.28)
"None" -> (ngram:builtinentityfeaturesnipsmusicartist, -3.63)
"fastjack:onMusicPlay" -> (ngram:musiqu, 3.42)
"fastjack:onMusicPlay" -> (ngram:entityfeaturetestbad, 3.10)
which basically translates to:
- if I see a <snips/musicArtist> entity, I add 4.61 to the score of "fastjack:onMusicPlayArtist"
- if I see a <snips/musicArtist> entity, I subtract 4.50 from the score of "fastjack:onTest"
- if I see a <snips/musicArtist> entity, I subtract 4.28 from the score of "fastjack:onMusicPlay"
- if I see a <snips/musicArtist> entity, I subtract 3.63 from the score of the None intent
- if I see the word "musique", I add 3.42 to the score of "fastjack:onMusicPlay"
- if I see a <test/bad> entity, I add 3.10 to the score of "fastjack:onMusicPlay"
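To make the arithmetic concrete, here is a toy reconstruction of the linear scoring, using only the weights shown in the logs above (feature names are simplified, and other features not visible in the logs also contribute to the real decision, so the toy totals don't reproduce the final classification; they only show how the spurious <test/bad> feature pulls the score toward onMusicPlay):

```python
# Weights copied from the log excerpt above; feature names simplified.
WEIGHTS = {
    "fastjack:onMusicPlayArtist": {"builtin:snips/musicArtist": 4.61},
    "fastjack:onTest":            {"builtin:snips/musicArtist": -4.50},
    "fastjack:onMusicPlay":       {"builtin:snips/musicArtist": -4.28,
                                   "ngram:musiqu": 3.42,
                                   "entity:test/bad": 3.10},
    "None":                       {"builtin:snips/musicArtist": -3.63},
}

def score_intents(features):
    # Linear model: each intent's score is the sum of its feature weights
    return {intent: sum(w.get(f, 0.0) for f in features)
            for intent, w in WEIGHTS.items()}

# Features fired by "joue moi de la musique de david bowie": the artist
# entity, the "musiqu" ngram, and (wrongly) a test/bad entity for "musique".
scores = score_intents(["builtin:snips/musicArtist", "ngram:musiqu",
                        "entity:test/bad"])
```

Without the test/bad feature, onMusicPlay would sit at -0.86; the spurious +3.10 lifts it to 2.24, which (combined with the other features in play) is enough to tip the decision the wrong way.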
The first 5 lines seem very reasonable, but the last one is responsible for the classifier’s wrong decision. Why did the classifier learn such a rule? If it sees a <test/bad> entity, one would expect it to point to your test intent, not to onMusicPlay.
The problem is that the word “musique” is listed in the values of your test/bad entity, so the classifier learns to give a strong weight towards the onMusicPlay intent. That’s normal: the word “musique” appears a lot in the onMusicPlay intent utterances, and it’s tagged there as a <test/bad> entity value.
What are your options?
1. Wait for the next release
In the next release we’ll roll out a new implementation of the deterministic intent parser that won’t have the same problems as the current one. I’ve tested it on your data and "joue moi de la musique de david bowie" gets perfectly parsed (1.0 score for the intent).
"musique" from the
I’m not sure of the role of this entity and intent but tagging this word that is very very common intents of the same assistant can a LOT of negative side effects
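If it helps, here is a rough way to audit this kind of collision yourself (a sketch, not official Snips tooling; it assumes the console-exported dataset JSON layout with intents → utterances → data chunks and entities → data → value/synonyms):

```python
from collections import Counter

def overlapping_entity_values(dataset):
    """Flag custom entity values that also appear as plain (untagged)
    words in the intents' utterances — a likely source of bad weights."""
    # Count every word appearing as plain text in utterance chunks
    word_counts = Counter()
    for intent in dataset.get("intents", {}).values():
        for utterance in intent["utterances"]:
            for chunk in utterance["data"]:
                if "entity" not in chunk:  # plain text, not a tagged slot
                    word_counts.update(chunk["text"].lower().split())
    # Report entity values colliding with those plain words
    report = {}
    for name, entity in dataset.get("entities", {}).items():
        if name.startswith("snips/"):
            continue  # skip builtin entities
        for item in entity.get("data", []):
            for value in [item["value"]] + item.get("synonyms", []):
                count = word_counts.get(value.lower(), 0)
                if count:
                    report.setdefault(name, []).append((value, count))
    return report
```

You would load your exported dataset with `json.load` and run this on it; any entry reported for test/bad (like "musique") is a value worth removing or renaming.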
I hope this helps!