Incorrect intent recognition

Hi!

I have created 2 intents that overlap on some words:

onMusicPlay:
  - joue moi de la musique

onMusicPlayArtist:
  - joue moi de la musique de <snips/musicArtist>

When I request “joue moi de la musique de David Bowie”, I get the onMusicPlay intent, which is not correct.

It seems that the NLU engine ignores the “…de <snips/musicArtist>” at the end of the utterance.

Is it due to the fact that the words “de la musique” are more present in the onMusicPlay intent’s training phrases?

Is the word “de” a stopword that gets ignored?

Thanks for your help :slight_smile:

How big and varied is the training data for the two intents?
Do you have David Bowie in the slot data, or do you have slot data extensibility enabled?

Both intents have 98 varied training phrases.

I did not put David Bowie in the utterances.

The onMusicPlay intent has “de la musique” in almost all of them, whereas onMusicPlayArtist has more variations (de la musique de…, des chansons de…, des morceaux de…, etc.).

It seems that the term “de la musique” is more prominent in the onMusicPlay intent, which causes this intent to be detected even though it does not match the end of the utterance (de la musique DE <…>). Is the word “de” marked as a stop word somewhere?

Maybe I can test with more variations in the intent onMusicPlay (joue moi des chansons, joue moi des morceaux de musique, etc.) to decrease the “de la musique” weight in the overall intent.

I’ll test it and report back.

Thanks @ozie for your help :wink:

One simple solution… just use ONE intent…
Have loads of training samples… some with a slot and others without.

Then, in the code behind that does all the work, just check whether a slot is returned or not… if there is no slot, the request doesn’t include an artist name… if there is a slot, pull the artist name from it.
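
A minimal sketch of that check, assuming the handler receives the parse result as a Python dict shaped like the output of snips-nlu's `engine.parse` (a `slots` list whose items carry `slotName` and `rawValue`). The slot name `artist` and the function `handle_music_play` are hypothetical names used only for illustration.

```python
def handle_music_play(result, artist_slot="artist"):
    """Decide what to play depending on whether an artist slot was filled."""
    # `slots` may be missing or None when nothing was extracted.
    for slot in result.get("slots") or []:
        if slot.get("slotName") == artist_slot:
            return "play music by " + slot["rawValue"]
    # No artist slot: the user just asked for music.
    return "play any music"


with_artist = {
    "intent": {"intentName": "onMusicPlay"},
    "slots": [{"slotName": "artist", "rawValue": "david bowie"}],
}
without_artist = {"intent": {"intentName": "onMusicPlay"}, "slots": []}

print(handle_music_play(with_artist))
print(handle_music_play(without_artist))
```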

simple :slight_smile:

Thanks @ozie
It is indeed a possible solution. I will test it later today.
I would still like to know why the ASR is correct but the NLU does not match the correct intent though.
Word weight? Stopwords?
If anyone from Snips can shed some light on the matter, it would be brilliant :slight_smile:

Hi @fastjack,
@ozie’s solution seems reasonable; however, your use case should be handled correctly.
It’s a bit hard to understand what’s going on without more details/data. Would you allow me to access your assistant’s data so that I can debug?
Clément

Hey @ClemDoum !

Please take a look at my assistant.

My assistant is proj_XaK3Kv3E4vQ
The onMusicPlay/onMusicPlayArtist intents work flawlessly if I remove the Playback app from the assistant.

I created an alternative assistant with a minimum test case: proj_8A11KlKGdxw
If you remove the custom test/bad slot (from the first app), everything works (“joue moi de la musique de david bowie” -> onMusicPlayArtist).
With this slot I get onMusicPlay almost every time.

Looks like a strange slot / slot value / slot synonym / injected value / utterance weight issue.

Hope you can identify the cause of this strange behavior (can this be due to the latest release?).
Tell me if you need more info.

Thanks for your help.

Hi @fastjack,
Sorry for the late reply, I’ve finally had time to debug your case.

I don’t know if you’re familiar with the NLU parsing flow, but basically the parsing happens in 2 steps:

  1. we first run a DeterministicIntentParser on your query; this parser is regex-based and is supposed to parse all utterances that match a pattern seen in your input dataset (with the current implementation it rather parses “most utterances”)
  2. if this parser fails, we run a second parser that leverages machine learning to generalize to unseen queries

In your case we have a double failure:

  1. the query you typed matches a pattern known from the dataset: "joue moi de la musique de <snips/musicArtist>", but the deterministic intent parser fails
  2. the probabilistic intent parser also fails

Why does the deterministic parser fail?

With the current implementation, at training time we see the sentence pattern "joue moi de la musique de <snips/musicArtist>", which we extract from your labelling in the console, and we create a regex to match this pattern.
At inference time we get "joue moi de la musique de david bowie". The first thing the parser does is extract the entities in the utterance and replace them with placeholders in order to match the patterns.

However, in our case "moi" matches an artist called "Moi", "la musique" matches an artist named "La musique populaire" (we allow partial matches) and "david bowie" matches "David Bowie", so we end up with the following pattern: "joue <snips/musicArtist> de <snips/musicArtist> de <snips/musicArtist>", which was not created at training time (at training time we followed your labelling and created "joue moi de la musique de <snips/musicArtist>").
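
The mismatch can be reproduced with a toy sketch. This is not the actual DeterministicIntentParser: `TOY_ARTISTS` is a hypothetical three-entry stand-in for the built-in artist gazetteer (with "la musique" standing in for the partial match on "La musique populaire"), and the pattern lookup is a plain set membership test rather than a regex.

```python
import re

PLACEHOLDER = "<snips/musicArtist>"

# Pattern created at training time from the console labelling.
TRAINED_PATTERNS = {"joue moi de la musique de " + PLACEHOLDER}

# Hypothetical stand-in for the built-in music artist gazetteer.
TOY_ARTISTS = ["david bowie", "la musique", "moi"]


def replace_entities(utterance, artists):
    """Replace every whole-word artist match with the entity placeholder."""
    result = utterance.lower()
    # Longest names first so that multi-word artists are replaced as a unit.
    for name in sorted(artists, key=len, reverse=True):
        result = re.sub(r"\b" + re.escape(name) + r"\b", PLACEHOLDER, result)
    return result


normalized = replace_entities("joue moi de la musique de david bowie", TOY_ARTISTS)
print(normalized)
print(normalized in TRAINED_PATTERNS)
```

Because "moi" and "la musique" are also replaced, the normalized query no longer equals the trained pattern, so the lookup fails.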

So we fail here. That’s embarrassing, but it should not be a problem since we get a second chance with the ML-powered parser!

Now, why does the probabilistic parser fail?

Looking at the logs, I saw that the algorithm learnt the following weights for the classification of the sentence:

"fastjack:onMusicPlayArtist" -> (ngram:builtinentityfeaturesnipsmusicartist, 4.61)
"fastjack:onTest" -> (ngram:builtinentityfeaturesnipsmusicartist, -4.50)
"fastjack:onMusicPlay" -> (ngram:builtinentityfeaturesnipsmusicartist, -4.28)
"None" -> (ngram:builtinentityfeaturesnipsmusicartist, -3.63)
"fastjack:onMusicPlay" -> (ngram:musiqu, 3.42)
"fastjack:onMusicPlay" -> (ngram:entityfeaturetestbad, 3.10)
.
.
.

which basically translates to:

If I see a <snips/musicArtist> entity, I add 4.61 to the score of "fastjack:onMusicPlayArtist"
If I see a <snips/musicArtist> entity, I subtract 4.50 from the score of "fastjack:onTest"
If I see a <snips/musicArtist> entity, I subtract 4.28 from the score of "fastjack:onMusicPlay"
If I see a <snips/musicArtist> entity, I subtract 3.63 from the score of the None intent
If I see the word "musique", I add 3.42 to the score of "fastjack:onMusicPlay"
If I see a <test/bad> entity, I add 3.10 to the score of "fastjack:onMusicPlay"
.
.
.

The first 5 lines seem very reasonable; however, the last one is responsible for the classifier’s wrong decision. Why did the classifier learn such a rule? If it sees a <test/bad> entity, it should point to your onTest intent.
The problem is that the word “musique” is listed in the values of your test/bad entity, so the classifier learns to give a strong weight towards the onMusicPlay intent. That’s expected: the word “musique” appears a lot in the onMusicPlay intent, and it is tagged as a test/bad entity.
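
To make the effect of that last weight concrete, here is a small sketch that sums only the six weights quoted above. The remaining weights are elided in the log, so this cannot reproduce the classifier's final ranking; it only shows how much the test/bad feature shifts the onMusicPlay score. The function name `partial_scores` is hypothetical.

```python
from collections import defaultdict

# (intent, feature, weight) triples copied from the log excerpt above.
WEIGHTS = [
    ("fastjack:onMusicPlayArtist", "ngram:builtinentityfeaturesnipsmusicartist", 4.61),
    ("fastjack:onTest", "ngram:builtinentityfeaturesnipsmusicartist", -4.50),
    ("fastjack:onMusicPlay", "ngram:builtinentityfeaturesnipsmusicartist", -4.28),
    ("None", "ngram:builtinentityfeaturesnipsmusicartist", -3.63),
    ("fastjack:onMusicPlay", "ngram:musiqu", 3.42),
    ("fastjack:onMusicPlay", "ngram:entityfeaturetestbad", 3.10),
]


def partial_scores(active_features):
    """Sum the listed weights of the features present in the query."""
    scores = defaultdict(float)
    for intent, feature, weight in WEIGHTS:
        if feature in active_features:
            scores[intent] += weight
    return dict(scores)


base = {"ngram:builtinentityfeaturesnipsmusicartist", "ngram:musiqu"}
without_bad = partial_scores(base)
with_bad = partial_scores(base | {"ngram:entityfeaturetestbad"})

print(round(without_bad["fastjack:onMusicPlay"], 2))
print(round(with_bad["fastjack:onMusicPlay"], 2))
```

With these partial numbers, the test/bad feature alone moves onMusicPlay from a negative contribution to a clearly positive one.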

What are your options ?

1. Wait for the next release

In the next release we’ll roll out a new implementation of the deterministic intent parser that won’t have the same problems as the current one. I’ve tested it on your data and "joue moi de la musique de david bowie" gets perfectly parsed (1.0 score for onMusicPlayArtist).

2. Remove "musique" from the test/bad entity

I’m not sure of the role of this entity and intent, but tagging a word that is very common in other intents of the same assistant can have a LOT of negative side effects.
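
A rough way to spot this kind of overlap, sketched under the assumption that the dataset is available in the snips-nlu JSON format (an `entities` map with `data` entries and an `intents` map with chunked `utterances`). The helper `entity_values_in_plain_text` and the toy dataset are hypothetical.

```python
import re


def entity_values_in_plain_text(dataset):
    """List (entity, value, intent) where an entity value also occurs as untagged text."""
    hits = set()
    for entity_name, entity in dataset["entities"].items():
        values = []
        # Built-in entities have no "data" key, so they are skipped.
        for entry in entity.get("data", []):
            values.append(entry["value"])
            values.extend(entry.get("synonyms", []))
        for intent_name, intent in dataset["intents"].items():
            for utterance in intent["utterances"]:
                # Keep only the chunks that are not tagged as a slot.
                plain = " ".join(
                    chunk["text"] for chunk in utterance["data"] if "entity" not in chunk
                )
                for value in values:
                    if re.search(r"\b" + re.escape(value) + r"\b", plain, re.IGNORECASE):
                        hits.add((entity_name, value, intent_name))
    return sorted(hits)


toy_dataset = {
    "entities": {"test/bad": {"data": [{"value": "musique", "synonyms": []}]}},
    "intents": {
        "onMusicPlay": {"utterances": [{"data": [{"text": "joue moi de la musique"}]}]},
        "onTest": {"utterances": [{"data": [
            {"text": "test "},
            {"text": "musique", "entity": "test/bad", "slot_name": "bad"},
        ]}]},
    },
    "language": "fr",
}

print(entity_values_in_plain_text(toy_dataset))
```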

I hope this helps!
Clément

Alright. I understand what is happening now. Thanks!

I will be waiting for the next release then.

Cheers.

I have a similar issue. Here is my dataset:

{
  "entities": {
    "category": {
      "automatically_extensible": true,
      "data": [
        {
          "synonyms": [],
          "value": "sales manager"
        },
        {
          "synonyms": [],
          "value": "driver"
        },
        {
          "synonyms": [
            "seniour accountant",
            "junior accountant",
            "accounts"
          ],
          "value": "accountant"
        },
        {
          "synonyms": [
            "marketing executive"
          ],
          "value": "sales executive"
        },
        {
          "synonyms": [],
          "value": "sales officer"
        },
        {
          "synonyms": [
            "sales man"
          ],
          "value": "salesman"
        },
        {
          "synonyms": [
            "cook"
          ],
          "value": "chef"
        },
        {
          "synonyms": [],
          "value": "coffee operator"
        },
        {
          "synonyms": [
            "nursing"
          ],
          "value": "nurse"
        },
        {
          "synonyms": [],
          "value": "beautician"
        }
      ],
      "matching_strictness": 1.0,
      "use_synonyms": true
    },
    "snips/ordinal": {},
    "type": {
      "automatically_extensible": false,
      "data": [
        {
          "synonyms": [
            "partime",
            "parttime"
          ],
          "value": "part time"
        },
        {
          "synonyms": [
            "full time"
          ],
          "value": "fulltime"
        }
      ],
      "matching_strictness": 1.0,
      "use_synonyms": true
    }
  },
  "intents": {
    "searchJob": {
      "utterances": [
        {
          "data": [
            {
              "text": "i need a job"
            }
          ]
        },
        {
          "data": [
            {
              "text": "search job"
            }
          ]
        },
        {
          "data": [
            {
              "text": "i need a "
            },
            {
              "entity": "category",
              "slot_name": "category",
              "text": "sales manager"
            },
            {
              "text": " job"
            }
          ]
        },
        {
          "data": [
            {
              "text": "show me "
            },
            {
              "entity": "category",
              "slot_name": "category",
              "text": "driver"
            },
            {
              "text": " jobs"
            }
          ]
        },
        {
          "data": [
            {
              "text": "show me some jobs"
            }
          ]
        },
        {
          "data": [
            {
              "text": "i need a "
            },
            {
              "entity": "type",
              "slot_name": "type",
              "text": "part time"
            },
            {
              "text": " job"
            }
          ]
        },
        {
          "data": [
            {
              "text": "any "
            },
            {
              "entity": "type",
              "slot_name": "type",
              "text": "partime"
            },
            {
              "text": " jobs avalable"
            }
          ]
        },
        {
          "data": [
            {
              "text": "any "
            },
            {
              "entity": "category",
              "slot_name": "category",
              "text": "accountant"
            },
            {
              "text": " jobs"
            }
          ]
        },
        {
          "data": [
            {
              "text": "search "
            },
            {
              "entity": "category",
              "slot_name": "category",
              "text": "seniour accountant"
            },
            {
              "text": " jobs"
            }
          ]
        },
        {
          "data": [
            {
              "text": "search "
            },
            {
              "entity": "type",
              "slot_name": "type",
              "text": "parttime"
            },
            {
              "text": " jobs"
            }
          ]
        }
      ]
    },
    "postJob": {
      "utterances": [
        {
          "data": [
            {
              "text": "i need a "
            },
            {
              "entity": "category",
              "slot_name": "category",
              "text": "junior accountant"
            }
          ]
        },
        {
          "data": [
            {
              "text": "i have a vacancy"
            }
          ]
        },
        {
          "data": [
            {
              "text": "i need a "
            },
            {
              "entity": "type",
              "slot_name": "type",
              "text": "fulltime"
            },
            {
              "text": " "
            },
            {
              "entity": "category",
              "slot_name": "category",
              "text": "accounts"
            }
          ]
        },
        {
          "data": [
            {
              "text": "post a job"
            }
          ]
        }
      ]
    },
    "showPostedList": {
      "utterances": [
        {
          "data": [
            {
              "text": "show me "
            },
            {
              "entity": "snips/ordinal",
              "slot_name": "ordinal",
              "text": "1st"
            },
            {
              "text": " posted job"
            }
          ]
        },
        {
          "data": [
            {
              "text": "show my posts"
            }
          ]
        },
        {
          "data": [
            {
              "text": "show jobs posted by me"
            }
          ]
        }
      ]
    }
  },
  "language": "en"
}

When I query “show me developer jobs”, I get the showPostedList intent instead of searchJob.
(Note: “developer” is not in the list of category values, but automatically_extensible is true for the category entity, so by matching “show me [category] jobs” I expect searchJob to be returned with “developer” as the category.)

Also, when I query “i need a developer job”, I get postJob with “developer job” as the category instead of searchJob. It seems to be parsed as “i need a [category]”. Why isn’t it parsed as “i need a [category] job”?
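
The overlap between the two intents is easier to see when the utterances are rendered as patterns. The sketch below does that for a two-utterance excerpt of the dataset above; `utterance_patterns` is a hypothetical helper, not part of snips-nlu.

```python
def utterance_patterns(dataset):
    """Map each intent to its utterances with slot chunks shown as [slot_name]."""
    patterns = {}
    for intent_name, intent in dataset["intents"].items():
        patterns[intent_name] = [
            "".join(
                "[" + chunk["slot_name"] + "]" if "slot_name" in chunk else chunk["text"]
                for chunk in utterance["data"]
            )
            for utterance in intent["utterances"]
        ]
    return patterns


# Two utterances taken from the dataset posted above.
excerpt = {
    "intents": {
        "searchJob": {"utterances": [{"data": [
            {"text": "i need a "},
            {"entity": "category", "slot_name": "category", "text": "sales manager"},
            {"text": " job"},
        ]}]},
        "postJob": {"utterances": [{"data": [
            {"text": "i need a "},
            {"entity": "category", "slot_name": "category", "text": "junior accountant"},
        ]}]},
    }
}

for intent_name, intent_patterns in utterance_patterns(excerpt).items():
    print(intent_name, intent_patterns)
```

Since category is automatically extensible, “developer job” is a plausible category value for the postJob pattern, which is consistent with the result described above.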

Please help me fix this… Thank you in advance :blush:

Hey @sabieworld,
It seems that your intents are quite close, and the NLU struggles a bit to classify them when using the default configuration.
The default configuration uses a logistic regression to perform the intent classification, and it does so by looking at unigrams only.

The idea is to update this configuration to also take bigrams into account.
You can do so by running the following:

import io
import json
from copy import deepcopy

from snips_nlu import SnipsNLUEngine
from snips_nlu.default_configs.config_en import CONFIG as CONFIG_EN

custom_config = deepcopy(CONFIG_EN)
custom_config["intent_parsers_configs"][1]["intent_classifier_config"][
    "featurizer_config"]["added_cooccurrence_feature_ratio"] = 0.25

engine = SnipsNLUEngine(config=custom_config)
with io.open("path/to/dataset.json", encoding="utf8") as f:
    dataset = json.load(f)

engine.fit(dataset)

print(engine.parse("show me developer jobs"))
print(engine.parse("I need a developer job"))

This configuration will probably work a bit better in your use case.
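
As a toy illustration of why pair features help (this is not the Snips featurizer, only a sketch with hypothetical helper names), compare what distinguishes two of the queries when looking at single words versus adjacent word pairs:

```python
def unigrams(text):
    """Set of single lowercase tokens."""
    return set(text.lower().split())


def bigrams(text):
    """Set of adjacent token pairs."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


hire = "i need a developer"
seek = "i need a developer job"

# Features present in the job-seeking query but not in the hiring one.
print(unigrams(seek) - unigrams(hire))
print(bigrams(seek) - bigrams(hire))
```

With unigrams, the only distinguishing feature is "job", a word that occurs in all three intents of the dataset. The pair ("developer", "job") keeps the context in which that word appears.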
I hope this helps.
Cheers

@adri Thank you for your fast response :heart: :slight_smile:

I applied that custom configuration. Unfortunately, the problem is reversed now :frowning:

before:
I need a developer job => postJob (should be searchJob)
I need a developer => postJob

now:
I need a developer job => searchJob
I need a developer => searchJob (should be postJob)

I tried setting lower values for added_cooccurrence_feature_ratio, but no luck. Is there any other configuration I can set to achieve this?

People are more likely to say “i need a developer” when they want to hire and “i need a developer job” when they want to be hired, so I can’t ignore those utterances. Please let me know if there is any further workaround available…

Thank you so much…:rose:

@adri The issue is fixed by setting a higher value for added_cooccurrence_feature_ratio.

It was a big mistake on my side. When the problem got reversed, I thought I should decrease the value to get it to work :joy: That is why I set a lower value.

Now it works like a charm. Thank you so much for your great support :trophy:

@adri For the same dataset I posted above, any text that contains the word “job” returns the searchJob intent. For example, when I give “edit job title” I expect a null intent, as I don’t have an utterance like <some text> job <some other text>. Instead I get the searchJob intent.

Is there any configuration change I can apply so that an intent is returned only if the input matches the given utterances, and a null intent otherwise?

That’s not how it works… it’s not as simple as only matching the examples you have entered.

You can force the NLU engine to only parse sentences that are part of the training dataset (or differ very slightly) by removing the probabilistic intent parser in the configuration, in order to keep only the lookup intent parser:

import io
import json

from snips_nlu import SnipsNLUEngine

custom_config = {
    "unit_name": "nlu_engine",
    "intent_parsers_configs": [
        {
            "unit_name": "lookup_intent_parser",
            "ignore_stop_words": True
        }
    ]
}

engine = SnipsNLUEngine(config=custom_config)
with io.open("path/to/dataset.json", encoding="utf8") as f:
    dataset = json.load(f)

engine.fit(dataset)

print(engine.parse("show me developer jobs"))
print(engine.parse("edit job title"))

Keep in mind that the resulting engine will no longer be able to understand variations around the utterances provided in the training dataset.

I will probably add a section about this in the documentation, as it may be valuable to other users.

Cheers

@adri Thank you for your continued support. When I keep only the lookup intent parser, I won’t get the automatically extensible feature, right? When I tried to parse some sentences (which follow exactly the patterns given in the dataset, but with entity values from outside the dataset), it returned a null intent.

Yes, indeed, the automatically extensible feature only works when the probabilistic intent parser is activated.
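
One possible workaround that keeps the probabilistic parser, sketched under assumptions: reject low-confidence results in application code. It assumes the parse result is a dict with `intent.intentName` and `intent.probability`, as returned by snips-nlu's `engine.parse`. The function name `accepted_intent` and the 0.5 threshold are hypothetical; the threshold would need tuning on real queries, and nothing guarantees that a query like "edit job title" falls below it.

```python
def accepted_intent(result, min_probability=0.5):
    """Return the intent name, or None when it is missing or not confident enough."""
    intent = result.get("intent") or {}
    name = intent.get("intentName")
    if name is None or intent.get("probability", 0.0) < min_probability:
        return None
    return name


# Hand-written example results, not real engine output.
confident = {"intent": {"intentName": "searchJob", "probability": 0.9}, "slots": []}
uncertain = {"intent": {"intentName": "searchJob", "probability": 0.3}, "slots": []}

print(accepted_intent(confident))
print(accepted_intent(uncertain))
```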

Hi Snipsters! Can you give us an ETA on this new intent parser implementation? Thanks :slight_smile:

Hi @fastjack,
This new parser should be deployed this week as part of a bigger update.