NLU engine not matching the whole slot value

Hi there!

For my master’s thesis, I’m building a dialogue system for a calendar. The user should be able to find the appointment that matches their utterance. A simple sample utterance could be something like “When is my next appointment with person”. Since there could be multiple people called Maria, I want the dialogue system to ask which one is meant. I know all the people in the appointment database, so for every person I added all valid permutations of the full name to the synonym list for the slot person.
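
To make "all valid permutations" concrete, here is a minimal sketch of how such a synonym list could be generated. It assumes that "valid" means: any ordering of any non-empty subset of the given names, optionally followed by the surname, plus the surname on its own. The function name `name_variants` is hypothetical and only serves this illustration.

```python
from itertools import permutations


def name_variants(given_names, surname):
    """Build lowercase name variants for one person.

    Assumption: given names may appear in any order and any subset,
    and the surname, if present, always comes last.
    """
    given_only = []
    for size in range(1, len(given_names) + 1):
        # permutations() yields every ordering of every subset of this size
        for combo in permutations(given_names, size):
            given_only.append(" ".join(combo))

    with_surname = [f"{variant} {surname}" for variant in given_only]
    return [v.lower() for v in given_only + [surname] + with_surname]


variants = name_variants(["Eva", "Maria"], "Miller")
print(variants)
```

For "Eva Maria Miller" this gives nine variants, including "eva maria" and "maria eva miller", but not surname-first forms such as "miller eva".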

Let’s say the full name of the Maria we are talking about is “Eva Maria Miller”. If the user then says “I meant Eva Maria”, I want the NLU engine to extract “Eva Maria” as the person, but instead it just detects “Maria” again, even though “Eva Maria” is a synonym in the training data.

At first I thought this was caused by the dropout probability. But then I checked the trained model files and found that “Eva Maria” was in a parser file containing a lot of different synonyms. I guess those are features used by the CRF. Am I right here? The file is not in a fully human-readable encoding.

So does anyone have an idea why only “Maria” is matched instead of the whole slot value? And what could I do about it? I would also like to know whether synonyms can be dropped out as features during training. I don’t think so, because the hash sum of the parser file is always the same, but I’m still unsure.

I hope someone can help me. :)

Hi,

I just found my problem! The actual name contained an umlaut (ä, ü, ö). When I generated the YAML file for the entity person, synonyms containing an umlaut were always surrounded with quotation marks ("), probably because of escaping. I removed the quotation marks afterwards and now it seems to work.

Before the line looked like this:

  • "[eva, maria, eva maria, maria eva, m\xFCller, eva m\xFCller, maria
    \ m\xFCller, eva maria m\xFCller, maria eva m\xFCller, evas, marias, eva
    \ marias, maria evas, m\xFCllers, eva m\xFCllers, maria m\xFCllers, eva
    \ maria m\xFCllers, maria eva m\xFCllers]"

Quotation marks in the training data seem to make the behaviour of the NLU engine really unpredictable. It still predicted the same entity but used a different synonym, which in my case was crucial because I thought it was working.
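
A small check like the following could catch this before training. It is only a sketch: it assumes each synonym list sits on its own YAML list item of the form `- [value, synonym, ...]`, and it scans the file as plain text rather than parsing it. The helper name `find_quoted_synonym_lists` is made up for this example.

```python
import re

# A list item whose value opens with a quote directly followed by "[",
# i.e. a bracketed list that was written out as one quoted string.
QUOTED_LIST = re.compile(r'^\s*-\s*["\']\[')


def find_quoted_synonym_lists(yaml_text):
    """Return the 1-based line numbers of quoted bracketed list items."""
    return [
        number
        for number, line in enumerate(yaml_text.splitlines(), start=1)
        if QUOTED_LIST.match(line)
    ]


# Made-up sample: the second list item shows the quoting problem.
sample = '''values:
  - [eva, maria, eva maria]
  - "[eva, maria, m\\xFCller, eva m\\xFCller]"
'''

suspicious = find_quoted_synonym_lists(sample)
print(suspicious)
```

Any reported line is worth a look by hand, because a quoted list is read as one long string instead of a value with its synonyms.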

I hope that I can help people with a similar problem.

Have a nice day everybody!