For my master’s thesis, I’m building a dialogue system for a calendar. The user should be able to find the right appointment with a spoken or typed utterance. A simple sample utterance could be something like “When is my next appointment with [person]?”, where [person] is a slot. Since there could be several people called Maria, I want the dialogue system to ask which one is meant. I know all the persons in the appointment database, so for every person I added all valid permutations of the full name to the synonym list of the person slot.
Let’s say the full name of the Maria in question is “Eva Maria Miller”. If the user then says “I meant Eva Maria”, I want the NLU engine to extract “Eva Maria” as the person, but instead it just detects “Maria” again, even though “Eva Maria” is a synonym in the training data.
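To make the setup concrete, here is a minimal sketch of how such a synonym list could be generated. It assumes that a “valid permutation” means any non-empty selection of the name parts in their original order; that definition and the helper name `name_variants` are illustrative assumptions, not necessarily what the real training data uses.

```python
from itertools import combinations


def name_variants(full_name: str) -> list[str]:
    """Return every non-empty, order-preserving selection of the name parts.

    Assumption: this is what "valid permutations" means; the actual
    rule used for the training data may differ.
    """
    parts = full_name.split()
    variants = []
    for size in range(1, len(parts) + 1):
        # combinations() keeps the parts in their original order
        for combo in combinations(parts, size):
            variants.append(" ".join(combo))
    return variants


print(name_variants("Eva Maria Miller"))
# ['Eva', 'Maria', 'Miller', 'Eva Maria', 'Eva Miller', 'Maria Miller', 'Eva Maria Miller']
```

With a list like this, both “Maria” and “Eva Maria” are synonyms of the same person, which is exactly the situation where the shorter match wins in my tests.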
At first I thought this was caused by the dropout probability. But then I checked the trained model files and found “Eva Maria” in a parser file that contains a lot of different synonyms. I guess those are features used by the CRF. Am I right here? The file is not in a fully human-readable encoding.
So does anyone have an idea why only “Maria” is matched instead of the whole slot value? And what could I do about it? I would also like to know whether synonyms can be dropped out as features during training. I don’t think so, because the hash sum of the parser file is always the same, but I’m still unsure.
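The hash comparison mentioned above can be sketched roughly as follows. SHA-256 is an assumption here (any stable hash would do), and `file_sha256` and `same_content` are hypothetical helper names; the paths passed in would be the parser files from two separate training runs.

```python
import hashlib


def file_sha256(path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large model files need not fit in memory
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b) -> bool:
    """True if both files are byte-for-byte identical according to SHA-256."""
    return file_sha256(path_a) == file_sha256(path_b)
```

An identical hash only shows that the parser file itself did not change between runs. It does not say whether dropout is applied later, when the features are actually used in training.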
I hope someone can help me.