Commit 786af27f authored by pracht

Formulated some more problems for the meeting

parent bef3df92
+107 −3
## The problem with the MultiWOZ data:

This section lists some problems we noticed while creating the data model for the knowledge base.

### How do we model changing intentions?

Examples:

```
"You know, I've changed my mind. I don't need anything else today. Thanks very much for your help."

"I've changed my mind and would prefer to stay in the center of town. I need parking, wifi, and it should have a 0 star rating."

"I'm sorry I changed my mind. I just need the price per ticket please."

"I changed my mind about the internet. Does Leverton House have free wifi?"

"Yes actually, could you book that for me? I've changed my mind."

"I've changed my mind and would like to go ahead and pick the center of town. Wifi would be great."

"Actually I changed my mind. I do want to book the hotel."

"I'm sorry, I changed my mind. I don't need a room booked. I just need their address."

"I'm sorry. I changed my mind. I don't want a guesthouse. Are there any hotels with 2 stars?"

"I'll be leaving on Tuesday and I changed my mind, I prefer a train that leaves after 14:00"

"I changed my mind can you book that for me please?"

"You know what, I've changed my mind. I'd prefer international to Italian. Are there any like that in the east part of town?"

"I changed my mind. Could you find a restaurant that serves asian oriental food?"

"I'm sorry, I've changed my mind. Can you please make the reservation at the guesthouse for 6 people and 3 nights starting on Monday."

"Actually, I've changed my mind. I'm going to wait on that reservation, thanks."

"I changed my mind. I will go ahead and book it myself. But, can you please provide me the address and postcode. Thanks."

"I've changed my mind, I don't want to book it yet. Can you give me the area, address, and phone number, please?"

"I changed my mind, don't worry about getting me a train reservation. Just please recommend an attraction to visit on the West side of town"

"Actually I changed my mind just 2 nights for 5 people"
```
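One possible way to approach this, sketched under assumptions: flag a user turn as a revision when it contains the cue "changed my mind" (which all examples above share), and record which previously stated slot values the turn overrides. The slot names (`hotel-type`, `hotel-stars`) and the functions `is_revision` and `update_state` are hypothetical and only serve as illustration.

```python
import re
from typing import Dict, Tuple

# Assumption: the literal cue "changed my mind" marks a revision turn.
REVISION_CUE = re.compile(r"\bchanged my mind\b", re.IGNORECASE)


def is_revision(utterance: str) -> bool:
    """Return True if the utterance contains the revision cue."""
    return REVISION_CUE.search(utterance) is not None


def update_state(
    state: Dict[str, str], new_slots: Dict[str, str], utterance: str
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Merge new slot values into the state.

    Returns the updated state and, for revision turns only, the old
    values that were overridden.
    """
    revised: Dict[str, str] = {}
    if is_revision(utterance):
        revised = {
            slot: state[slot]
            for slot, value in new_slots.items()
            if slot in state and state[slot] != value
        }
    updated = {**state, **new_slots}
    return updated, revised


# Hypothetical slot names, applied to one of the examples above.
state = {"hotel-type": "guesthouse"}
utterance = (
    "I'm sorry. I changed my mind. I don't want a guesthouse. "
    "Are there any hotels with 2 stars?"
)
new_slots = {"hotel-type": "hotel", "hotel-stars": "2"}
updated, revised = update_state(state, new_slots, utterance)
print(updated)
print(revised)
```

This only covers revisions that replace a slot value. Turns such as "I don't need a room booked. I just need their address." change the intent itself rather than a slot value and would need a separate representation.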

### How do we handle incomplete data?

When the dialogues were created, the person writing the system turns had access to a knowledge base with information about possible restaurants, hotels, etc.

We are trying to reconstruct this knowledge base for our use cases from the utterances in the dialogues.

The problem is that not all entities are fully specified.

Example from "PMUL4398.json":

- USER: "I need a place to dine **in the center** thats **expensive**"

- SYSTEM: "I have several options for you; do you prefer **African**, **Asian**, or **British** food?"

- USER: "Any sort of food would be fine, as long as it is a bit expensive. Could I get the phone number for your recommendation?"

- SYSTEM: "There is an Afrian place named **Bedouin** in the centre. How does that sound?"

- USER: "Sounds good, could I get that phone number?"

- SYSTEM: "Bedouin's phone is **01223367660**."

From this we can extract the following data points:

```json
{
    "restaurants": [
        {
            "name": "bedouin",
            "food": "african",
            "price": "expensive",
            "place": "center",
            "phone": "01223367660"
        },
        {
            "food": "asian",
            "price": "expensive",
            "place": "center"
        },
        {
            "food": "british",
            "price": "expensive",
            "place": "center"
        }
    ]
}
```

While we get an almost complete record for "Bedouin" (only the address is missing), we have too little information about the Asian and British restaurants to create a database entry for them.

Collecting these triplets ("restaurant-food-asian", "restaurant-place-center", ...) and merging them at the end is also difficult, because there is no data that would let us join the triplets logically without parsing the dialogues. (See the next point.)
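The difficulty can be illustrated with a minimal sketch. It flattens the three records from the example into triplets, pools them, and groups them by slot. The helper names `to_triplets` and `pool`, and the rule that a record needs a `name` to be insertable, are assumptions made for illustration only.

```python
from collections import defaultdict

# The three data points extracted from the PMUL4398.json example.
records = [
    {"name": "bedouin", "food": "african", "price": "expensive",
     "place": "center", "phone": "01223367660"},
    {"food": "asian", "price": "expensive", "place": "center"},
    {"food": "british", "price": "expensive", "place": "center"},
]


def to_triplets(entity_type, record):
    """Flatten one record into (entity type, slot, value) triplets."""
    return {(entity_type, slot, value) for slot, value in record.items()}


def pool(triplets):
    """Group pooled triplets by slot; which values belonged together is lost."""
    by_slot = defaultdict(set)
    for _entity_type, slot, value in triplets:
        by_slot[slot].add(value)
    return dict(by_slot)


pooled = set()
for record in records:
    pooled |= to_triplets("restaurant", record)

by_slot = pool(pooled)
print(sorted(by_slot["food"]))  # three cuisines, but only one name to attach them to
print(sorted(by_slot["name"]))

# Assumption: a record needs at least a name to become a database entry.
insertable = [r for r in records if "name" in r]
print(len(insertable))
```

After pooling, nothing in the triplets says that "bedouin" goes with "african" rather than "asian" or "british". That link exists only in the dialogue itself.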


### How do we handle ambiguous annotations?

The annotations do not contain enough semantic information to represent some facts.

Example sentence from "SNG1013.json":
```
> "I have 4 different options for you. I have two cheaper guesthouses and two expensive hotels. Do you have a preference?"
```
@@ -62,8 +162,12 @@ In addition, the possible values of `choice` are very heterogeneous

Possible `choice` values in the test set:
```
'0', '000', '029', '05JKM1ZF', '1', '1 each', '1,414', '10', '10 options available', '100', '1029', '104', '105', '1064', '107', '11', '110', '117', '11', '12', '12 available', '122', '13', '133', '134', '139', '14', '1414', '146', '149', '15', '15 possible ones', '150', '152', '16', '168', '17', '172', '18', '19', '195', '198', '2', '2 hotels and 2 guesthouses', '2 other', '2 others', '2,828', '2,Choice-one', '20', '200', '202', '204', '2058', '206', '21', '210', '22', '229', '23', '231', '24', '245', '25', '252', '256', '259', '266', '278', '28', '2800', '2828', '29', '3', '3 more', '3 others', '30', '30+', '300', '31', '318', '32', '325', '33', '33 fine', '35', '356', '38', '4', '4 different venues available', '4 other', '4 others', '40', '404', '42', '44', '44 great thoughts at hand', '46', '49', '49 train options', '5', '52', '546', '56', '566', '57', '59', '6', '6 of those 7', '60', '623', '63', '66', '69', '69 in total', '7', '70', '714', '735', '77', '79', '791', '8', '819', '84', '9', '9 available', '9 of the 10 results', '91', '924', '973', '98', '98 king street', '99', 'A FEW OPTIONS', 'A majority', 'All', 'All Friday', 'All but one', 'All of', 'All of them', 'All other', 'All the other', 'Almost 80', 'Almost every', 'Almost every place', 'Boat Attractions', 'Bot', 'Both', 'Cambridge', 'Centre', 'Each', 'Every place but one', 'First', 'First train out after 13:00', 'Five', 'Four', 'Fourteen', 'Just one', 'La Mimosa', 'Lots', 'MULTIPLE', 'Many', 'Most', 'Nandos', 'One', 'Seventeen', 'Several', 'Six', 'Some', 'The best', 'The closest one', 'The first', 'The only', 'Three', 'Twenty one', 'Two', 'a', 'a LOT', 'a bunch', 'a couple', 'a couple of', 'a dozen', 'a few', 'a few different options', 'a few earlier', 'a few options', 'a few others', 'a good number', 'a great deal', 'a great deal of', 'a great number', 'a handful', 'a large amount', 'a large number', 'a list of', 'a lot', 'a lot of', 'a lot of tasty choices', 'a lot to do', 'a multitude', 'a number', 
'a number of', 'a number of trains', 'a range', 'a single', 'a ton', 'a ton of options', 'a train', 'a variety', 'a variety of', 'a wide range', 'a wide range of places', 'a wide variety', 'about 10', 'about 11', 'about 13', 'about 14', 'about 15', 'about 17', 'about 18', 'about 19', 'about 23', 'about 30 or so', 'about 33', 'about 44', 'about 5', 'about 6', 'about 7', 'about 70', 'about 79', 'about 8', 'about 9', 'abundant', 'all', 'all 3', 'all five', 'all sorts of', 'all three', 'all we have', 'almost 3,000', 'almost 30', 'almost 80', 'almost all', 'along list', 'alot', 'also an', 'also eight earlier routes if you prefer', 'any', "aren't many", 'at least 33', 'at least 8', 'at least 9', 'at least three', 'at least two', 'before', 'boat', 'both', 'both Pizza Express', 'both of', 'both of those', 'bunch', 'choices', 'close to 80', 'closest I have to 12:00', 'closest arrival time to 11:00', 'closest one', 'closest you could arrive', 'couple', 'daily departures', "don't have any", 'dontcare', 'dozens', 'each', 'earlier', 'earliest', 'earliest train', 'earliest train after 10:15', 'early', 'eight', 'eighteen', 'either', 'eleven', 'every 2 hours beginning at 5:39', 'every couple of hours', 'every hour', 'fair number', 'few', 'fifteen', 'first', 'first available', 'first train after 10:30', 'first train available', 'first train on that route after 13:45', 'five', 'fives', 'forty four', 'forty-four', 'four', 'four more', 'fourteen', 'full of', 'give', 'hundreds', 'just a few', 'just about every', 'just one', 'large number of', 'last', 'last train', 'last train of the day', 'later one', 'latest', 'latest train', 'loads', 'long list', 'long list of matches', 'lots', 'lots of', 'lots of places', 'majority of', 'many', 'many different', 'many different options', 'many diverse', 'many fine', 'many options', 'many other', 'many other choices', 'many other times with an earlier departure time', 'many things to do', 'many types', 'moderately priced', 'more', 'more than 1,000', 
'more than 20', 'more than 30', 'more than 40', 'more than one hundred', 'more than two dozen', 'most', 'most of', 'most of them', 'much', 'much to see', 'multiple', 'multiple options', 'nearly 80', 'neither', 'next listing available', 'next train', 'next train after 12:00', 'nine', 'nineteen', 'no', 'none', 'none of', 'number', 'numerous', 'on', 'one', 'one before that', 'one each', 'one of them', 'one of these', 'one of your options', 'one other option', 'one,general-reqmore:', 'only', 'only 1', 'only 2', 'only 4', 'only on', 'only one', 'only option', 'only option available', 'only options', 'only other', 'only other option', 'only result', 'only results', 'only two', 'oodles of', 'options', 'other', 'other 3', 'other choice', 'other options', 'others', 'over 1,400', 'over 10', 'over 100', 'over 1000', 'over 11', 'over 110', 'over 15', 'over 2,000', 'over 2,800', 'over 20', 'over 200', 'over 200 possibilities', 'over 2800', 'over 30', 'over 31', 'over 33', 'over 40', 'over 400', 'over 500', 'over 600', 'over 623', 'over 69', 'over 79', 'over 800', 'over a dozen', 'over a hundred', 'over a thousand', 'over one hundred', 'over twenty', 'over two thousand', 'plenty', 'plenty more', 'plenty of', 'plenty of earlier', 'plenty of options', 'plenty of places', 'plenty of those', 'plenty of trains', 'quite a few', 'quite a few options', 'quite a lot', 'quite a number', 'quite a number of them', 'quite a selection', 'results', 'serveal', 'seven', 'several', 'several available option', 'several different', 'several options', 'several others', 'six', 'so many', 'so much', 'some', 'some options', 'sveral', 'ten', 'that', 'the first', 'the latest', 'the only', 'the only one', 'the only option', 'the only places', 'the only two', 'there are several', 'they', 'thirteen', 'thirty thee', 'thirty-three', 'three', 'three other options', 'to', 'tons', 'tons of', 'tons of them', 'trains', 'trains every hour', 'trains leaving hourly', 'twelve', 'twenty-one', 'two',
'two in that price range', 'two more', 'two others', 'two,general-reqmore:', 'variety', 'various', 'wide range of', 'wide selection', 'wide variety', 'wide-range'
```
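To get a feeling for how much of this list could be normalized automatically, the following sketch parses only the easy cases: plain integers (with or without thousands separators) and single number words up to twelve. Everything else is returned as unparsed. The function names `parse_choice` and `split_choices` and the word table are assumptions for illustration, not part of the dataset tooling.

```python
import re

# Assumption: only single number words up to twelve are handled.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}
# Matches "1,414" or "10"; anything with extra text does not match.
PLAIN_INTEGER = re.compile(r"\d{1,3}(,\d{3})+|\d+")


def parse_choice(value):
    """Return an integer count for easy cases, otherwise None."""
    text = value.strip().lower()
    if PLAIN_INTEGER.fullmatch(text):
        return int(text.replace(",", ""))
    return NUMBER_WORDS.get(text)


def split_choices(values):
    """Split raw `choice` values into parsed counts and unparsed leftovers."""
    parsed, unparsed = {}, []
    for value in values:
        count = parse_choice(value)
        if count is None:
            unparsed.append(value)
        else:
            parsed[value] = count
    return parsed, unparsed


sample = ["1,414", "10", "Five", "a few", "about 10", "98 king street", "30+"]
parsed, unparsed = split_choices(sample)
print(parsed)
print(unparsed)
```

Even this simple rule has pitfalls: values such as '000' or '029' parse as integers although they may not be counts at all, and quantifiers like 'a few' or 'over 200' carry information that an integer field cannot hold.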


#### Possible solutions

1. Ignore the problem
    We try to make the best of it