Fuzzy stuff

Sunday, August 10. 2008
I have just pushed a small backend bugfix to the MobMap server that fixes a problem which has been there for many months. You might have noticed that the links in the quest objective texts that MobMap presents sometimes lead to the wrong NPCs - usually to ones whose names are very similar to the ones you're looking for, but nevertheless the link was crap. I know about some quests in the German database which had this problem, but there are probably similarly affected quests in the English database.

The reason for this was a design flaw in the component which actually creates those links for you to click - the so-called Quest Text Enhancer (QTE). It is essentially a fuzzy string matcher which is given a list of all NPCs and a text. The QTE then searches the quest text for occurrences of the given NPC names, and because those occurrences quite often don't match the NPC name exactly (because they are written in plural form, for example, while the NPC name is singular, or because they are inflected, which is often the case in the German language), this search is done "fuzzy" - that means "Tamed Hyena", for example, does not only match "Tamed Hyena", but also similar names like "Tamed Hyenas" (notice the "s" at the end!). There's a rating for how much difference is allowed for a match, which in turn depends on the length of the name, but to get a high matching percentage (especially in German texts, which tend to modify names more often and more radically than English ones) the matching can't be made too strict. That was the reason why there were some false positives in the mix, which resulted in crappy links in-game, pointing to creatures with similar, but unfortunately wrong names.
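
To make the idea of a length-dependent tolerance a bit more concrete, here is a minimal sketch in Python. It is not the actual QTE code: the use of Levenshtein edit distance, the function names and the "one edit per six characters" rule are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def allowed_edits(name: str) -> int:
    # Assumed rule, not taken from the QTE: longer names tolerate more
    # differences, one edit per six characters, but always at least one.
    return max(1, len(name) // 6)


def fuzzy_equal(name: str, occurrence: str) -> bool:
    """True if the text occurrence is 'close enough' to the NPC name."""
    distance = edit_distance(name.lower(), occurrence.lower())
    return distance <= allowed_edits(name)


print(fuzzy_equal("Tamed Hyena", "Tamed Hyenas"))      # True, one extra "s"
print(fuzzy_equal("Tamed Hyena", "Hemet Nesingwary"))  # False, far too different
```

The catch is visible in the tolerance itself: anything that lets "Tamed Hyenas" through will also accept any other NPC name that differs by a single character, which is exactly where the false positives come from.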

The "bad design" part was this: if the fuzzy search algorithm found a potential match, it immediately created the link, without checking the rest of the NPC list for further matches. So if there was more than one match, it was just a matter of luck whether the correct one was checked first. This has now been changed: the whole list is always scanned, collecting all matches along the way and calculating a "similarity coefficient" for each, which mathematically describes the similarity of the NPC name to the occurrence in the text. After the whole list has been searched, the match list is sorted by this number, and the match with the highest similarity is picked for the link.
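
The difference between the old and the new behaviour can be sketched roughly like this. Again, this is only an illustration and not the real implementation: the similarity coefficient is stood in for by `difflib.SequenceMatcher.ratio()` from the Python standard library, and the threshold of 0.8 as well as the function names are assumptions. The two NPC names are the ones discussed further down in this post.

```python
from difflib import SequenceMatcher


def similarity(name: str, occurrence: str) -> float:
    # Stand-in for the "similarity coefficient": 1.0 means identical.
    return SequenceMatcher(None, name.lower(), occurrence.lower()).ratio()


def first_match(npc_names, occurrence, threshold=0.8):
    """Old behaviour: stop at the first name that is similar enough."""
    for name in npc_names:
        if similarity(name, occurrence) >= threshold:
            return name
    return None


def best_match(npc_names, occurrence, threshold=0.8):
    """New behaviour: scan the whole list, then pick the most similar name."""
    matches = [(similarity(name, occurrence), name) for name in npc_names]
    matches = [m for m in matches if m[0] >= threshold]
    if not matches:
        return None
    matches.sort(key=lambda m: m[0], reverse=True)
    return matches[0][1]


npc_names = ["Hemet Nesingwary Jr.", "Hemet Nesingwary"]
print(first_match(npc_names, "Hemet Nesingwary"))  # Hemet Nesingwary Jr.
print(best_match(npc_names, "Hemet Nesingwary"))   # Hemet Nesingwary
```

With the first variant the result depends on the order of the NPC list; with the second one the order no longer matters, at the cost of always scanning the complete list.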

Besides that, there was another problem concerning an NPC named "Hemet Nesingwary Jr.", which exists in both the English and German versions of the game. That damn NPC has a dot in its name - and the QTE considers dots to mark the end of a sentence, not to be part of a name. Therefore this NPC, if named in a quest text, was never recognized as "Hemet Nesingwary Jr." - instead, it was recognized as "Hemet Nesingwary", another NPC that unfortunately exists, too. Well, it's logical from a machine's point of view, as "Hemet Nesingwary" seems to be a perfect match, and "Hemet Nesingwary Jr." does not match as well if the dot at the end is not considered part of the NPC name. But of course this leads to confusion if you click that link in-game, expecting to be shown the location of the guy with the "Jr." at the end. This has been fixed, too.
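
For the curious, the dot problem can be reproduced with a few lines of Python. The first function shows the effect of treating every dot as the end of a sentence; the second one shows one possible way around it, namely protecting dots that belong to a known NPC name before splitting. That workaround is purely an assumption for illustration and not necessarily how the QTE fix works, and the example sentence is made up rather than taken from an actual quest.

```python
def split_sentences_naive(text):
    """Treat every dot as the end of a sentence."""
    return [s.strip() for s in text.split(".") if s.strip()]


def split_sentences_protected(text, npc_names):
    """Hypothetical workaround: hide dots that are part of a known NPC
    name behind a placeholder, split, then restore them."""
    placeholder = "\x00"  # assumed never to appear in quest texts
    # Longest names first, so a longer name is not clobbered by a shorter one.
    for name in sorted(npc_names, key=len, reverse=True):
        if "." in name:
            text = text.replace(name, name.replace(".", placeholder))
    return [
        s.strip().replace(placeholder, ".")
        for s in text.split(".")
        if s.strip()
    ]


npc_names = ["Hemet Nesingwary", "Hemet Nesingwary Jr."]
text = "Speak with Hemet Nesingwary Jr. at the camp. Then return to me."

print(split_sentences_naive(text))
# ['Speak with Hemet Nesingwary Jr', 'at the camp', 'Then return to me']
print(split_sentences_protected(text, npc_names))
# ['Speak with Hemet Nesingwary Jr. at the camp', 'Then return to me']
```

In the naive version the name ends at "Jr" without its dot, so "Hemet Nesingwary" looks like the cleaner match; once the dot stays attached, the full name can be matched exactly.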


As all of these changes were made in the backend, you won't really have to do anything to benefit - you'll get the "less-faulty" links with your next database update (okay, of course you'll have to download that, either comfortably with the MobMapUpdater or manually). All quest texts and comments are being reprocessed right now, so the next database build in a few hours should contain them.