
Regex misspellings

Tags: regex

I have a regex, generated from a list in a database, that matches the names of building types in a game. The problem is typos: people writing instructions for their team in the game sometimes misspell a building name, and the regex then fails to pick it up (e.g. "Unversity" instead of "University").

Are there any suggestions on making a regex match misspellings of 1 or 2 letters?

The regex is dynamically generated and runs on a local machine that can handle a lot more load, so as a last resort I could algorithmically create versions of each word with a letter missing, and others with letters added.

I'm using PHP, but I'd hope that any solution to this issue would not be PHP-specific.

Teifion asked Dec 24 '08 21:12




4 Answers

Allow me to introduce you to the Levenshtein distance, a measure of the difference between two strings: the minimum number of single-character edits (insertions, deletions, or substitutions) needed to convert one string into the other.

It's also built into PHP, as the levenshtein() function.

So, I'd split the input file by non-word characters, and measure the distance between each word and your target list of buildings. If the distance is below some threshold, assume it was a misspelling.

I think you'd have more luck matching this way than trying to craft regexes for each special case.
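Since the question asks for something that isn't PHP-specific, here is a rough sketch of that approach in Python. The function names (levenshtein, find_buildings), the sample building list, and the default threshold of 2 are assumptions made for illustration and are not from the original post.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def find_buildings(text, buildings, max_distance=2):
    """Return (word, building) pairs for every word in text that lies
    within max_distance edits of a known building name."""
    matches = []
    for word in re.split(r"\W+", text):  # split on non-word characters
        if not word:
            continue
        for building in buildings:
            if levenshtein(word.lower(), building.lower()) <= max_distance:
                matches.append((word, building))
    return matches


# Hypothetical usage: the building names here are made up for the example.
print(find_buildings("Build a Unversity near the Barracks",
                     ["University", "Barracks"]))
```

Splitting on non-word characters means multi-word building names would need extra handling, and a fixed threshold can produce false positives for very short names, so the threshold probably wants to scale with word length.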

Kenan Banks answered Oct 18 '22 04:10


Google's implementation of "did you mean", which looks at previous results, might also help:

How do you implement a "Did you mean"?

Neil Barnwell answered Oct 18 '22 04:10


What is Soundex()? – Teifion (28 mins ago)

Soundex, like the levenshtein function Triptych mentions, is a means of comparing strings. It encodes a word by how it sounds, so similar-sounding words end up with the same code. See: http://us3.php.net/soundex

You could also look at metaphone and similar_text. I would have put this in a comment but I don't have enough rep yet to do that. :D
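To give a feel for how Soundex handles the example in the question, here is a simplified sketch of the classic American Soundex rules in Python. It is an illustration only, and it is not guaranteed to match PHP's soundex() on every input.

```python
def soundex(word: str) -> str:
    """Simplified American Soundex: first letter plus three digits."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for letter in group:
            codes[letter] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    last = codes.get(letters[0], "")  # code of the previous letter, "" for vowels
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        digit = codes.get(c, "")
        if digit and digit != last:
            result += digit
        last = digit  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]  # pad or truncate to four characters


# The misspelling from the question gets the same code as the correct word.
print(soundex("University"), soundex("Unversity"))
```

Because both spellings get the same code, comparing Soundex codes would catch this typo. It will miss typos that change the first letter or the consonant pattern, which is where an edit distance tends to do better.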

Will Bickford answered Oct 18 '22 04:10


Back in the day we sometimes used Soundex() for these problems.

PEZ answered Oct 18 '22 03:10