In my work I have used approximate string matching algorithms such as Damerau–Levenshtein distance, with great results, to make my code less vulnerable to spelling mistakes.
Now I need to match strings against simple regular expressions such as TV Schedule for \d\d (Jan|Feb|Mar|...). This means that the string TV Schedule for 10 Jan should return 0, while T Schedule for 10. Jan should return 2.
This could be done by generating all strings matched by the regex (in this case 100x12 = 1,200) and finding the best match, but that doesn't seem practical.
Do you have any ideas how to do this efficiently?
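For reference, the brute-force baseline described above could look roughly like the sketch below. It is only an illustration: the month list is an assumption about what the "..." in the pattern stands for, the function names are made up, and the distance used is the optimal string alignment variant of Damerau-Levenshtein (each substring is edited at most once).

```python
from itertools import product


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Assumption: the "..." in the pattern means the remaining English month abbreviations.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def candidates():
    """Yield every string the pattern can produce (100 x 12 of them)."""
    for d1, d2, month in product("0123456789", "0123456789", MONTHS):
        yield f"TV Schedule for {d1}{d2} {month}"


def brute_force_distance(text: str) -> int:
    """Smallest distance between text and any string generated by the pattern."""
    return min(osa_distance(text, candidate) for candidate in candidates())
```

This works for a pattern this small, but the number of candidates multiplies with every extra alternation or character class, which is why it does not scale.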
Fuzzy Matching (also called Approximate String Matching) is a technique that helps identify two elements of text, strings, or entries that are approximately similar but are not exactly the same.
For example, if a user types "Misissippi" into Yahoo or Google -- both of which use fuzzy matching -- a list of hits is returned along with the question, "Did you mean Mississippi?" Alternative spellings and words that sound the same but are spelled differently are given.
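A "Did you mean ...?" lookup of this kind can be imitated with Python's standard difflib module. This is only a small sketch of the idea, not how any search engine does it; the vocabulary list, the did_you_mean helper and the 0.8 cutoff are all made up for illustration.

```python
import difflib

# Hypothetical vocabulary of known, correctly spelled terms.
KNOWN_TERMS = ["Mississippi", "Missouri", "Minnesota", "Michigan"]


def did_you_mean(query, vocabulary=KNOWN_TERMS, cutoff=0.8):
    """Return the closest known term, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(query, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


suggestion = did_you_mean("Misissippi")
if suggestion is not None:
    print(f"Did you mean {suggestion}?")
```

Note that difflib scores similarity with a ratio based on matching blocks rather than with an edit distance, so the ranking can differ from what Levenshtein-based tools produce.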
FuzzyWuzzy Python Library: Interesting Tool for NLP and Text Analytics.
FuzzyWuzzy is a Python library that uses Levenshtein distance to calculate the differences between sequences and patterns. It was developed and open-sourced by SeatGeek, a service that finds event tickets from all over the internet and showcases them on one platform.
I found the TRE library, which seems to do exactly this: fuzzy matching of regular expressions. Example: http://hackerboss.com/approximate-regex-matching-in-python/ It only supports insertion, deletion and substitution, though, not transposition. But I guess that works OK.
I tried the accompanying agrep tool with the regexp on the following file:
TV Schedule for 10Jan
TVSchedule for Jan 10
T Schedule for 10 Jan 2010
TV Schedule for 10 March
Tv plan for March
and got
$ agrep -s -E 100 '^TV Schedule for \d\d (Jan|Feb|Mar)$' filename
1:TV Schedule for 10Jan
8:TVSchedule for Jan 10
7:T Schedule for 10 Jan 2010
3:TV Schedule for 10 March
15:Tv plan for March
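To give a feel for how the distance to a pattern can be computed without generating every string, here is a rough Python sketch for a restricted pattern shape: a sequence of segments, each with one or more alternatives. It is not how TRE is implemented and it is not a regex parser; the pattern is written out by hand, the month list is an assumption about the "..." in the question, and all names are made up. Like TRE, it counts only insertions, deletions and substitutions, and it matches the whole string.

```python
import string

DIGIT = frozenset(string.digits)


def lit(text):
    """Turn a literal string into a list of single-character sets."""
    return [frozenset(ch) for ch in text]


# Assumption: the "..." in the pattern means the remaining English month abbreviations.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hand-written equivalent of: TV Schedule for \d\d (Jan|Feb|...)
# Each segment is a list of alternatives; each alternative is a list of
# sets of characters allowed at that position.
PATTERN = [
    [lit("TV Schedule for ")],
    [[DIGIT, DIGIT]],
    [lit(" ")],
    [lit(month) for month in MONTHS],
]


def advance(row, allowed, text):
    """Consume one pattern position: a single Levenshtein row update."""
    new = [row[0] + 1]  # the pattern character is missing from the text
    for j in range(1, len(text) + 1):
        match_or_subst = row[j - 1] + (0 if text[j - 1] in allowed else 1)
        missing_in_text = row[j] + 1      # pattern character has no counterpart
        extra_in_text = new[j - 1] + 1    # text character has no counterpart
        new.append(min(match_or_subst, missing_in_text, extra_in_text))
    return new


def pattern_distance(text, pattern=PATTERN):
    """Minimum edit distance between text and any string the pattern allows."""
    row = list(range(len(text) + 1))  # empty pattern against each text prefix
    for alternatives in pattern:
        results = []
        for alternative in alternatives:
            r = row
            for allowed in alternative:
                r = advance(r, allowed, text)
            results.append(r)
        # Keep the best alternative separately for every text prefix.
        row = [min(column) for column in zip(*results)]
    return row[-1]
```

The work grows with the sum of the alternative lengths rather than with the product of the choices: pattern_distance("T Schedule for 10. Jan") gives 2 without ever building the 1,200 candidate strings.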
Thanks a lot for all your suggestions.