Python's new regex module supports fuzzy string matching. Sing praises aloud (now).
Per the docs:
The ENHANCEMATCH flag makes fuzzy matching attempt to improve the fit of the next match that it finds. The BESTMATCH flag makes fuzzy matching search for the best match instead of the next match.
The ENHANCEMATCH flag is set using (?e), as in
regex.search("(?e)(dog){e<=1}", "cat and dog")[1]
which returns "dog", but there's nothing on actually setting the BESTMATCH flag. How's it done?
Documentation on the BESTMATCH flag functionality is partial (but improving). Poke-n-hope shows that BESTMATCH is set using (?b).
>>> import regex
>>> regex.search(r"(?e)(?:hello){e<=4}", "What did you say, oh - hello")[0]
'hat d'
>>> regex.search(r"(?b)(?:hello){e<=4}", "What did you say, oh - hello")[0]
'hello'
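For what it's worth, the flag can also be passed as an argument instead of inline. The sketch below assumes the third-party regex module is installed and uses its module-level constant regex.BESTMATCH; the variable names are just for illustration. It also reads the match's fuzzy_counts attribute, which reports the errors as (substitutions, insertions, deletions).

```python
import regex  # third-party module (pip install regex), not the stdlib re

text = "What did you say, oh - hello"
pattern = r"(?:hello){e<=4}"

# Inline form, as in the answer above.
inline = regex.search("(?b)" + pattern, text)

# Flag-argument form, using the module-level BESTMATCH constant.
by_flag = regex.search(pattern, text, flags=regex.BESTMATCH)

print(inline[0])             # 'hello'
print(by_flag[0])            # 'hello'
print(by_flag.fuzzy_counts)  # (substitutions, insertions, deletions)
```

Both forms should pick the exact "hello" at the end of the string, since a zero-error match beats any earlier approximate one.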