
How can I find the best fuzzy string match?

Python's new regex module supports fuzzy string matching. Sing praises aloud (now).

Per the docs:

The ENHANCEMATCH flag makes fuzzy matching attempt to improve the fit of the next match that it finds.

The BESTMATCH flag makes fuzzy matching search for the best match instead of the next match.

The ENHANCEMATCH flag is set using (?e) as in

regex.search("(?e)(dog){e<=1}", "cat and dog")[1] returns "dog"

but there's nothing on how to actually set the BESTMATCH flag. How is it done?

zelusp asked Apr 24 '16 01:04




1 Answer

Documentation on the BESTMATCH flag is partial (but improving). Poke-n-hope shows that BESTMATCH is set inline using (?b), the same way ENHANCEMATCH is set using (?e).

>>> import regex
>>> regex.search(r"(?e)(?:hello){e<=4}", "What did you say, oh - hello")[0]
'hat d'
>>> regex.search(r"(?b)(?:hello){e<=4}", "What did you say, oh - hello")[0]
'hello'
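
For anyone who prefers not to embed the flag in the pattern, here is a minimal sketch that passes it as an argument instead. It assumes the third-party regex module is installed (pip install regex) and that its module-level constant regex.BESTMATCH is equivalent to the inline (?b); the variable names are just for illustration.

```python
import regex  # third-party module, not the stdlib "re"

text = "What did you say, oh - hello"
pattern = r"(?:hello){e<=4}"  # allow up to 4 errors of any kind

# Inline form, as in the session above.
inline = regex.search("(?b)" + pattern, text)

# Assumed-equivalent form: the flag passed through the flags argument.
by_flag = regex.search(pattern, text, flags=regex.BESTMATCH)

for m in (inline, by_flag):
    # fuzzy_counts is (substitutions, insertions, deletions) for the match
    print(repr(m[0]), m.span(), m.fuzzy_counts)
```

Printing fuzzy_counts alongside the match is a quick way to check how many edits the engine actually used, which helps when comparing (?e) and (?b) results on your own data.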
zelusp answered Sep 22 '22 11:09