 

Matching dissimilar strings

Tags: c#, algorithm

Assume two sets of strings:

[ "Mr. Jones", "O'Flaherty", "Bob", "Rob Jenkins" ]
[ "Maxwell O'Flaherty", "Robert Jenkins", "Mrs. Smith" ]

It is obvious that those two sets have Maxwell O'Flaherty and Robert Jenkins in common.

Is there an algorithm that would let us do this kind of matching programmatically? I am thinking of writing something that goes through each element in an array of strings and tries to find a substring that is unique, i.e. not contained in any other element of either set, and then uses that substring as a kind of hash of each element to match up the two sets.

devprog asked Nov 14 '22 05:11



1 Answer

You may find the Levenshtein distance useful. If you are doing a lot of this and it is unclear how accurate the information is, there are libraries for string disambiguation. (It's not "obvious" that Rob and Robert are identical; indeed, the first one could be Robin.)
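To make the suggestion concrete, here is a minimal sketch of the Levenshtein distance in C#. The class name `FuzzyMatch` and the helper `BestTokenDistance` are hypothetical, made up for illustration; comparing word by word and ignoring case are assumptions of this sketch, not part of the algorithm itself.

```csharp
using System;
using System.Linq;

public static class FuzzyMatch
{
    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions, and substitutions needed to turn a into b.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];

        // Turning an empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // i deletions to reach an empty prefix of b
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1,   // insertion
                             prev[j] + 1),      // deletion
                    prev[j - 1] + cost);        // substitution or match
            }
            // Reuse the two rows instead of allocating a full matrix.
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // Hypothetical helper (an assumption of this sketch): the smallest
    // distance between any space-separated word of x and any word of y,
    // ignoring case. Both inputs must contain at least one word.
    public static int BestTokenDistance(string x, string y)
    {
        var sep = new[] { ' ' };
        var xs = x.ToLowerInvariant().Split(sep, StringSplitOptions.RemoveEmptyEntries);
        var ys = y.ToLowerInvariant().Split(sep, StringSplitOptions.RemoveEmptyEntries);
        return xs.SelectMany(s => ys.Select(t => Levenshtein(s, t))).Min();
    }
}
```

With this sketch, `FuzzyMatch.Levenshtein("Rob", "Robert")` is 3, while `FuzzyMatch.BestTokenDistance("Rob Jenkins", "Robert Jenkins")` is 0 because "Jenkins" matches exactly. Note that `BestTokenDistance("Mr. Jones", "Mrs. Smith")` is only 1, since "Mr." and "Mrs." differ by one character, so a low distance alone does not prove that two names refer to the same person. Any threshold you pick is a judgment call for your data.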

peter.murray.rust answered Dec 09 '22 22:12
