
Algorithm for checking addresses for matches?

I'm working on a survey program where people will be given promotional considerations the first time they fill out a survey. In a lot of scenarios, the only way we can stop people from cheating the system and getting a promotion they don't deserve is to check street address strings against each other.

I was looking at using Levenshtein distance to give me a number to measure similarity, and treating any pair below a certain threshold as a duplicate.

However, if someone were looking to game the system, they could easily write "S 5th St" instead of "South Fifth Street", and Levenshtein would consider those strings to be very different. So then I was thinking of converting all strings to a 'standard address form', e.g. 'South' becomes 's', 'Fifth' becomes '5th', etc.

Then I was thinking this is hopeless, and too much effort to get it working robustly. Is it?

I'm working with PHP/MySQL, so I have the limitations inherent in that system.

user151841 asked May 20 '10 16:05



1 Answer

I think your second idea is better than using Levenshtein distance. If you compare addresses for similarity, then two different people who live near each other might accidentally "cheat" one another out of their prize. If I live at "S. 4th St." but my neighbor at "S. 5th St." already signed up, those two addresses differ by a single character, so they would fall under almost any Levenshtein threshold and be flagged as duplicates.
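To make the neighbor problem concrete, here is a minimal sketch of the standard dynamic-programming edit distance. It is written in Python purely for illustration (the question's stack is PHP), and the function name `levenshtein` and the variable names are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


# Two different households: only the digit differs.
neighbors = levenshtein("S. 4th St.", "S. 5th St.")

# The same address written two ways: the lengths alone differ by 10 characters.
same_place = levenshtein("S 5th St", "South Fifth Street")

print(neighbors, same_place)
```

The first distance is 1, while the second is at least 10, so any threshold loose enough to catch the respelled address would also merge the two neighbors.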

You could reduce (but probably not eliminate) a lot of potential cheating by running addresses through a synonym normalizer. Before you check for equality, convert each common variant to a single canonical form:

North -> N.
East -> E.
...
First -> 1st
Second -> 2nd
Third -> 3rd
...
Street -> St.
Avenue -> Ave.

The longer the list of synonyms you come up with, the better it will be at catching matches. Each extra synonym adds a little processing time, but addresses are short, so the cost should be negligible.

This is similar to converting strings to all lower (or upper) case before comparing them. (Which I also recommend, naturally.)
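A minimal sketch of such a normalizer, in Python for illustration only, might look like the following. The `SYNONYMS` table and the names `normalize_address` and `is_duplicate` are hypothetical, and the table is deliberately short. One assumption differs from the list above: periods are dropped rather than added, so that "S." and "S" normalize to the same token.

```python
import re

# Hypothetical, deliberately incomplete synonym table (all keys lowercase).
SYNONYMS = {
    "north": "n", "south": "s", "east": "e", "west": "w",
    "first": "1st", "second": "2nd", "third": "3rd",
    "fourth": "4th", "fifth": "5th",
    "street": "st", "avenue": "ave",
}


def normalize_address(address: str) -> str:
    """Lowercase, strip punctuation, and map each word to its canonical form."""
    # Replace anything that is not an ASCII letter, digit, or whitespace with a space.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", address.lower())
    return " ".join(SYNONYMS.get(word, word) for word in cleaned.split())


def is_duplicate(a: str, b: str) -> bool:
    """Treat two addresses as the same only if their normalized forms are equal."""
    return normalize_address(a) == normalize_address(b)


print(normalize_address("South Fifth Street"))        # s 5th st
print(is_duplicate("S. 5th St.", "South Fifth Street"))  # True
print(is_duplicate("S. 4th St.", "S. 5th St."))          # False
```

Because the final check is exact equality on the normalized strings, the neighbor on 4th Street is no longer confused with the one on 5th Street. The normalized form could also be stored in its own database column so that duplicates can be found with an ordinary equality lookup.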

Bill the Lizard answered Sep 17 '22 06:09
