Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

fuzzy DISTINCT Values

I have a database of real estate listings and need to return a list of neighborhoods. Right now I am using mysql DISTINCT which returns all of the distinct values. My probelm is that there is a lot of neighborhoods that have similar names: example:

Park View Sub 1
Park View
Park View Sub 2
Park View Sub 3
Great Lake Sub 1
Great Lake Sub 2
Great Lake 
Great Lake Sub 3

I am looking for an easy php or mysql solution that would recognize that "Park View" and "Great Lake" already exists and ONLY return "Park View" and "Great Lake".

My initial thought is to some how get the sort order by length so that the short values are at the top and then loop through using strstr. This sound like a large task I am wondering if there is a function either in mysql or php that would easily do this.

like image 555
user982853 Avatar asked Aug 28 '12 18:08

user982853


1 Answers

Here are some things you can try; presumably you're looking for both exact matches and close matches.

First look for an exact match. Then look for a LIKE match on the REVERSED name. Then look for the match with the fewest extra characters.

Here's a query that will do all that. Note that you will need to store the reversed place name in an indexed column if you want this to be efficient.

select name 
  from (
   select name, 0 ordinal
     from place 
    where name = 'Park View'
  union
  select name, 1 ordinal
    from place 
   where Reverse(Name) like concat(Reverse('Park View'),'%')
  union
  select name, 2+length(name)
    from place
   where name like concat('Park View','%')
 ) a 
order by ordinal
   limit 1

Notice how this UNION query uses ordinal to figure out the best match.

Check it out here: http://sqlfiddle.com/#!2/76a97/9/0

like image 169
O. Jones Avatar answered Oct 21 '22 05:10

O. Jones