
Check if two addresses "might" be the same in PHP

Tags: regex, php

I have this address:

Grimshaw Lane, Bollington, Macclesfield SK10 5JB,

When I look up this address, I get this back (from an API):

Bollington Wharf, Grimshaw Lane, Bollington, United Kingdom

I know how preg_match works, but I believe there must be a way to compare two similar texts (similar, not identical) and decide whether they are the same address, even if they differ slightly.

francis asked Jan 15 '23 14:01



2 Answers

There's obviously no solution that's going to give you 100% reliable results, but why not try this: send both strings to Google Maps (via wget, for example) and compare the results. Google has invested at least tens of thousands of man-hours in solving the problem you're looking at, so why not just let them deal with it?
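One way to "compare the results" is to compare the coordinates each lookup returns. The sketch below assumes you have already extracted a latitude and longitude from each response. The lookup itself is not shown, and the function names, the array shape, and the 250 m cutoff are all hypothetical choices made for illustration.

```php
<?php
// Great-circle distance between two points in metres (haversine formula).
function haversineMeters(float $lat1, float $lng1, float $lat2, float $lng2): float
{
    $earthRadius = 6371000.0; // mean Earth radius in metres
    $dLat = deg2rad($lat2 - $lat1);
    $dLng = deg2rad($lng2 - $lng1);
    $a = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLng / 2) ** 2;
    // min() guards against rounding pushing sqrt($a) slightly above 1.
    return 2 * $earthRadius * asin(min(1.0, sqrt($a)));
}

// Hypothetical helper: treats two geocoded points as "maybe the same place"
// when they are no further apart than $maxMeters (an arbitrary cutoff).
function mightBeSamePlace(array $a, array $b, float $maxMeters = 250.0): bool
{
    return haversineMeters($a['lat'], $a['lng'], $b['lat'], $b['lng']) <= $maxMeters;
}

// Made-up placeholder coordinates, not real geocoder output.
$first  = array('lat' => 53.2950, 'lng' => -2.0950);
$second = array('lat' => 53.2955, 'lng' => -2.0945);
var_dump(mightBeSamePlace($first, $second));
```

A distance cutoff is only one possible comparison; a wharf and the lane it sits on may geocode to slightly different points, so the threshold needs tuning against real data.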

AmericanUmlaut answered Jan 17 '23 04:01



I'm not sure if this helps, but I would consider using explode() to split each address into an array of strings, and levenshtein() to compare the elements of those arrays.

It depends on how many arrays you would have to compare, but if you just have a few (NOT thousands), this should be fast enough.

Roughly, the code would be something like this:

$search_address = "Bollington Wharf, Grimshaw Lane, Bollington, United Kingdom";

$my_addresses = array("Grimshaw Lane, Bollington, Macclesfield SK10 5JB",
                      "Different Lane, YabbaDabbaDoo, Otherfield SK12 6BJ",
                      // ...
                      );
// Split on commas and trim the whitespace left around each part.
$search_array = array_map('trim', explode(',', $search_address));

$lowest_compare_value = PHP_INT_MAX;
$lowest_compare_address = null;
foreach ($my_addresses as $my_address) {
   $current_address_array = array_map('trim', explode(',', $my_address));
   $compare_value = 0;

   foreach ($current_address_array as $my_element) {
      // Distance from this part to its closest part in the search address.
      $lowest_value = PHP_INT_MAX;
      foreach ($search_array as $search_element) {
          $new_value = levenshtein($search_element, $my_element);
          if ($new_value < $lowest_value) { $lowest_value = $new_value; }
      }
      $compare_value += $lowest_value;
   }
   // Keep the candidate with the smallest total distance so far.
   if ($compare_value < $lowest_compare_value) {
      $lowest_compare_value = $compare_value;
      $lowest_compare_address = $my_address;
   }

}

You should also consider what the maximum plausible levenshtein value is, so that you can reject the best match when it is still too far off.
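A minimal sketch of such a cutoff, wrapping the same explode() and levenshtein() idea in two functions. The function names, the parameter $maxDistance, and the value 3 are hypothetical; a sensible cutoff depends on your data and probably on the length of the addresses.

```php
<?php
// Sum, over every comma-separated part of $candidate, of the Levenshtein
// distance to the closest part of $search.
function addressDistance(string $search, string $candidate): int
{
    $searchParts = array_map('trim', explode(',', $search));
    $total = 0;
    foreach (array_map('trim', explode(',', $candidate)) as $part) {
        $best = PHP_INT_MAX;
        foreach ($searchParts as $searchPart) {
            $best = min($best, levenshtein($searchPart, $part));
        }
        $total += $best;
    }
    return $total;
}

// Hypothetical helper: returns the closest candidate, or null when even the
// closest one is further away than $maxDistance.
function findBestMatch(string $search, array $candidates, int $maxDistance): ?string
{
    $bestAddress = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($candidates as $candidate) {
        $distance = addressDistance($search, $candidate);
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $bestAddress = $candidate;
        }
    }
    return $bestDistance <= $maxDistance ? $bestAddress : null;
}

// The cutoff of 3 is an arbitrary guess; tune it against your own data.
var_dump(findBestMatch(
    "Grimshaw Lane, Bollington",
    array("Grimshaw Lane, Bolington", "Different Lane, YabbaDabbaDoo"),
    3
));
```

Returning null instead of the least-bad candidate lets the caller tell "no plausible match" apart from a real hit.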

As mentioned, this method takes time and should NOT be used in an application that needs a lot of speed, or if you have many local addresses.

Bjørne Malmanger answered Jan 17 '23 04:01
