Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

finding common prefix of array of strings

I have an array like this:

$sports = array(
'Softball - Counties',
'Softball - Eastern',
'Softball - North Harbour',
'Softball - South',
'Softball - Western'
);

I would like to find the longest common prefix of the string. In this instance, it would be 'Softball - '

I am thinking that I would follow this process

$i = 1;

// loop to the length of the first string
while ($i < strlen($sports[0]) {

  // grab the left most part up to i in length
  $match = substr($sports[0], 0, $i);

  // loop through all the values in array, and compare if they match
  foreach ($sports as $sport) {

     if ($match != substr($sport, 0, $i) {
         // didn't match, return the part that did match
         return substr($sport, 0, $i-1);
     }

  } // foreach

   // increase string length
   $i++;
} // while

// if you got to here, then all of them must be identical

Questions

  1. Is there a built in function or much simpler way of doing this ?

  2. For my 5 line array that is probably fine, but if I were to do several thousand line arrays, there would be a lot of overhead, so I would have to be move calculated with my starting values of $i, eg $i = halfway of string, if it fails, then $i/2 until it works, then increment $i by 1 until we succeed. So that we are doing the least number of comparisons to get a result.

Is there a formula/algorithm out already out there for this kind of problem?

like image 348
bumperbox Avatar asked Aug 26 '09 17:08

bumperbox


2 Answers

I would use this:

$prefix = array_shift($array);  // take the first item as initial prefix
$length = strlen($prefix);
// compare the current prefix with the prefix of the same length of the other items
foreach ($array as $item) {
    // check if there is a match; if not, decrease the prefix by one character at a time
    while ($length && substr($item, 0, $length) !== $prefix) {
        $length--;
        $prefix = substr($prefix, 0, -1);
    }
    if (!$length) {
        break;
    }
}

Update   Here’s another solution, iteratively comparing each n-th character of the strings until a mismatch is found:

$pl = 0; // common prefix length
$n = count($array);
$l = strlen($array[0]);
while ($pl < $l) {
    $c = $array[0][$pl];
    for ($i=1; $i<$n; $i++) {
        if ($array[$i][$pl] !== $c) break 2;
    }
    $pl++;
}
$prefix = substr($array[0], 0, $pl);

This is even more efficient as there are only at most numberOfStrings‍·‍commonPrefixLength atomic comparisons.

like image 99
Gumbo Avatar answered Oct 01 '22 13:10

Gumbo


If you can sort your array, then there is a simple and very fast solution.

Simply compare the first item to the last one.

If the strings are sorted, any prefix common to all strings will be common to the sorted first and last strings.

sort($sport);

$s1 = $sport[0];               // First string
$s2 = $sport[count($sport)-1]; // Last string
$len = min(strlen($s1), strlen($s2));

// While we still have string to compare,
// if the indexed character is the same in both strings,
// increment the index. 
for ($i=0; $i<$len && $s1[$i]==$s2[$i]; $i++); 

$prefix = substr($s1, 0, $i);
like image 24
Gustav Bertram Avatar answered Oct 01 '22 12:10

Gustav Bertram