I have a PHP array like this:
$array = array("foo", "bar", "hallo", "world", "fooo", "bar1", "hall_o", "wor1ld", "foo", "bard", "hzallo", "w44orld");
I want to compare each element of the array with all the remaining elements.
For example, I want to compare "foo" with "bar", "hallo", "world", "fooo", "bar1", "hall_o", "wor1ld", "foo", "bard", "hzallo" and "w44orld".
Then I want to compare "bar" with "foo", "hallo", "world", "fooo", "bar1", "hall_o", "wor1ld", "foo", "bard", "hzallo" and "w44orld",
and so on until the last element.
Let's call the element being compared $var_1 and each of the remaining elements $var_2.
If similar_text($var_1, $var_2, $percent) sets $percent to a value above 90, I want to print
$var_1 and every corresponding $var_2 whose matching percentage is above 90.
Currently I'm planning to use two loops to achieve this: an outer loop for $var_1 and an inner loop for $var_2.
Each element of the array can be up to 5000 characters long and the array can have 1000 elements, so my current logic is very expensive.
Is there a better way to handle this?
In order for the indexing to work, the array $arr must have unique values:
$arr = array("foo", "bar", "hallo", "world", "fooo", "bar1", "hall_o", "wor1ld", "bard", "hzallo", "w44orld");
$dexed = array();
foreach ($arr as $key => $value){
    $dexed[$key]['val'] = $value;
    $dexed[$key]['key'] = $key;
}
$out = array();//output: groups of similar words
$rev = array();//reverse lookup array: word => key of its group in $out
$t = 80;//threshold value in percent (the question asks for 90)
$cnt = count($dexed);
$k = 0;//key of the next new group
for ($i=0; $i<$cnt-1; $i++){
    for ($j=$i+1; $j<$cnt; $j++){
        //similar_text calculates differently depending on order of arguments
        similar_text($dexed[$i]['val'], $dexed[$j]['val'], $percent1);
        similar_text($dexed[$j]['val'], $dexed[$i]['val'], $percent2);
        if (($percent1 >= $t) || ($percent2 >= $t)){
            //check if value already exists under different key
            //(isset on the key replaces a linear in_array(array_keys()) scan)
            if (isset($rev[$dexed[$i]['val']])){
                if ( ! isset($rev[$dexed[$j]['val']])){
                    $fkey = $rev[$dexed[$i]['val']];//key found
                    $next = count($out[$fkey]);
                    $out[$fkey][$next]['val'] = $dexed[$j]['val'];
                    $out[$fkey][$next]['key'] = $dexed[$j]['key'];
                    $rev[$dexed[$j]['val']] = $fkey;
                }
            } else {
                $out[$k][0]['val'] = $dexed[$i]['val'];
                $out[$k][0]['key'] = $dexed[$i]['key'];
                $out[$k][1]['val'] = $dexed[$j]['val'];
                $out[$k][1]['key'] = $dexed[$j]['key'];
                $rev[$dexed[$i]['val']] = $k;
                $rev[$dexed[$j]['val']] = $k;
                $k++;
            }
        }
    }
}
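The comment in the loop about argument order can be checked with a small standalone snippet. This is only an illustrative sketch: the two strings and the variable names are made up for the demonstration and are not part of the code above.

```php
<?php
// Illustrative pair of strings (an assumption, not taken from $arr).
$a = "bafoobar";
$b = "barfoomo";

// similar_text() returns the number of matching characters and writes
// the percentage into the third argument (passed by reference).
$sim_ab = similar_text($a, $b, $percent_ab);
$sim_ba = similar_text($b, $a, $percent_ba);

echo "a,b: $sim_ab ($percent_ab %)\n"; // a,b: 5 (62.5 %)
echo "b,a: $sim_ba ($percent_ba %)\n"; // b,a: 3 (37.5 %)

// Taking the larger of the two percentages gives an order-independent
// score, which is what the || test in the loop above amounts to.
$score = max($percent_ab, $percent_ba);
```

The price of checking both orders is two similar_text calls per pair, which matters for long strings.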
Once $out is generated, use the following to generate an index array:
$index = array();
foreach ($out as $key => $group){
    $cnt = count($group);
    foreach ($group as $key2 => $word){
        for ($i=0; $i<$cnt; $i++){
            if ($i != $key2){
                $index[$word['key']][] = $key.':'.$i;
            }
        }
    }
}
To access all similar words for a given key (the key of the word in the original array $arr):
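$key = 2;
//words without any similar match have no entry in $index, so check first
if (isset($index[$key])){
    foreach ($index[$key] as $value){
        $parts = explode(':', $value);
        echo '<p>'.$out[$parts[0]][$parts[1]]['val'].'</p>';
    }
}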
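On the cost raised in the question (up to 1000 strings of up to 5000 characters), one possible way to reduce the number of similar_text calls is a length check before each comparison. The PHP manual describes the percentage as the number of matching characters divided by the average length of the two strings, times 100, and the match count can never exceed the shorter length, so 200 * min(len1, len2) / (len1 + len2) is an upper bound on the result. The sketch below assumes PHP 7 or later; the function names max_possible_percent and similar_pairs are hypothetical and it returns plain index pairs rather than the $out structure above.

```php
<?php
// Upper bound on the percentage similar_text() can report for two
// strings of the given lengths (hypothetical helper).
function max_possible_percent(int $len1, int $len2): float
{
    if ($len1 + $len2 === 0) {
        return 0.0;
    }
    return 200.0 * min($len1, $len2) / ($len1 + $len2);
}

// Hypothetical helper: returns array(i => array(j, ...)) with i < j for
// every pair whose similarity reaches $threshold in either argument order.
function similar_pairs(array $words, float $threshold): array
{
    $words = array_values($words);          // force keys 0..n-1
    $lengths = array_map('strlen', $words); // compute each length once
    $n = count($words);
    $pairs = array();
    for ($i = 0; $i < $n - 1; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            // Skip pairs whose lengths alone rule out the threshold.
            if (max_possible_percent($lengths[$i], $lengths[$j]) < $threshold) {
                continue;
            }
            similar_text($words[$i], $words[$j], $p1);
            similar_text($words[$j], $words[$i], $p2);
            if (max($p1, $p2) >= $threshold) {
                $pairs[$i][] = $j;
            }
        }
    }
    return $pairs;
}

$arr = array("foo", "bar", "hallo", "world", "fooo", "bar1", "hall_o", "wor1ld", "bard", "hzallo", "w44orld");
$pairs = similar_pairs($arr, 80);
```

With a threshold of 90, the bound only lets a pair through when the shorter string is at least 9/11 (about 82%) of the length of the longer one. The filter does not change the worst case when all strings have similar lengths; it only removes comparisons that cannot succeed.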