I'm playing about with stop words in my code. I have an array full of words that I'd like to check, and an array of stop words I want to check against.
At the moment I'm looping through the array one word at a time and removing the word if it is in_array in the stop word list, but I wonder if there's a better way of doing it. I've looked at array_diff and such; however, if the same stop word appears multiple times in the first array, array_diff only appears to remove the first occurrence.
The focus is on speed and memory usage, but speed more so.
Edit -
The first array is single words taken from blog comments (these are usually quite long); the second array is single stop words. Sorry for not making that clear.
Thanks
A simple approach is to use str_replace or str_ireplace, which can take an array of 'needles' (things to search for), corresponding replacements, and an array of 'haystacks' (things to operate on).
$haystacks=array(
"The quick brown fox",
"jumps over the ",
"lazy dog"
);
$needles=array(
"the", "lazy", "quick"
);
$result=str_ireplace($needles, "", $haystacks);
var_dump($result);
This produces
array(3) {
[0]=>
string(11) "  brown fox"
[1]=>
string(12) "jumps over  "
[2]=>
string(4) " dog"
}
As an aside, a quick way to clean up the leading and trailing spaces this leaves is to use array_map to call trim on each element:
$result=array_map("trim", $result);
The drawback of using str_replace is that it will replace matches found within words, rather than just whole words. To address that, we can use regular expressions...
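To make that drawback concrete, here is a small sketch. The input sentence is a made-up example rather than anything from the question; it only shows that str_ireplace matches "the" inside longer words as well.

```php
<?php
// Hypothetical input, chosen so that "the" also occurs inside other words.
$text = "Then gather the others";

// str_ireplace works on substrings, so "Then", "gather" and "others" are damaged too.
$result = str_ireplace("the", "", $text);

var_dump($result); // string(10): "n gar", two spaces, then "ors"
```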
An approach using preg_replace looks very similar to the above, but the needles are regular expressions, and we check for a 'word boundary' at the start and end of the match using \b
$haystacks=array(
"For we shall use fortran to",
"fortify the general theme",
"of this torrent of nonsense"
);
$needles=array(
'/\bfor\b/i',
'/\bthe\b/i',
'/\bto\b/i',
'/\bof\b/i'
);
$result=preg_replace($needles, "", $haystacks);
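If the stop word list is long, writing one pattern per word by hand gets tedious. One possible variation, sketched below, builds a single alternation pattern from a plain word list. The variable names and the whitespace clean-up step are my own assumptions, not part of the answer above; preg_quote is used so that any regex metacharacters in a stop word are matched literally.

```php
<?php
$haystacks = array(
    "For we shall use fortran to",
    "fortify the general theme",
    "of this torrent of nonsense"
);
$stopwords = array("for", "the", "to", "of");

// Escape each word, then join them into one pattern: /\b(?:for|the|to|of)\b/i
$escaped = array_map(function ($word) {
    return preg_quote($word, '/');
}, $stopwords);
$pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';

// Remove whole-word matches only; "fortran", "fortify", "theme" and "torrent" survive.
$result = preg_replace($pattern, "", $haystacks);

// Collapse the gaps left behind and trim the ends.
$result = array_map(function ($line) {
    return trim(preg_replace('/\s+/', ' ', $line));
}, $result);

print_r($result);
```

Whether one combined pattern is faster than several small ones depends on the list and the input, so it is worth timing both on real comments.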
array_diff() should work.
$sentence = "the quick brown fox jumps the fence and runs";
$array = explode(" ", $sentence);
$stopwords = array("the","and","an","of");
print_r(array_diff($array,$stopwords));
Result
Array
(
[1] => quick
[2] => brown
[3] => fox
[4] => jumps
[6] => fence
[8] => runs
)
I tested on this site: http://sandbox.onlinephpfunctions.com/
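Two details of the output above are easy to miss: array_diff keeps the original keys (note the gaps at 0, 5 and 7), and it compares strings case-sensitively. The sketch below is one way to handle both, assuming you want a reindexed list and case-insensitive matching; the capitalised "The" in the sentence is my own change to show the difference.

```php
<?php
$sentence = "The quick brown fox jumps the fence and runs";
$words = explode(" ", $sentence);
$stopwords = array("the", "and", "an", "of");

// Case-sensitive: the capitalised "The" is not treated as a stop word.
// array_values() renumbers the keys from 0.
$caseSensitive = array_values(array_diff($words, $stopwords));

// Case-insensitive: array_udiff() with strcasecmp as the comparison callback.
$caseInsensitive = array_values(array_udiff($words, $stopwords, 'strcasecmp'));

print_r($caseSensitive);   // The, quick, brown, fox, jumps, fence, runs
print_r($caseInsensitive); // quick, brown, fox, jumps, fence, runs
```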
If you already have two sorted arrays, you can use this algorithm to remove each element from array A that is also in array B (in mathematical terms: A \ B):
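// Walk both sorted arrays in step; stop when either one is exhausted.
for ($i=0, $n=count($a), $j=0, $m=count($b); $i<$n && $j<$m; ) {
    $diff = strcmp($a[$i], $b[$j]);
    if ($diff == 0) {
        // $a[$i] is in B: remove it, but keep $j so duplicates in A are caught too.
        unset($a[$i]);
        $i++;
    } elseif ($diff < 0) {
        // $a[$i] sorts before $b[$j], so it cannot be in B: keep it.
        $i++;
    } else {
        // $b[$j] sorts before $a[$i]: advance in B.
        $j++;
    }
}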
This requires only O(n + m) steps, where n and m are the lengths of the two arrays.
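If the arrays are not sorted yet, the same idea can be wrapped in a function that sorts first. This is only a sketch: sorted_difference is a hypothetical helper name, and sorting adds O(n log n) work and loses the original word order, so it mainly pays off when the inputs are already sorted or order does not matter.

```php
<?php
// Hypothetical helper: returns the words of $a that do not occur in $b.
function sorted_difference(array $a, array $b)
{
    // SORT_STRING gives a byte-wise string order, matching strcmp() below.
    sort($a, SORT_STRING);
    sort($b, SORT_STRING);

    $result = array();
    $i = 0;
    $j = 0;
    $n = count($a);
    $m = count($b);

    while ($i < $n && $j < $m) {
        $diff = strcmp($a[$i], $b[$j]);
        if ($diff === 0) {
            $i++;                 // stop word: skip it
        } elseif ($diff < 0) {
            $result[] = $a[$i];   // cannot be in $b: keep it
            $i++;
        } else {
            $j++;                 // advance in the stop word list
        }
    }

    // Anything left in $a sorts after every stop word, so keep it all.
    while ($i < $n) {
        $result[] = $a[$i++];
    }
    return $result;
}

$words = explode(" ", "the quick brown fox jumps the fence and runs");
$kept = sorted_difference($words, array("the", "and", "an", "of"));
print_r($kept); // brown, fence, fox, jumps, quick, runs (sorted order)
```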
Another approach would be to use the words of array B as keys for an index (using array_flip), iterate over the values of A, and check whether each one is a key in the index using array_key_exists:
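// Turn the stop words into keys so each lookup is a hash lookup.
$index = array_flip($b);
foreach ($a as $key => $val) {
    // Check against the flipped index, not the original list.
    if (array_key_exists($val, $index)) {
        unset($a[$key]);
    }
}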
Again, this is O(n) on average, since each hash lookup takes roughly constant time; it avoids scanning B for each value in A, which would be O(n²).
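A variation on the index approach, sketched here under a few assumptions of my own: the stop words are stored in lower case, matching should ignore case, and a freshly numbered result array is wanted instead of unsetting in place. It uses isset, which works here because array_flip never produces null values; any speed difference between isset and array_key_exists is something to measure rather than assume.

```php
<?php
$words = explode(" ", "The quick brown fox jumps the fence and runs");
$stopwords = array("the", "and", "an", "of"); // assumed to be lower case already

// Stop words become keys, so membership is a single hash lookup.
$index = array_flip($stopwords);

$kept = array();
foreach ($words as $word) {
    // Lower-case the word only for the lookup; keep the original spelling.
    if (!isset($index[strtolower($word)])) {
        $kept[] = $word;
    }
}

print_r($kept); // quick, brown, fox, jumps, fence, runs
```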