What is the fastest way in PHP to take a keyword list and match it to a search result (like an array of titles) for all words?
For instance, if my keyword phrase is "great leather shoes", then the following titles would be a match...
...while these would not be a match:
I imagine there's some trick with array functions or a RegEx (Regular Expression) to achieve this rapidly.
I would use an index for the words in the titles and test if every search term is in that index:
$terms = explode(' ', 'great leather shoes');
$titles = array(
'Get Some Really Great Leather Shoes',
'Leather Shoes Are Great',
'Great Day! Those Are Some Cool Leather Shoes!',
'Shoes, Made of Leather, Can Be Great'
);
foreach ($titles as $title) {
// extract words in lowercase and use them as key for the word index
$wordIndex = array_flip(preg_split('/\P{L}+/u', mb_strtolower($title), -1, PREG_SPLIT_NO_EMPTY));
// look up if every search term is in the index
foreach ($terms as $term) {
if (!isset($wordIndex[$term])) {
// if one is missing, continue with the outer foreach
continue 2;
}
}
// echo matched title
echo "match: $title";
}
you can preg_grep() your array against something like
/^(?=.*?\bgreat)(?=.*?\bleather)(?=.*?\shoes)/
or (probably faster) grep each word separately and then array_intersect the results
It might be a pretty naive solution (quite possibly there are more efficient/elegant solutions), but I'ld probably do something like the following:
$keywords = array(
'great',
'leather',
'shoes'
);
$titles = array(
'Get Some Really Great Leather Shoes',
'Leather Shoes Are Great',
'Great Day! Those Are Some Cool Leather Shoes!',
'Shoes, Made of Leather, Can Be Great',
'Leather Shoes on Sale Today!',
'You\'ll Love These Leather Shoes Greatly',
'Great Shoes Don\'t Come Cheap'
);
$matches = array();
foreach( $titles as $title )
{
$wordsInTitle = preg_split( '~\b(\W+\b)?~', $title, null, PREG_SPLIT_NO_EMPTY );
if( array_uintersect( $keywords, $wordsInTitle, 'strcasecmp' ) == $keywords )
{
// we have a match
$matches[] = $title;
}
}
var_dump( $matches );
No idea how this benchmarks though.
You could use
/(?=.*?\great\b)(?=.*?\bshoes\b)(?=.*?\bleather\b)/
Note a couple of things
a)You need word boundaries at both ends else you could end up matching words that contain the ones you are looking for eg "shoes of leather bring greatness".
b)I use lazy wildcard match (i.e .*?). This improves effeciency, as by default * is greedy (i.e. it consumes as many characters as it can match, and only gives them up in favor of a overall match). So if we don't have the trailing ?, .* will match everything in the line and then backtrack to match 'great'. Same procedure is then repeated for 'shoes' and 'leather'. By making * lazy, we avoid these unnecessary backtracks.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With