I have a piece of PHP code as follows:
$words = array(
    'Art' => '1',
    'Sport' => '2',
    'Big Animals' => '3',
    'World Cup' => '4',
    'David Fincher' => '5',
    'Torrentino' => '6',
    'Shakes' => '7',
    'William Shakespeare' => '8'
);
$text = "I like artists, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";
$all_keywords = $all_keys = array();
foreach ($words as $word => $key) {
    if (strpos(strtolower($text), strtolower($word)) !== false) {
        $all_keywords[] = $word;
        $all_keys[] = $key;
    }
}
echo $keywords_list = implode(',', $all_keywords) ."<br>";
echo $keys_list = implode(',', $all_keys) . "<br>";
The code echoes Art,Sport,World Cup,Shakes,William Shakespeare and 1,2,4,7,8. However, the code is very simple and not accurate enough to return the right keywords. For example, it returns 'Shakes' => '7' because of the word "Shakespeare" in $text, but "Shakes" is not a proper keyword for "Shakespeare". Basically, I want to get Art,Sport,World Cup,William Shakespeare and 1,2,4,8 instead of Art,Sport,World Cup,Shakes,William Shakespeare and 1,2,4,7,8. Could you please help me write better code that extracts the keywords without this kind of problem? Thanks for your help.
You may want to look at regular expressions to weed out partial matches:
// create regular expression by using alternation
// of all given words
$re = '/\b(?:' . join('|', array_map(function($keyword) {
    return preg_quote($keyword, '/');
}, array_keys($words))) . ')\b/i';
// the match is case-insensitive, so look the keys up by lowercased keyword
$lookup = array_change_key_case($words, CASE_LOWER);
preg_match_all($re, $text, $matches);
foreach ($matches[0] as $keyword) {
    echo $keyword, " ", $lookup[strtolower($keyword)], "\n";
}
The expression uses the \b assertion to match word boundaries, i.e. each keyword must appear as a whole word on its own. That is why Shakes no longer matches inside "Shakespeare".
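To see the effect of \b in isolation, here is a minimal sketch that tests single keywords against short sample strings. The helper name matches_whole_word and the sample strings are made up for illustration and are not part of the code above.

```php
<?php
// Hypothetical helper: true if $keyword occurs in $text as a whole word
// (case-insensitive), false if it only occurs inside a longer word.
function matches_whole_word(string $keyword, string $text): bool
{
    // preg_quote() escapes any regex metacharacters in the keyword
    $re = '/\b' . preg_quote($keyword, '/') . '\b/i';
    return preg_match($re, $text) === 1;
}

var_dump(matches_whole_word('Shakes', 'William Shakespeare')); // bool(false)
var_dump(matches_whole_word('Shakes', 'He shakes his head'));  // bool(true)
var_dump(matches_whole_word('Sport', 'I like sports'));        // bool(false)
var_dump(matches_whole_word('World Cup', 'Brazil World Cup')); // bool(true)
```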
Output
World Cup 4
William Shakespeare 8
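Note that this output no longer contains Art and Sport, which the question's expected result (Art,Sport,World Cup,William Shakespeare) still includes, because "artists" and "sports" are not whole-word matches. If those partial matches are wanted, one possible variation is to keep plain substring matching and let longer keywords claim their text first. This is a different heuristic from the regular expression above, not a refinement of it, and the following is only a sketch under that assumption. The function name extract_keywords is made up for illustration, and it would still report Shakes for a text that contains "Shakespeare" without "William".

```php
<?php
// Hypothetical helper: substring matching where longer keywords are tried
// first and their matches are blanked out, so a shorter keyword cannot match
// inside text that a longer keyword has already claimed.
function extract_keywords(array $words, string $text): array
{
    // Order the keywords from longest to shortest.
    $ordered = array_keys($words);
    usort($ordered, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $found = array();
    foreach ($ordered as $word) {
        if (stripos($text, $word) === false) {
            continue;
        }
        $found[$word] = true;
        // Overwrite every occurrence with NUL bytes of the same length.
        $text = str_ireplace($word, str_repeat("\0", strlen($word)), $text);
    }

    // Report the hits in the original order of $words.
    $keywords = array();
    $keys = array();
    foreach ($words as $word => $key) {
        if (isset($found[$word])) {
            $keywords[] = $word;
            $keys[] = $key;
        }
    }
    return array(implode(',', $keywords), implode(',', $keys));
}

$words = array(
    'Art' => '1',
    'Sport' => '2',
    'Big Animals' => '3',
    'World Cup' => '4',
    'David Fincher' => '5',
    'Torrentino' => '6',
    'Shakes' => '7',
    'William Shakespeare' => '8'
);
$text = "I like artists, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";

list($keywords_list, $keys_list) = extract_keywords($words, $text);
echo $keywords_list, "\n"; // Art,Sport,World Cup,William Shakespeare
echo $keys_list, "\n";     // 1,2,4,8
```

Whether this or the stricter \b version is the better fit depends on whether hits like Art in "artists" count as correct for your data.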