
Grab x number of words before and after a given keyword?

How can I grab [x] words before and after a given keyword in a string in PHP? I am trying to turn the results of a MySQL query into snippets centred on the keyword.

Jaime Cross asked Sep 10 '10 12:09


1 Answer

$string = 'This is a test string to see how to grab words from an arbitrary sentence. It\'s a little hacky (as you can see from the results) - but generally speaking, it works.';

echo $string,'<br />';

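function getWords($string,$word,$before=0,$after=0) {
    // Format 1 returns a zero-indexed list of the words in $string.
    $stringWords = str_word_count($string,1);
    // Case-insensitive lookup, so that 'this' also matches 'This'.
    $myWordPos = array_search(strtolower($word),array_map('strtolower',$stringWords));
    if ($myWordPos === false)
        return array(); // keyword not found: return an empty result

    // Clamp the leading context at the start of the string.
    if (($myWordPos-$before) < 0)
        $before = $myWordPos;
    return array_slice($stringWords,$myWordPos-$before,$before+$after+1);
}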

var_dump(getWords($string,'test',2,1));
echo '<br />';
var_dump(getWords($string,'this',2,1));
echo '<br />';
var_dump(getWords($string,'sentence',1,3));
echo '<br />';
var_dump(getWords($string,'little',2,2));
echo '<br />';
var_dump(getWords($string,'you',2,2));
echo '<br />';
var_dump(getWords($string,'results',2,2));
echo '<br />';
var_dump(getWords($string,'works',2,2));

echo '<hr />';


function getWords2($string,$word,$before=0,$after=0) {
    $stringWords = str_word_count($string,1);
    // Format 2 keys each word by its byte offset in $string.
    $stringWordsPos = array_keys(str_word_count($string,2));
    $myWordPos = array_search(strtolower($word),array_map('strtolower',$stringWords));
    if ($myWordPos === false)
        return ''; // keyword not found: return an empty snippet

    // Clamp the window at both ends of the word list.
    $first = max(0,$myWordPos-$before);
    $last = min(count($stringWords)-1,$myWordPos+$after);
    $startPos = $stringWordsPos[$first];
    $endPos = $stringWordsPos[$last] + strlen($stringWords[$last]);

    return substr($string,$startPos,$endPos-$startPos);
}

echo '[',getWords2($string,'test',2,1),']<br />';
echo '[',getWords2($string,'this',2,1),']<br />';
echo '[',getWords2($string,'sentence',1,3),']<br />';
echo '[',getWords2($string,'little',2,2),']<br />';
echo '[',getWords2($string,'you',2,2),']<br />';
echo '[',getWords2($string,'results',2,2),']<br />';
echo '[',getWords2($string,'works',1,3),']<br />';
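
Both functions above lean on the format argument of str_word_count: format 1 returns a plain list of words, while format 2 returns the same words keyed by their byte offset in the string. A minimal sketch of the difference, using a short made-up sentence (the variable names are illustrative only):

```php
<?php
$sample = "It's a test string";

// Format 1: a zero-indexed list of the words.
$words = str_word_count($sample, 1);

// Format 2: the same words, keyed by their byte offset in $sample.
$wordsByOffset = str_word_count($sample, 2);
$offsets = array_keys($wordsByOffset);

print_r($words);   // It's, a, test, string
print_r($offsets); // 0, 5, 7, 12
```

The offsets are what let getWords2 cut the snippet out of the original string with substr, so punctuation and spacing between the words are preserved.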

But what do you want to happen if the word appears multiple times? Or if the word doesn't appear in the string?
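
One way to make both cases explicit is to collect every position of the keyword first: an empty list then means the word is absent, and several entries mean it occurs more than once. A minimal sketch, assuming a case-insensitive match is wanted; findWordPositions is a hypothetical helper name, not part of the code above:

```php
<?php
// Hypothetical helper: return the word indexes at which $word occurs in $string.
function findWordPositions($string, $word) {
    // Lower-case both sides so that 'this' also matches 'This' (an assumption).
    $words = array_map('strtolower', str_word_count($string, 1));
    // Given a search value, array_keys returns every matching key; true = strict comparison.
    return array_keys($words, strtolower($word), true);
}

$text = 'This is a test, and this is only a test.';
var_dump(findWordPositions($text, 'this'));    // word indexes 0 and 5
var_dump(findWordPositions($text, 'missing')); // empty array: keyword absent
```

The caller can then decide whether to return nothing, only the first match, or one snippet per position.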

EDIT

Extended version of getWords2 that returns up to a set number of occurrences of the keyword:

$string = 'PHP is a widely-used general-purpose scripting language that is especially suited for Web development. The current version of PHP is 5.3.3, released on July 22, 2010. The online manual for PHP is an excellent resource for the language syntax and has an extensive list of the built-in and extension functions. Most extensions can be found in PECL. PEAR contains a plethora of community supplied classes. PHP is often paired with the MySQL relational database.';

echo $string,'<br />';

function getWords3($string,$word,$before=0,$after=0,$maxFoundCount=1) {
    $stringWords = str_word_count($string,1);
    $stringWordsPos = array_keys(str_word_count($string,2));
    $lastIndex = count($stringWords) - 1;

    $foundInstances = array();
    // Walk the full word list so that indexes always refer to the whole string.
    foreach ($stringWords as $myWordPos => $candidate) {
        if (count($foundInstances) >= $maxFoundCount)
            break;
        if (strcasecmp($candidate,$word) !== 0)
            continue;
        // Clamp the window per occurrence, leaving $before and $after untouched.
        $first = max(0,$myWordPos-$before);
        $last = min($lastIndex,$myWordPos+$after);
        $startPos = $stringWordsPos[$first];
        $endPos = $stringWordsPos[$last] + strlen($stringWords[$last]);

        $foundInstances[] = substr($string,$startPos,$endPos-$startPos);
    }
    return $foundInstances;
}

var_dump(getWords3($string,'PHP',2,2,3));
Mark Baker answered Sep 19 '22 01:09