PHP regex for a word collection around a search phrase

Tags:

regex

php

Hi, I am trying to create a regex that does the following:

Grab 5 words before the search phrase (or x if there are only x words there) and 5 words after the search phrase (or x if there are only x words there) from a block of text. (When I say words, I mean words or numbers, whatever is in the block of text.)

E.g.:

Welcome to Stack Overflow! Visit your user page to set your name and email.

If you were to search for "visit", it would return: Welcome to Stack Overflow! Visit your user page to set

The idea is to use preg_match_all in PHP to get a set of search results showing where the search phrase appears in the text, one result for each occurrence.

Thanks in advance :D

On a side note, there may be a better way to get my result. If you think there is, please feel free to throw it in the pool; I'm not sure this is the best way, just the first one I thought of to do what I need :D

tom at zepsu dot com asked Jan 18 '23 12:01


2 Answers

How about this:

(\S+\s+){0,5}\S*\bvisit\b\S*(\s+\S+){0,5}

This matches up to five "words" before and after your search word (in this case visit), accepting fewer if the text is shorter.

<?php
$subject = "Welcome to Stack Overflow! Visit your user page to set your name and email.";
preg_match_all(
    '/(\S+\s+){0,5} # Match five (or fewer) "words"
    \S*             # Match (if present) punctuation before the search term
    \b              # Assert position at the start of a word
    visit           # Match the search term
    \b              # Assert position at the end of a word
    \S*             # Match (if present) punctuation after the search term
    (\s+\S+){0,5}   # Match five (or fewer) "words"
    /ix',
    $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0]; // keep only the full matches, one per occurrence
print_r($result);

I'm defining a "word" as a sequence of non-whitespace characters, separated from its neighbours by at least one whitespace character.

The search term itself should be an actual word: because of the \b anchors, it has to start and end with a word character (a letter, digit, or underscore).
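If the search term and the number of words need to vary, the same pattern can be built dynamically. This is a sketch, not part of the original answer: the helper name `wordsAround` is hypothetical, `preg_quote` escapes any regex metacharacters in the term, and the `x` modifier is dropped so that spaces inside a multi-word search phrase are matched literally. It assumes the phrase starts and ends with a word character, otherwise the `\b` anchors will not behave as intended.

```php
<?php
// Hypothetical helper (not from the original answer): returns every snippet of
// up to $n "words" before and after each occurrence of $term, case-insensitive.
function wordsAround(string $subject, string $term, int $n = 5): array
{
    // Escape regex metacharacters in the term; '/' is our pattern delimiter.
    $quoted = preg_quote($term, '/');
    // No 'x' modifier here, so literal spaces in a multi-word phrase still count.
    $pattern = '/(\S+\s+){0,' . $n . '}\S*\b' . $quoted . '\b\S*(\s+\S+){0,' . $n . '}/i';
    preg_match_all($pattern, $subject, $matches);
    // $matches[0] holds the full matched snippets.
    return $matches[0];
}

$text = "Welcome to Stack Overflow! Visit your user page to set your name and email.";
print_r(wordsAround($text, "visit"));
print_r(wordsAround($text, "user page", 2));
```

With the question's sentence, the first call returns a single snippet, "Welcome to Stack Overflow! Visit your user page to set", and the second returns "Visit your user page to set". As with the original regex, overlapping occurrences cannot both be returned, because preg_match_all resumes searching after the end of each match.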

Tim Pietzcker answered Feb 06 '23 03:02


You can do the following (it is a bit computation-heavy, so it wouldn't be efficient for very long strings):

<?php
$phrase = "Welcome to Stack Overflow! Visit your user page to set your name and email.";
$keyword = "Visit";
$wordCount = 5; // number of words to keep on each side of the keyword

// Split on runs of whitespace so repeated spaces don't produce empty "words"
$words = preg_split("/\s+/", trim($phrase));
$lcWords = array_map('strtolower', $words);

// Index of the first word equal to the keyword (case-insensitive), or false
$position = array_search(strtolower($keyword), $lcWords, true);
if ($position !== false) {
    $indexBegin = max($position - $wordCount, 0);
    $len = min(count($words), $position - $indexBegin + $wordCount + 1);
    echo implode(" ", array_slice($words, $indexBegin, $len));
    //prints: Welcome to Stack Overflow! Visit your user page to set
}

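Because array_search returns only the first matching key, this approach gives one snippet even when the keyword appears several times, while the question asks for a result per occurrence. The sketch below (a hypothetical `snippetsAround` helper, not part of the original answer) keeps the same split-based idea but uses array_keys with a search value, which returns every matching index. Like the original, it compares whole whitespace-separated tokens, so a keyword with attached punctuation (e.g. `Visit,`) will not match.

```php
<?php
// Hypothetical helper: for every word equal to $keyword (case-insensitive),
// return that word plus up to $wordCount words on each side.
function snippetsAround(string $text, string $keyword, int $wordCount = 5): array
{
    $words = preg_split('/\s+/', trim($text));
    $lcWords = array_map('strtolower', $words);

    $snippets = [];
    // array_keys with a search value returns every matching index (strict compare).
    foreach (array_keys($lcWords, strtolower($keyword), true) as $position) {
        $begin = max($position - $wordCount, 0);
        // array_slice simply stops at the end of the array if $length overshoots.
        $length = $position - $begin + $wordCount + 1;
        $snippets[] = implode(' ', array_slice($words, $begin, $length));
    }
    return $snippets;
}

$text = "Visit the page, then visit your user page again to visit it.";
print_r(snippetsAround($text, "visit", 2));
```

For this sample text with two words of context, the three snippets are "Visit the page,", "page, then visit your user" and "again to visit it.". Snippets may overlap here, since each occurrence is handled independently.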

Benjamin Crouzier answered Feb 06 '23 03:02