 

RegExp to Get N words before and after

I want to get the "context" of a given search string. For example, for the search string "myself" in the following line:

Me, my dog and myself are going on a vacation.

For N=2, I want to get "dog and myself are going", i.e. the 2 words before the match and the 2 words after it.


Currently I match whole lines like this:

$lines = file($file->getFilename());
// preg_quote() escapes regex metacharacters in the user's query
$lines = preg_grep('/' . preg_quote($_POST['query'], '/') . '/', $lines);
c0dehunter asked Mar 19 '13 10:03





1 Answer

preg_grep() is supposed to act like that: it returns the entries of the input array (here, whole lines) that match the pattern. It sounds like you need preg_match() instead, or preg_match_all() if the searched word can occur several times in the text and you want to find all of them.
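To illustrate the difference, here is a minimal sketch; the sample lines and variable names are made up for the example. preg_grep() hands back whole array entries, while preg_match() and preg_match_all() hand back only the text the pattern actually matched.

```php
<?php
// Sample input (made up for this example): the file as an array of lines.
$lines = [
    "Me, my dog and myself are going on a vacation.",
    "Nobody else is coming.",
];

// preg_grep(): keeps the whole array entries (lines) that match; keys are preserved.
$grepped = preg_grep('/myself/', $lines);

// preg_match(): stops at the first match; $first[0] holds only the matched text.
preg_match('/myself/', $lines[0], $first);

// preg_match_all(): finds every match in a string and returns how many there were.
$count = preg_match_all('/myself/', 'myself, myself and myself', $all);

print_r($grepped);
echo $first[0], "\n"; // myself
echo $count, "\n";    // 3
```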

The regex you're looking for is:

(?:[^ ]+ ){0,2}myself(?: [^ ]+){0,2}

An explained demo is here: http://regex101.com/r/pB3eW0

I designed it to match 2 words before and after the search term if they exist; otherwise it settles for 1 word or even none, e.g. when the term is at the very start or end of the text.
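A quick check of that behaviour, as a sketch (the second sample sentence is invented): on the question's sentence both quantifiers reach 2, while near the start of a string the leading {0,2} settles for the one word available. Note that a "word" here is simply a run of non-space characters, so punctuation such as the final "." stays attached to it.

```php
<?php
$pattern = '/(?:[^ ]+ ){0,2}myself(?: [^ ]+){0,2}/';

// Enough words on both sides: both {0,2} quantifiers take 2 words.
preg_match($pattern, 'Me, my dog and myself are going on a vacation.', $full);
echo $full[0], "\n"; // dog and myself are going

// Only one word before the term: the leading {0,2} settles for 1,
// and only one word ("today.") is left after it.
preg_match($pattern, 'Just myself today.', $edge);
echo $edge[0], "\n"; // Just myself today.
```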

The code allowing a variable N could look like this:

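$fileData = file_get_contents($file->getFilename());
$n = 2;            // number of words of context on each side
$query = 'myself'; // search term
// preg_quote() escapes regex metacharacters (and the '/' delimiter) in the query
$pattern = '/(?:[^ ]+ ){0,' . (int) $n . '}' . preg_quote($query, '/') . '(?: [^ ]+){0,' . (int) $n . '}/i';
preg_match_all($pattern, $fileData, $matches);
print_r($matches);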
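If you need this in more than one place, the same pattern could be wrapped in a small function. The name wordContext() and its signature are hypothetical, not part of PHP; this sketch escapes the query with preg_quote(), clamps N to zero or more, and returns only the full-match strings ($matches[0]).

```php
<?php
/**
 * Hypothetical helper (not a PHP built-in): returns every non-overlapping
 * occurrence of $query in $text with up to $n space-separated words of
 * context on each side. Matching is case-insensitive because of the /i flag.
 */
function wordContext(string $text, string $query, int $n): array
{
    $n = max(0, $n);   // a negative bound would not form a valid quantifier
    $word = '[^ ]+';   // a "word" is any run of non-space characters
    $pattern = '/(?:' . $word . ' ){0,' . $n . '}'
             . preg_quote($query, '/')   // escape metacharacters and the delimiter
             . '(?: ' . $word . '){0,' . $n . '}/i';

    preg_match_all($pattern, $text, $matches);
    return $matches[0];  // full-match strings only
}

print_r(wordContext('Me, my dog and myself are going on a vacation.', 'myself', 1));
```

Because preg_match_all() does not return overlapping matches, an occurrence that falls inside the trailing context of an earlier match is consumed by that match and does not get one of its own: with N=2, "a myself b myself c" yields the single match "a myself b myself".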

Remember to validate and escape user input (for example with preg_quote()) rather than passing it straight into these functions!

CSᵠ answered Sep 23 '22 07:09
