
"Reverse" order regex - closest "above" match

Tags: regex, php

This is an example string:

<div>other text</div> some text abc , <div>need_match_this</div> bbbb <p>hsa</p> aa <span>hello</span>

I only know the end of the string, "<span>hello</span>", and I need to match the text inside the closest div before it.

I used this regex:

\<div\>(.*?)\<\/div\>.*?\<span\>hello\<\/span\>

But this does not work for me, because I need the text of the closest div only, not the first div in the string.

Is there a regex solution for this?

Please help.

Thank you

JohnJohnm1 asked Apr 15 '15 11:04


1 Answer

You need a negative-lookahead-based pattern in place of the bare .*? between the tags, because .*? can also match across opening or closing div tags.

<div>((?:(?!<\/?div>).)*?)<\/div>(?:(?!<\/?div>).)*?<span>hello<\/span>
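Since the question is tagged php, here is a minimal sketch of how the two patterns behave on the sample string from the question. It assumes preg_match with ~ as the delimiter (so the slashes need no escaping); the variable names are only illustrative.

```php
<?php
// Sample string from the question.
$subject = '<div>other text</div> some text abc , <div>need_match_this</div> bbbb <p>hsa</p> aa <span>hello</span>';

// The question's attempt: .*? is free to run across other div tags.
$naive = '~<div>(.*?)</div>.*?<span>hello</span>~';

// The pattern above: neither stretch may cross <div> or </div>.
$tempered = '~<div>((?:(?!</?div>).)*?)</div>(?:(?!</?div>).)*?<span>hello</span>~';

preg_match($naive, $subject, $naiveMatch);
preg_match($tempered, $subject, $temperedMatch);

echo $naiveMatch[1], PHP_EOL;    // other text
echo $temperedMatch[1], PHP_EOL; // need_match_this
```

If the text between the tags can contain newlines, adding the s modifier would probably be needed so that . also matches them; that is an assumption about the input, not something the question states.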


(?:(?!<\/?div>).)*? forces the regex engine to match any run of characters that does not contain <div> or </div>. Before consuming each character, the engine checks that neither <div> nor </div> begins at that position. If neither tag begins there, the character is consumed. If one does, the repetition stops at that point, the following characters are not consumed by this group, and the rest of the pattern has to match from there.
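To see where the repeated group stops, it can be run on its own. This is a small sketch, assuming PHP's preg_match; the pattern is anchored with ^ and uses a greedy * (instead of *?) so that it consumes as much as the lookahead allows, and the sample strings are made up for illustration.

```php
<?php
// The repeated group on its own, anchored at the start of the string.
$run = '~^(?:(?!</?div>).)*~';

preg_match($run, 'abc<div>def', $beforeOpen);   // stops in front of <div>
preg_match($run, 'abc</div>def', $beforeClose); // stops in front of </div>
preg_match($run, 'a<b>c', $otherTag);           // other tags are not blocked

echo $beforeOpen[0], PHP_EOL;   // abc
echo $beforeClose[0], PHP_EOL;  // abc
echo $otherTag[0], PHP_EOL;     // a<b>c
```

The last case shows that a lone < does not stop the run; only a complete <div> or </div> at the current position does.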

Example:

string - <div></div>

regex - <div>((?:(?!<\/?div>).)*?)<\/div>

For this input, the regex above captures the empty string that sits between the opening and closing div tags. Right after <div>, the lookahead in (?!<\/?div>). checks that neither <div> nor </div> begins at the current position. That check fails, because the next character is <, which starts </div>. Since the group (?:(?!<\/?div>).)*? is allowed to repeat zero times, it matches nothing, the capture is the empty string, and <\/div> then matches the closing tag.
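The same example can be checked in PHP. This is a sketch under the same assumptions as before (preg_match, ~ as delimiter); the second input is a made-up string added to show that the match skips an outer opening tag and settles on the innermost pair.

```php
<?php
$pattern = '~<div>((?:(?!</?div>).)*?)</div>~';

// The example above: the match succeeds and the capture is empty.
$found = preg_match($pattern, '<div></div>', $empty);
var_dump($found);     // int(1)
var_dump($empty[1]);  // string(0) ""

// Hypothetical input: the first <div> cannot be paired with </div>
// without crossing the second <div>, so the match starts at the second one.
preg_match($pattern, '<div>a<div>b</div>', $inner);
echo $inner[1], PHP_EOL; // b
```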

Avinash Raj answered Nov 09 '22 08:11