Match last occurrence with regex

Tags: python, regex

I would like to match the last occurrence of a pattern using a regex.

I have some text structured this way:

Pellentesque habitant morbi tristique senectus et netus et
lesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae
ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam
egestas <br>semper<br>tizi ouzou<br>Tizi Ouzou<br>                        

I want to match the last piece of text enclosed between two <br> markers, in my case <br>Tizi Ouzou<br>, and ideally capture just the string Tizi Ouzou.

Note that there is some whitespace after the last <br>.

I've tried this:

<br>.*<br>\s*$

but it selects everything starting from the first <br> to the last.

NB: I'm using Python, and I'm testing my regex with pythex.
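
The behaviour described in the question can be reproduced with a short sketch. The sample string is a shortened stand-in for the question's text, and the second pattern is only one possible way to anchor on the final pair of markers, not necessarily what the answers below recommend.

```python
import re

# Shortened stand-in for the question's text (trailing spaces kept).
text = "egestas <br>semper<br>tizi ouzou<br>Tizi Ouzou<br>      "

# The original attempt: the match starts at the first <br>, and the greedy .*
# then runs as far as it can, so everything up to the last <br> is included.
greedy = re.search(r"<br>.*<br>\s*$", text)
print(repr(greedy.group(0)))

# One possible fix: a leading greedy .* pushes the start of the interesting
# part as far right as possible, so group 1 is the text between the final
# pair of <br> markers. re.DOTALL lets .* cross line breaks in longer input.
last = re.search(r".*<br>(.*?)<br>\s*$", text, re.DOTALL)
print(last.group(1))
```

The original pattern is greedy, so it spans from the first marker to the last one. Putting the greedy part before the opening marker makes the engine backtrack from the right instead.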

Asked Aug 24 '13 at 19:08 by Ghilas BELHADJ


2 Answers

For me the clearest way is:

>>> import re
>>> re.findall(r'<br>(.*?)<br>', text)[-1]
'Tizi Ouzou'
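
One caveat: findall returns non-overlapping matches, so each closing <br> is consumed and cannot also open the next match. That works for the sample text, but it can skip the final segment when the number of markers is odd. A minimal sketch of a variant using a lookahead, where the helper name last_between and its sep parameter are made up for illustration:

```python
import re

def last_between(text, sep="<br>"):
    """Return the text between the last two `sep` markers, or None."""
    # The lookahead (?=...) checks for the closing marker without consuming
    # it, so the same marker can also open the next match.
    pattern = re.escape(sep) + r"(.*?)(?=" + re.escape(sep) + r")"
    matches = re.findall(pattern, text, re.DOTALL)
    return matches[-1] if matches else None

sample = "egestas <br>semper<br>tizi ouzou<br>Tizi Ouzou<br>   "
print(last_between(sample))

# With five markers, the plain pattern pairs them up and misses 'd'.
odd = "<br>a<br>b<br>c<br>d<br>"
print(re.findall(r"<br>(.*?)<br>", odd))
print(last_between(odd))
```

The helper returns None when fewer than two markers are present, where indexing with [-1] would raise an IndexError on an empty list.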
Answered Oct 06 '22 at 00:10 by moliware


A non-regex approach using the built-in str methods:

text = """
Pellentesque habitant morbi tristique senectus et netus et
lesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae
ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam
egestas <br>semper<br>tizi ouzou<br>Tizi Ouzou<br>       """

# Split at most twice from the right; the middle piece is the last enclosed text.
res = text.rsplit('<br>', 2)[-2]
print(res)  # Tizi Ouzou
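
The one-liner assumes the text contains at least two <br> markers. A small sketch of a guarded version, where the function name last_between_split is hypothetical:

```python
def last_between_split(text, sep="<br>"):
    """Return the text between the last two `sep` markers, or None."""
    parts = text.rsplit(sep, 2)
    # Two splits from the right yield three parts only when `sep` occurs
    # at least twice; the middle part is then the wanted text.
    if len(parts) < 3:
        return None
    return parts[-2]

print(last_between_split("egestas <br>semper<br>tizi ouzou<br>Tizi Ouzou<br>   "))
print(last_between_split("only one<br>marker"))
```

With a single marker the unguarded [-2] would silently return the text before that marker, and with no marker it would raise an IndexError, so the length check makes both cases explicit.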
Answered Oct 06 '22 at 01:10 by Jon Clements