Python Regex - find string between html tags [duplicate]

Question

I am trying to extract the string between HTML tags. I can see that similar questions have been asked on Stack Overflow before, but I am completely new to Python and I am struggling.

So if I have

<b>Bold Stuff</b>

I want to have a regular expression that leaves me with

Bold Stuff

But all of my solutions so far have left me with things like

>Bold Stuff<

I would really appreciate any help with this.

I had

>.*?<

And I have seen a question on Stack Overflow with the suggested solution

>([^<>]*)<

But neither of these is working for me. Could someone please explain how to write a regex that says "find me the string between characters x and y, not including x and y"?

Thanks for any help

Remi Crystal · Accepted Answer

>>> import re
>>> a = '<b>Bold Stuff</b>'
>>> re.findall(r'>(.+?)<', a)  # findall returns a list of the captured groups
['Bold Stuff']
>>> re.findall(r'>(.*?)<', a)[0]  # non-greedy: *? matches as little as possible
'Bold Stuff'
>>> re.findall(r'>(.+?)<', a)[0]  # also non-greedy, but needs at least one character
'Bold Stuff'
>>> re.findall(r'>(.*)<', a)[0]  # greedy: * matches as much as possible
'Bold Stuff'

For this input, both the greedy and the non-greedy patterns work, because the string contains only one stretch of text between a > and a <.
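As for why the patterns in the question returned >Bold Stuff<: when a pattern has no group, re.findall returns the whole match, delimiters included. The sketch below shows that difference and then generalizes it to "the string between x and y". The helper name between is hypothetical, made up for illustration; it is not part of the re module.

```python
import re

a = '<b>Bold Stuff</b>'

# No group: findall returns the whole match, including > and <.
whole = re.findall(r'>.*?<', a)

# One group: findall returns only what the parentheses captured.
inner = re.findall(r'>(.*?)<', a)


def between(text, x, y):
    """Hypothetical helper: every shortest run of text between x and y."""
    # re.escape makes x and y literal, so '(' or '.' are safe delimiters.
    pattern = re.escape(x) + r'(.*?)' + re.escape(y)
    return re.findall(pattern, text)


print(whole)                             # ['>Bold Stuff<']
print(inner)                             # ['Bold Stuff']
print(between(a, '>', '<'))              # ['Bold Stuff']
print(between('f(x) + g(y)', '(', ')'))  # ['x', 'y']
```

Note that . does not match a newline by default, so text that spans several lines would need the re.DOTALL flag.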

Your pattern, >.*?<, already uses the non-greedy mode. Here is an example that shows how the non-greedy and greedy modes differ:

>>> a = '<b>Bold <br> Stuff</b>'
>>> re.findall(r'>(.*?)<', a)[0]  # non-greedy: stops at the first <
'Bold '
>>> re.findall(r'>(.*)<', a)[0]  # greedy: runs on to the last <
'Bold <br> Stuff'
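The [0] in those calls keeps only the first match. Dropping it shows everything findall returns, and it also lets you compare the negated character class from the question, >([^<>]*)<, with the non-greedy version. This is a small sketch on the same sample string; the variable names are just for illustration.

```python
import re

a = '<b>Bold <br> Stuff</b>'

# Non-greedy: each match ends at the nearest following <.
lazy = re.findall(r'>(.*?)<', a)

# Negated class: the group cannot contain < or >, so it gives the same pieces.
negated = re.findall(r'>([^<>]*)<', a)

# Greedy: a single match that runs from the first > to the last <.
greedy = re.findall(r'>(.*)<', a)

# Optional cleanup: trim the whitespace left around each piece.
trimmed = [piece.strip() for piece in lazy]

print(lazy)     # ['Bold ', ' Stuff']
print(negated)  # ['Bold ', ' Stuff']
print(greedy)   # ['Bold <br> Stuff']
print(trimmed)  # ['Bold', 'Stuff']
```

So the pattern from the other question does work once you read the captured group rather than the whole match.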

And here is what (...) means, quoted from the Python re documentation:

(...)

Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals ( or ), use \( or \), or enclose them inside a character class: [(], [)].
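To make that description concrete, here is a short sketch of the three things it mentions: retrieving a group after a match, reusing a group later in the pattern with \1, and matching literal parentheses. The tag pattern <(\w+)> is a simplifying assumption that only handles plain tags without attributes.

```python
import re

a = '<b>Bold Stuff</b>'

# Group 1 captures the tag name, group 2 the text; \1 must repeat group 1.
m = re.search(r'<(\w+)>(.*?)</\1>', a)
tag, text = m.group(1), m.group(2)

# The closing tag differs from the opening one, so \1 fails and there is no match.
mismatched = re.search(r'<(\w+)>(.*?)</\1>', '<b>Bold Stuff</i>')

# Literal parentheses: escape them, or put each inside a character class.
escaped = re.findall(r'\((.*?)\)', 'f(x) + g(y)')
in_class = re.findall(r'[(](.*?)[)]', 'f(x) + g(y)')

print(tag, text)   # b Bold Stuff
print(mismatched)  # None
print(escaped)     # ['x', 'y']
print(in_class)    # ['x', 'y']
```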

Tags: python, html, regex

JungleBook
