Regular Expression to remove html tags from a string in Python

Question

I am fetching my results from an RSS feed using the following code:

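try:
    desc = item.xpath('description')[0].text
    if date is not None:
        # Put the date first, then a blank line, then the description.
        desc = date + "\n" + "\n" + desc
except Exception:
    # No description element, or its text was None.
    desc = None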

But sometimes the description in the RSS feed contains HTML tags, as below:

This is sample text
<img src="http://imageURL" alt="" />

When displaying the content, I do not want any HTML tags to appear on the page. Is there a regular expression that removes HTML tags?

pricco · Accepted Answer

Try:

import re
# Matches opening, closing, and self-closing tags such as <p>, </p>, or <img ... />.
pattern = re.compile(r'</?\w+\s*[^>]*?/?>', re.DOTALL | re.MULTILINE | re.IGNORECASE | re.UNICODE)
text = pattern.sub(" ", text)
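
As a rough illustration of how the substitution behaves, here is a minimal sketch that wraps the same pattern in a helper and runs it on a string like the one in the question. The names `TAG_RE`, `strip_tags`, and `sample` are hypothetical and not part of the original answer, and the sample assumes the tag is written as `<img ... />` with no space after `<`. Collapsing the leftover whitespace is an extra step added here for tidier output.

```python
import re

# Same pattern as in the answer, written as a raw string so the
# backslashes reach the regex engine unchanged.
TAG_RE = re.compile(
    r'</?\w+\s*[^>]*?/?>',
    re.DOTALL | re.MULTILINE | re.IGNORECASE | re.UNICODE,
)


def strip_tags(text):
    """Replace each HTML tag with a space, then collapse runs of whitespace."""
    if text is None:
        # Mirrors the question's code, where desc may end up as None.
        return None
    without_tags = TAG_RE.sub(" ", text)
    # split() with no arguments splits on any whitespace and drops empties.
    return " ".join(without_tags.split())


sample = 'This is sample text\n<img src="http://imageURL" alt="" />'
print(strip_tags(sample))  # prints: This is sample text
```

Because the pattern requires a word character directly after `<` or `</`, text such as `a < b` is left alone. A regular expression like this only handles simple markup; comments, `<script>` bodies, and a `>` inside an attribute value are outside what it covers.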
