
Using Python, remove HTML tags/formatting from a string [duplicate]


I have a string that contains HTML markup such as links, bold text, etc.

I want to strip all the tags so I just have the raw text.

What's the best way to do this? A regex?

Blankman asked Aug 03 '10 17:08



1 Answer

If you are going to use regex:

    import re

    def striphtml(data):
        p = re.compile(r'<.*?>')
        return p.sub('', data)

    >>> striphtml('<a href="foo.com" class="bar">I Want This <b>text!</b></a>')
    'I Want This text!'
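If you would rather not use a regex, here is a minimal sketch that uses the standard library's `html.parser` instead. The names `TextExtractor` and `strip_tags` are made up for illustration. It keeps only the text nodes and also decodes entities such as `&amp;`. Note that it keeps the contents of `<script>` and `<style>` elements too, so treat it as a starting point rather than a complete solution.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects only the text nodes of the markup fed to it."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text found between tags.
        self.parts.append(data)

    def get_text(self):
        return ''.join(self.parts)


def strip_tags(markup):
    """Return the text content of `markup` with all tags dropped."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text()
```

Calling `strip_tags` on the same input as above should give the same text as the regex version, since that string contains no entities or script blocks.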
John Howard answered Oct 20 '22 09:10
