I have a string that contains HTML markup like links, bold text, etc.
I want to strip all the tags so I just have the raw text.
What's the best way to do this? Regex?
If you are going to use regex:
    import re

    def striphtml(data):
        # Non-greedy match: everything from a '<' up to the next '>'
        p = re.compile(r'<.*?>')
        return p.sub('', data)

    >>> striphtml('<a href="foo.com" class="bar">I Want This <b>text!</b></a>')
    'I Want This text!'
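If you would rather not rely on regex, here is a rough sketch of the same idea using the standard library's `html.parser` module. The names `TextExtractor` and `strip_tags` are made up for this example and are not from the answer above; treat it as one possible approach under those assumptions, not the definitive one.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps only the text between tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text that is not part of a tag
        self.chunks.append(data)

    def text(self):
        return ''.join(self.chunks)


def strip_tags(markup):
    """Return the text content of an HTML fragment with all tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


print(strip_tags('<a href="foo.com" class="bar">I Want This <b>text!</b></a>'))
```

For the sample string this prints the same text as the regex version. The difference is that a parser looks at attribute quoting instead of stopping at the first `>`, so it should cope better with input like a `>` inside a quoted attribute value, and it also decodes entities, which the regex leaves untouched.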