I have a string that I scraped from an XML file, and it contains some HTML formatting tags
(<b>, <i>, etc.).
Is there a quick and easy way to remove all of these tags from the text?
I tried
str = str.replace("<b>","")
and applied it several times for other tags, but that doesn't work.
Using lxml.html:
import lxml.html
lxml.html.fromstring(s).text_content()  # s is the string containing the markup
This strips all tags and converts all entities to their corresponding characters.
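If installing lxml is not an option, a similar result can be sketched with the standard library's html.parser module. This swaps in a different parser, not lxml, so behaviour on badly broken markup may differ. The names TextExtractor and text_content below are made up for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text between tags (hypothetical helper class)."""

    def __init__(self):
        # convert_charrefs defaults to True, so entities such as &amp;
        # arrive in handle_data already decoded.
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def text_content(markup):
    """Return the text of `markup` with all tags dropped."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(text_content("Some <b>bold</b> &amp; <i>italic</i> text"))
```

Tags are simply ignored here because only handle_data is overridden; the start and end tag handlers keep their default do-nothing behaviour.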
The answer depends on your exact needs. You might have a look at regular expressions, but I would advise you to use BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/) if you want to clean up bad XML or HTML.
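For simple, well-behaved input such as plain <b> and <i> tags, the regular-expression route might look like the sketch below. It assumes no ">" ever appears inside an attribute value or a comment, which is exactly where a real parser is safer. The names TAG_RE and strip_tags are invented for this example.

```python
import re
from html import unescape

# Matches anything that looks like a tag: "<", one or more non-">" characters, ">".
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text):
    """Remove tag-like substrings, then decode entities such as &amp;."""
    return unescape(TAG_RE.sub("", text))


print(strip_tags("Some <b>bold</b> and <i>italic</i> text &amp; more"))
# prints: Some bold and italic text & more
```

Unlike chaining str.replace calls, one pattern covers opening tags, closing tags, and tags with attributes in a single pass.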