I am wondering how I can delete all HTML tags, along with their contents, using BeautifulSoup.
Input:
... text <strong>ha</strong> ... text
Output:
... text ... text
Use replace_with() (replaceWith() is the older Beautiful Soup 3 name, kept as an alias):
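from bs4 import BeautifulSoup

text = "text <strong>ha</strong> ... text"
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(text, "html.parser")
# Replace each <strong> tag, including its contents, with an empty string.
for tag in soup.find_all('strong'):
    tag.replace_with('')
print(soup.get_text())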
prints (the spaces on either side of the removed tag are both kept):
text  ... text
Or, as @mata suggested, you can call tag.decompose() instead of replacing the tag with an empty string. It produces the same result here and states the intent more directly, since decompose() removes the tag and its contents from the tree.
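For the general case in the question (every tag, not only <strong>), a minimal sketch along the same lines might look like the following. It assumes the input is a fragment parsed with html.parser, so the unwanted tags sit directly under the soup object; strip_tags_with_contents is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def strip_tags_with_contents(html):
    """Return the text of `html` with every tag and its contents removed.

    Hypothetical helper: assumes `html` is a fragment, parsed with html.parser.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) matches any tag; recursive=False limits the search to
    # top-level tags, so nested tags are removed together with their parent
    # and decompose() is never called on an already destroyed element.
    for tag in soup.find_all(True, recursive=False):
        tag.decompose()
    return soup.get_text()


print(strip_tags_with_contents("text <strong>ha</strong> ... text"))
```

Only the top-level tags are visited because decompose() already destroys everything nested inside them. With a parser that wraps the input in <html> and <body> (lxml, for example), this sketch would remove the whole document, so the loop would need to start from the body element instead.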