
Removing all HTML tags along with their content from text

I am wondering how I can delete all HTML tags along with their contents using BeautifulSoup.

Input:

... text <strong>ha</strong> ... text

Output:

... text ... text
Adam Silver asked Aug 26 '13 21:08



1 Answer

Use replace_with() (replaceWith() is the older alias for the same method):

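from bs4 import BeautifulSoup

text = "text <strong>ha</strong> ... text"

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(text, "html.parser")

# Replace each <strong> tag, together with its contents, with an empty string
for tag in soup.find_all('strong'):
    tag.replace_with('')

print(soup.get_text())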

prints:

text  ... text

Or, as @mata suggested, you can call tag.decompose() instead of replacing the tag with an empty string. It produces the same result, but it states the intent more directly: the tag and everything inside it are removed from the tree.
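
Since the question asks about all tags rather than only <strong>, here is a minimal sketch of the decompose() approach generalized to every tag. The function name remove_tags_with_content is hypothetical, and the choice of "html.parser" is an assumption; other parsers may wrap the fragment in <html>/<body>, in which case the top-level loop below would remove the whole document.

```python
from bs4 import BeautifulSoup


def remove_tags_with_content(html):
    """Return the text of `html` with every tag and its contents removed.

    Hypothetical helper, not part of Beautiful Soup itself.
    """
    # Assumption: "html.parser" leaves the fragment unwrapped, so the
    # top-level tags are direct children of the soup object.
    soup = BeautifulSoup(html, "html.parser")

    # find_all(True) matches any tag; recursive=False limits the search to
    # top-level tags. Decomposing each of them also discards nested tags.
    for tag in soup.find_all(True, recursive=False):
        tag.decompose()

    return soup.get_text()


print(remove_tags_with_content("text <strong>ha</strong> ... text"))
```

Only text that sits outside every tag survives, so a fully wrapped input such as "<p>hello</p>" yields an empty string. To remove only certain tags, pass their names to find_all() as in the answer above.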

alecxe answered Nov 15 '22 17:11