Question
I am trying to remove style tags like <h2> and <div class=...> from my HTML file, which is downloaded with requests and parsed with BeautifulSoup. I do want to keep what the tags contain (such as text).
However, this does not seem to work.
What I have tried
    import requests
    from bs4 import BeautifulSoup

    for url in urls:
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        table = soup.find("div", {"class": "product_specifications bottom_l js_readmore_content"})
        print("<hr style='border-width:5px;'>")
        for style in table.find_all('style'):
            if 'style' in style.attrs:
                del style.attrs['style']
        print(table)
URLs I tried to work with
Python HTML parsing with beautiful soup and filtering stop words
Remove class attribute from HTML using Python and lxml
BeautifulSoup Tag Removal
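You can use decompose(), which removes a tag together with its contents:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#decompose
If you only want to clear a tag's contents, or to remove an element from the tree but keep it for later use, use clear() and extract() (both are described just above decompose() in the docs).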
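To make the difference between the three calls concrete, here is a minimal sketch. The HTML snippet and the variable names are made up for illustration and are not taken from the asker's page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to show the three methods side by side
html = "<div><h2>Title</h2><p>Body <b>bold</b></p><span>note</span></div>"
soup = BeautifulSoup(html, "html.parser")

# decompose(): destroy the tag and everything inside it
soup.h2.decompose()

# clear(): keep the tag itself but drop all of its contents
soup.p.clear()

# extract(): remove the tag from the tree and hand it back for later use
removed = soup.span.extract()

result = str(soup)
removed_html = str(removed)
print(result)
print(removed_html)
```

Note that all three discard or detach the text inside the tag, so none of them keeps the contents in place in the document.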
You are looking for unwrap(), which removes a tag but leaves its contents in place.
    your_soup.tag.unwrap()
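Applied to a case like the one in the question, it could look roughly like this. This is only a sketch: the HTML, the shortened class name, and the list of tags to unwrap are assumptions, so adjust them to the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page
html = ('<div class="product_specifications"><h2>Specs</h2>'
        '<div class="row">Weight: <b style="color:red">2 kg</b></div></div>')
soup = BeautifulSoup(html, "html.parser")
table = soup.find("div", {"class": "product_specifications"})

# find_all() searches descendants only, so the outer container is kept.
# unwrap() removes each matched tag and leaves its children where they were.
for tag in table.find_all(["h2", "div"]):
    tag.unwrap()

# Inline style attributes are a separate matter: delete the attribute,
# not the tag, on every element that still carries one.
for tag in table.find_all(style=True):
    del tag["style"]

cleaned = str(table)
print(cleaned)
```

The loop in the question searched for <style> elements with find_all('style'), which is not the same as tags that have a style attribute. find_all(style=True) matches the latter.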