Removing certain tags with BeautifulSoup and Python

Question

I am trying to remove style tags like <h2> and <div class=...> from an HTML page that I download with requests and parse with BeautifulSoup. I do want to keep what the tags contain (such as the text). However, my attempt does not seem to work.

What I have tried

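import requests
from bs4 import BeautifulSoup

# urls and headers are defined elsewhere in the script
for url in urls:
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    table = soup.find("div", {"class": "product_specifications bottom_l js_readmore_content"})
    print("<hr style='border-width:5px;'>")
    # find_all('style') matches <style> elements, not tags carrying a style attribute
    for style in table.find_all('style'):
        if 'style' in style.attrs:
            del style.attrs['style']
    print(table)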

URLs I tried to work with

Python HTML parsing with beautiful soup and filtering stop words

Remove class attribute from HTML using Python and lxml

BeautifulSoup Tag Removal

m.wasowski · Accepted Answer

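You can use decompose(), which removes a tag from the tree and destroys it together with its contents: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#decompose

If you only want to empty a tag, or want to keep the removed element after taking it out of the tree, use clear() and extract() instead (both are described just above decompose() in the documentation).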
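As a rough illustration of how the three methods differ, here is a minimal sketch. The HTML snippet and the variable names are made up for the example and are not taken from the question's page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, only for illustration
html = "<div><h2>Title</h2><p>Body <b>bold</b></p><span>note</span></div>"
soup = BeautifulSoup(html, "html.parser")

# decompose(): remove the tag from the tree and destroy it with its contents
soup.h2.decompose()

# clear(): keep the tag itself but remove everything inside it
soup.p.clear()

# extract(): remove the tag from the tree and return it so it can be reused
removed = soup.span.extract()

result = str(soup)
removed_html = str(removed)
print(result)
print(removed_html)
```

Note that none of these keeps the text in place while dropping only the surrounding tag, so they fit cases where the content should go away as well.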

Bishwas Mishra · Answer

You are looking for unwrap(), which replaces a tag with whatever is inside it, so the contents stay in the document.

your_soup.tag.unwrap()
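Applied to a case like the one in the question, this could look like the sketch below. The sample HTML and the "specs" class name are assumptions for the example; the real page would use the class from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page
html = (
    '<div class="specs">'
    '<h2 style="color:red">Specifications</h2>'
    '<div class="row">Weight: 2 kg</div>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")
table = soup.find("div", {"class": "specs"})

# find_all() returns a list-like snapshot of the descendants,
# so it is safe to unwrap tags while looping over it
for tag in table.find_all(["h2", "div"]):
    tag.unwrap()  # drop the tag itself, keep its children in place

result = str(table)
print(result)
```

The outer container is kept because find_all() only searches descendants of table. Unwrapped text nodes end up directly next to each other, so separators may need to be added if the output should stay readable.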

Tags:

python

html

beautifulsoup

strip

user3671459
