Remove tag from text with BeautifulSoup

There are a lot of questions here with similar titles, but I'm trying to remove the tag from the soup object itself.

I have a page that contains, among other things, this div:

<div id="content">
I want to keep this<br /><div id="blah">I want to remove this</div>
</div>

I can select <div id="content"> with soup.find('div', id='content') but I want to remove the <div id="blah"> from it.

How do I remove a tag from Beautiful Soup?

Beautiful Soup can also remove tags from the document, using the decompose() and extract() methods: decompose() destroys the removed tag, while extract() returns it.

How do you remove HTML tags from text in Python?

To remove HTML tags from a string in Python with the lxml module, pass the original string to the fromstring() function in lxml.html, which returns the root element of the parsed tree. Calling text_content() on that element returns its text, leaving the HTML tags behind. The return value is a string-like object defined in lxml.etree.
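A minimal sketch of the lxml approach described above, assuming lxml is installed. The helper name strip_tags is hypothetical and only used for illustration. Note that this strips every tag and keeps all the text, so it does not remove the contents of the inner div.

```python
import lxml.html


def strip_tags(html: str) -> str:
    """Return the text of an HTML fragment with all tags removed."""
    element = lxml.html.fromstring(html)
    # text_content() concatenates the text of the element and its descendants.
    return str(element.text_content())


text = strip_tags(
    '<div id="content">I want to keep this<br />'
    '<div id="blah">I want to remove this</div></div>'
)
print(text)
```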

How do you delete a tag in Python?

For this, use the decompose() method, which is built into Beautiful Soup. Tag.decompose() removes a tag from the tree of a given HTML document, then completely destroys it and its contents.
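To make the difference between the two methods concrete, here is a small sketch. The markup and the ids "a" and "b" are made up for illustration, and the parser is set to "html.parser" so that no extra dependency is needed.

```python
from bs4 import BeautifulSoup

html = '<div id="content">keep<span id="a">first</span><span id="b">second</span></div>'
soup = BeautifulSoup(html, "html.parser")

# extract() detaches the tag from the tree and returns it for reuse.
removed = soup.find(id="a").extract()

# decompose() detaches the tag and destroys it; it returns None.
result = soup.find(id="b").decompose()

print(removed)  # the detached tag is still a usable Tag object
print(result)
print(soup)
```

Use extract() when you still need the removed element, and decompose() when you only want it gone.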

You can use extract() if you want to remove a tag or string from the tree. It returns the removed element, so you can still use it afterwards.

In [14]: soup = BeautifulSoup("""<div id="content">
   ....: I want to keep this<br /><div id="blah">I want to remove this</div>
   ....: </div>""")

In [15]: blah = soup.find(id='blah')

In [16]: _ = blah.extract()

In [17]: soup
Out[17]: 
<html><body><div id="content">
I want to keep this<br/>
</div></body></html>
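One thing to watch for in the session above: find() returns None when nothing matches, so calling extract() on the result would raise an AttributeError. A possible guard is sketched below. The helper name remove_by_id is hypothetical, and "html.parser" is used here, so the output is not wrapped in html and body tags.

```python
from bs4 import BeautifulSoup


def remove_by_id(soup, element_id):
    """Detach the element with the given id; return it, or None if absent."""
    tag = soup.find(id=element_id)
    if tag is None:
        return None
    return tag.extract()


html = '''<div id="content">
I want to keep this<br /><div id="blah">I want to remove this</div>
</div>'''
soup = BeautifulSoup(html, "html.parser")

removed = remove_by_id(soup, "blah")        # the detached <div id="blah">
missing = remove_by_id(soup, "no-such-id")  # None, and no exception raised
print(soup.find("div", id="content"))
```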

The Tag.decompose method removes a tag from the tree and destroys it along with its contents. So find the div tag:

div = soup.find('div', {'id':'content'})

Loop over all the children but the first:

for child in list(div)[1:]:

and try to decompose the children:

    try:
        child.decompose()
    except AttributeError: pass

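import bs4 as bs

content = '''<div id="content">
I want to keep this<br /><div id="blah">I want to remove this</div>
</div>'''
# Name the parser explicitly; bs4 warns when it has to guess one.
soup = bs.BeautifulSoup(content, 'html.parser')
div = soup.find('div', {'id': 'content'})
# Copy the children into a list first, since decompose() mutates the tree.
for child in list(div)[1:]:
    try:
        child.decompose()
    except AttributeError:
        # Plain strings may not have a decompose() method; skip them.
        pass
print(div)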

yields

<div id="content">
I want to keep this
</div>
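The loop above removes every child after the first, which includes the br tag. If only the nested div should go, a narrower variant might look like the sketch below. It assumes that the unwanted elements are direct div children of the content div.

```python
from bs4 import BeautifulSoup

content = '''<div id="content">
I want to keep this<br /><div id="blah">I want to remove this</div>
</div>'''
soup = BeautifulSoup(content, "html.parser")
div = soup.find("div", id="content")

# recursive=False limits the search to direct children of div.
for child in div.find_all("div", recursive=False):
    child.decompose()

print(div)
```

Here the br tag and the trailing newline stay in place, because only tags named div are touched.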

The equivalent using lxml would be

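import lxml.html as LH

content = '''<div id="content">
I want to keep this<br /><div id="blah">I want to remove this</div>
</div>'''
root = LH.fromstring(content)

div = root.xpath('//div[@id="content"]')[0]
# Iterate over a copy of the children, since remove() mutates the element.
for child in list(div):
    div.remove(child)
# encoding='unicode' returns a str rather than bytes on Python 3.
print(LH.tostring(div, encoding='unicode'))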
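With lxml, removing a child through its parent also discards any text that trails the child. A sketch that targets only the blah div and keeps the surrounding text could use drop_tree(), a method of lxml.html elements. The XPath expression below is an assumption about how the element is located.

```python
import lxml.html as LH

content = '''<div id="content">
I want to keep this<br /><div id="blah">I want to remove this</div>
</div>'''
root = LH.fromstring(content)

for blah in root.xpath('//div[@id="blah"]'):
    # drop_tree() removes the element and its children but keeps its tail text.
    blah.drop_tree()

result = LH.tostring(root, encoding="unicode")
print(result)
```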

Tags: python, html, beautifulsoup

Juicy

2 Answers

styvane

unutbu
