Extract outer div using BeautifulSoup

Question

If the HTML code looks like this:

<div class="div1">
<p>hello</p>
<p>hi</p>
    <div class="nesteddiv">
        <p>one</p>
        <p>two</p>
        <p>three</p>
    </div>
</div>

How do I extract just

<div class="div1">
    <p>hello</p>
    <p>hi</p>
</div>

I already tried parser.find('div', 'div1'), but that returns the whole div, including the nested one.

Johnsyweb · Accepted Answer

You actually want to extract() the nested div, which removes it from the document, and then get the first div. Here is an example (where html is the HTML you provided in the question):

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(html)
>>> soup.div.div.extract()
<div class="nesteddiv">
<p>one</p>
<p>two</p>
<p>three</p>
</div>
>>> soup.div
<div class="div1">
<p>hello</p>
<p>hi</p>

</div>

Alexander Tsepkov · Answer

Why not just find() the nested div and then remove it from the tree using extract()?
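
A minimal sketch of that suggestion, assuming the newer bs4 package (Beautiful Soup 4) with the built-in html.parser rather than the BeautifulSoup 3 import used in the accepted answer. The variable names outer, nested and remaining are only illustrative:

```python
from bs4 import BeautifulSoup

# The HTML from the question.
html = """<div class="div1">
<p>hello</p>
<p>hi</p>
    <div class="nesteddiv">
        <p>one</p>
        <p>two</p>
        <p>three</p>
    </div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Locate the outer div, then the nested div inside it.
outer = soup.find("div", class_="div1")
nested = outer.find("div", class_="nesteddiv")

# extract() detaches the tag from the tree and returns it.
if nested is not None:
    nested.extract()

# Only the two top-level paragraphs are left in the outer div.
remaining = [p.get_text() for p in outer.find_all("p")]
print(remaining)  # ['hello', 'hi']
```

Note that extract() modifies the parsed tree in place, so if the nested div is needed later, keep the value that extract() returns.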

Tags:

python

beautifulsoup

John Wine
