If the HTML code looks like this:
<div class="div1">
<p>hello</p>
<p>hi</p>
<div class="nesteddiv">
<p>one</p>
<p>two</p>
<p>three</p>
</div>
</div>
How do I extract just this part:
<div class="div1">
<p>hello</p>
<p>hi</p>
</div>
I already tried parser.find('div', 'div1'), but it returns the whole div, including the nested one.
You need to extract() the nested div from the document first and then get the first div. extract() removes the tag from the tree (and returns it), so the outer div no longer contains it. Here is an example (where html is the HTML from the question):
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(html)
>>> soup.div.div.extract()
<div class="nesteddiv">
<p>one</p>
<p>two</p>
<p>three</p>
</div>
>>> soup.div
<div class="div1">
<p>hello</p>
<p>hi</p>
</div>
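The session above uses the old BeautifulSoup 3 import. Below is a minimal sketch of the same extract() approach, assuming BeautifulSoup 4 (the bs4 package) and Python's built-in "html.parser" backend. Note that extract() modifies the parsed tree in place.

```python
# Sketch: the extract() approach with BeautifulSoup 4 (assumes bs4 is installed).
from bs4 import BeautifulSoup

html = """<div class="div1">
<p>hello</p>
<p>hi</p>
<div class="nesteddiv">
<p>one</p>
<p>two</p>
<p>three</p>
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# soup.div is the first <div> in the document (div1);
# soup.div.div is the first <div> inside it (nesteddiv).
nested = soup.div.div.extract()  # detaches nesteddiv from the tree and returns it

outer = soup.div  # div1, now without the nested div
print(outer)
print([p.get_text() for p in outer.find_all("p")])  # ['hello', 'hi']
```

The printed div keeps the whitespace text nodes that surrounded the nested div, so expect a blank line before the closing tag.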
Why not just find() the nested div and then remove it from the tree using extract()?
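A minimal sketch of that idea, again assuming BeautifulSoup 4 and the built-in "html.parser": both divs are located by class with find() instead of by position, so it still works if div1 is not the first div in the page.

```python
# Sketch: find() the nested div by class, then extract() it (assumes bs4 is installed).
from bs4 import BeautifulSoup

html = """<div class="div1">
<p>hello</p>
<p>hi</p>
<div class="nesteddiv">
<p>one</p>
<p>two</p>
<p>three</p>
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Locate each element explicitly by its class attribute.
outer = soup.find("div", class_="div1")
nested = outer.find("div", class_="nesteddiv")

# Only remove it if it is actually there; extract() returns the removed tag.
if nested is not None:
    nested.extract()

print([p.get_text() for p in outer.find_all("p")])  # ['hello', 'hi']
```

If you don't need the removed element afterwards, decompose() destroys it instead of returning it.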