BeautifulSoup: How to get nested divs

Question

Given the following code:

<html>
<body>
<div class="category1" id="foo">
      <div class="category2" id="bar">
            <div class="category3">
            </div>
            <div class="category4">
                 <div class="category5"> test
                 </div>
            </div>
      </div>
</div>
</body>
</html>

How can I extract the word test from <div class="category5"> using BeautifulSoup, i.e. how do I deal with nested divs? I searched online but couldn't find an easy-to-grasp example, so I put this one together. Thanks.

Anzel · Accepted Answer

XPath would be the most straightforward answer, but BeautifulSoup does not support it.

Updated: with a BeautifulSoup solution

Since you know the element (div) and its class in this case, you can use a for loop with attrs to get what you want:

from bs4 import BeautifulSoup

html = '''
<html>
<body>
<div class="category1" id="foo">
      <div class="category2" id="bar">
            <div class="category3">
            </div>
            <div class="category4">
                 <div class="category5"> test
                 </div>
            </div>
      </div>
</div>
</body>
</html>'''

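# Name the parser explicitly so the result does not depend on what is installed
content = BeautifulSoup(html, 'html.parser')

# find_all is the current name for findAll; strip() drops the surrounding whitespace
for div in content.find_all('div', attrs={'class': 'category5'}):
    print(div.text.strip())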

test

I have no problem extracting the text from your HTML sample. As @MartijnPieters suggested, you will need to find out why your div element is missing.
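If the class name alone is not specific enough, the nesting itself can be spelled out with a CSS selector. This is only a sketch using the sample HTML from the question; the selector string is one possible choice, not the only way to write it.

```python
from bs4 import BeautifulSoup

html = """
<div class="category1" id="foo">
  <div class="category2" id="bar">
    <div class="category3"></div>
    <div class="category4">
      <div class="category5"> test
      </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A space means "any descendant", ">" means "direct child",
# so this walks foo -> (anything) -> category4 -> category5.
inner = soup.select_one("div#foo div.category4 > div.category5")

# get_text(strip=True) removes the whitespace around the text
text = inner.get_text(strip=True)
print(text)
```

select_one returns the first match or None, so the same check for a missing element applies here as with find.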

Another update

You are missing lxml as a parser for BeautifulSoup, which is why None was returned: nothing was parsed to begin with. Installing lxml should solve your issue.
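Whatever the cause, find returns None when nothing matches, so it helps to check for that before reading the text. A minimal sketch, assuming the built-in "html.parser" is acceptable (it needs no extra install); div_text is a hypothetical helper name made up for this example.

```python
from bs4 import BeautifulSoup

# "html.parser" ships with Python, so this does not depend on lxml
soup = BeautifulSoup('<div class="category5"> test </div>', "html.parser")


def div_text(soup, css_class):
    """Return the stripped text of the first matching div, or None if absent."""
    node = soup.find("div", class_=css_class)
    if node is None:
        # Nothing matched: report that instead of raising AttributeError
        return None
    return node.get_text(strip=True)


found = div_text(soup, "category5")
missing = div_text(soup, "no-such-class")
print(found, missing)
```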

You may also consider using lxml or a similar library that supports XPath; it's dead easy if you ask me.

from lxml import etree

tree = etree.fromstring(html) # or etree.parse from source
tree.xpath('.//div[@class="category5"]/text()')
[' test\n                 ']
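If installing lxml is not an option, the standard library's xml.etree.ElementTree supports a limited XPath subset that is enough for this case. Note that this swaps in a different module from the one used above, and it only works because the sample is well-formed XML; real-world HTML often is not.

```python
import xml.etree.ElementTree as ET

html = """<html>
<body>
<div class="category1" id="foo">
      <div class="category2" id="bar">
            <div class="category3">
            </div>
            <div class="category4">
                 <div class="category5"> test
                 </div>
            </div>
      </div>
</div>
</body>
</html>"""

root = ET.fromstring(html)

# ElementTree has no text() step, so select the elements and read .text;
# (div.text or "") guards against elements with no text at all.
texts = [
    (div.text or "").strip()
    for div in root.findall(".//div[@class='category5']")
]
print(texts)
```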
