
Beautiful soup getting the first child

How can I get the first child?

 <div class="cities"> 
       <div id="3232"> London </div>
       <div id="131"> York </div>
  </div>

How can I get London?

for div in nsoup.find_all(class_='cities'):
    print (div.children.contents)

AttributeError: 'listiterator' object has no attribute 'contents'

Emmet B asked Mar 19 '13 02:03



2 Answers

div.children returns an iterator.

for div in nsoup.find_all(class_='cities'):
    for childdiv in div.find_all('div'):
        print (childdiv.string) #london, york

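The AttributeError is raised because div.children is an iterator, and an iterator has no .contents attribute. Also note that .children yields non-tag nodes too, such as the '\n' strings between the tags, so just use a proper child selector to find the specific div.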
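As a minimal sketch (not part of the original answer), the snippet below shows what .children actually yields for the markup in the question and one way to get only the first child tag. It assumes bs4 is installed and uses the built-in 'html.parser'; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<div class="cities">
    <div id="3232"> London </div>
    <div id="131"> York </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
cities = soup.find(class_="cities")

# .children is an iterator, so turn it into a list to inspect it.
# The first item is the whitespace string before the first inner <div>.
children = list(cities.children)
print(repr(children[0]))

# find() with recursive=False only looks at direct children and skips
# the whitespace strings, returning the first matching child tag.
first_child = cities.find("div", recursive=False)

# Guard against None in case no child <div> exists.
first_city = first_child.get_text(strip=True) if first_child is not None else None
print(first_city)  # London
```

get_text(strip=True) is used here instead of .string so that the surrounding spaces in " London " are removed.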

(Edit) I can't reproduce your exception. Here's what I did:

In [137]: print foo.prettify()
<div class="cities">
 <div id="3232">
  London
 </div>
 <div id="131">
  York
 </div>
</div>

In [138]: for div in foo.find_all(class_ = 'cities'):
   .....:     for childdiv in div.find_all('div'):
   .....:         print childdiv.string
   .....: 
 London 
 York 

In [139]: for div in foo.find_all(class_ = 'cities'):
   .....:     for childdiv in div.find_all('div'):
   .....:         print childdiv.string, childdiv['id']
   .....: 
 London  3232
 York  131
thkang answered Sep 24 '22 06:09


With modern versions of bs4 (certainly bs4 4.7.1+) you have access to the :first-child CSS pseudo-class, which is nice and descriptive. Use soup.select_one if you only want the first match, e.g. soup.select_one('.cities div:first-child').text. It is good practice to check that the result is not None before using the .text accessor.

from bs4 import BeautifulSoup as bs

html = '''
<div class="cities"> 
       <div id="3232"> London </div>
       <div id="131"> York </div>
  </div>
  '''
soup = bs(html, 'lxml') #or 'html.parser'
first_children = [i.text for i in soup.select('.cities div:first-child')]
print(first_children)
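
For the single-match case mentioned above, a hedged sketch with select_one and the None check could look like this. It assumes bs4 4.7.1+ and uses 'html.parser'; the '.towns' selector is a made-up example of a selector that matches nothing.

```python
from bs4 import BeautifulSoup

html = '''
<div class="cities">
       <div id="3232"> London </div>
       <div id="131"> York </div>
  </div>
'''

soup = BeautifulSoup(html, 'html.parser')

# select_one returns the first matching tag, or None if nothing matches.
first = soup.select_one('.cities div:first-child')
city = first.get_text(strip=True) if first is not None else None
print(city)  # London

# Hypothetical selector with no match: the None check avoids an
# AttributeError from calling a method on None.
missing = soup.select_one('.towns div:first-child')
missing_text = missing.get_text(strip=True) if missing is not None else None
print(missing_text)  # None
```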
QHarr answered Sep 25 '22 06:09