
BeautifulSoup 4, findNext() function

I'm playing with BeautifulSoup 4 and I have this HTML:

</tr>
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>

I want to match both values inside the <td> tags that follow the label, here 14 and 7.

I tried this:

giraffe = soup.find(text='Giraffe').findNext('td').text

but this only matches 14. How can I match both values with this function?

nutship asked Apr 02 '13 18:04




1 Answer

Use find_all instead of findNext:

import bs4 as bs
content = '''\
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>'''
soup = bs.BeautifulSoup(content, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning

for td in soup.find('td', text='Giraffe').parent.find_all('td'):
    print(td.text)

yields

Giraffe
14
7

Or, you could use find_next_siblings (also known as fetchNextSiblings):

for td in soup.find(text='Giraffe').parent.find_next_siblings():
    print(td.text)

yields

14
7
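
If you need the numbers as values rather than printed lines, the same idea can be sketched like this. It assumes the built-in html.parser backend and the string= keyword (the newer spelling of text=); label_cell and values are just illustrative names.

```python
from bs4 import BeautifulSoup

content = '''\
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>'''
soup = BeautifulSoup(content, 'html.parser')

# Locate the label cell, then keep only the <td> siblings that follow it.
# Passing 'td' filters out the whitespace strings between the cells.
label_cell = soup.find('td', string='Giraffe')
values = [int(td.get_text(strip=True))
          for td in label_cell.find_next_siblings('td')]
print(values)  # [14, 7]
```

Passing a tag name to find_next_siblings keeps the result limited to cells, which matters if the row ever contains other elements.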

Explanation:

Note that soup.find(text='Giraffe') returns a NavigableString.

In [30]: soup.find(text='Giraffe')
Out[30]: u'Giraffe'

To get the associated td tag, use

In [31]: soup.find('td', text='Giraffe')
Out[31]: <td id="freistoesse">Giraffe</td>

or

In [32]: soup.find(text='Giraffe').parent
Out[32]: <td id="freistoesse">Giraffe</td>

Once you have the td tag, you could use find_next_siblings:

In [35]: soup.find(text='Giraffe').parent.find_next_siblings()
Out[35]: [<td>14</td>, <td>7</td>]
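
To make the string-versus-tag distinction concrete, here is a small sketch (again assuming html.parser and the string= keyword; the variable names are mine). It also shows that you can stay close to the original attempt: find_all_next works from the string itself, although it walks forward through the whole document, so the limit=2 here is an assumption that exactly two cells follow the label.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

content = '''\
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>'''
soup = BeautifulSoup(content, 'html.parser')

text_node = soup.find(string='Giraffe')   # the text inside the cell
cell = text_node.parent                   # the enclosing <td> tag
print(type(text_node).__name__, type(cell).__name__)  # NavigableString Tag

# find_all_next searches everything after the string in document order,
# so cap it; without limit it would run on into later rows of a real table.
nearby = text_node.find_all_next('td', limit=2)
print([td.get_text() for td in nearby])   # ['14', '7']
```

In a full table, find_next_siblings on the parent tag is usually the safer choice because it cannot leave the current row.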

PS. BeautifulSoup 4 added method names that use underscores instead of CamelCase. They do the same thing, but conform to the PEP 8 style guide recommendations. Thus, prefer find_next_siblings over fetchNextSiblings, and find_next over findNext.
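
For reference, the line from the question written with the underscore names might look like the sketch below (html.parser and string= are my assumptions, as are the variable names). find_next returns only the first match, which is why the original attempt stopped at 14; calling it again from that cell reaches the second value.

```python
from bs4 import BeautifulSoup

content = '''\
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>'''
soup = BeautifulSoup(content, 'html.parser')

# find_next stops at the first <td> after the matched string.
first = soup.find(string='Giraffe').find_next('td')
# Calling find_next again from that cell moves on to the following <td>.
second = first.find_next('td')
print(first.text, second.text)  # 14 7
```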

unutbu answered Sep 26 '22 14:09