I'm playing with BeautifulSoup 4 and I have this HTML code:
</tr>
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>
I want to match both values between the <td> tags, so here 14 and 7.
I tried this:
giraffe = soup.find(text='Giraffe').findNext('td').text
but this only matches 14. How can I match both values with this function?
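To use Beautiful Soup, you need to install it: $ pip install beautifulsoup4. Beautiful Soup also relies on a parser. If you don't name one, it picks the best one that is installed: lxml if available, then html5lib, then Python's built-in html.parser. lxml is the fastest, so check whether you already have it (open IDLE and try to import lxml). If not, run $ pip install lxml or, for the system Python on Debian/Ubuntu, $ apt-get install python3-lxml.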
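A minimal sketch of checking for lxml and naming the parser explicitly. The try/except fallback is only an illustration (Beautiful Soup already makes a similar choice on its own when no parser is given); spelling it out makes the choice visible and silences the "no parser was explicitly specified" warning. The sample HTML is wrapped in a <table> so every parser keeps the row intact.

```python
from bs4 import BeautifulSoup

html = '<table><tr><td id="freistoesse">Giraffe</td><td>14</td><td>7</td></tr></table>'

# Prefer lxml when it is importable; otherwise fall back to Python's
# built-in "html.parser", which needs no extra installation.
try:
    import lxml  # noqa: F401  (only checking that it is installed)
    parser = "lxml"
except ImportError:
    parser = "html.parser"

# Naming the parser explicitly makes results reproducible across machines.
soup = BeautifulSoup(html, parser)
print(parser, soup.find("td", id="freistoesse").text)
```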
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
find returns the first element that matches the search, or None if nothing matches. find_all scans the entire document and returns a list of every match.
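A short illustration of the difference on a one-row table (the HTML is a simplified copy of the question's snippet). It uses the string/tag filters only; nothing here goes beyond the documented find and find_all behavior.

```python
from bs4 import BeautifulSoup

html = '<table><tr><td>Giraffe</td><td>14</td><td>7</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

first = soup.find("td")        # first matching Tag (or None if nothing matches)
every = soup.find_all("td")    # list of every matching Tag, in document order
missing = soup.find("th")      # there is no <th>, so find() returns None

print(first.text)                  # Giraffe
print([td.text for td in every])   # ['Giraffe', '14', '7']
print(missing)                     # None
```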
Use find_all instead of findNext:
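import bs4 as bs
content = '''\
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>'''
# Name the parser explicitly to avoid the "no parser specified" warning
soup = bs.BeautifulSoup(content, 'html.parser')
# Find the label cell, go up to its <tr>, and print every <td> in that row
for td in soup.find('td', text='Giraffe').parent.find_all('td'):
    print(td.text)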
yields
Giraffe
14
7
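The loop above also prints the label cell. If you only want the two values, one option (a sketch, assuming the label is always the first cell of the row) is to slice it off. It uses string=, which is the name Beautiful Soup 4.4+ gives to the older text= argument.

```python
from bs4 import BeautifulSoup

content = '''\
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>'''
soup = BeautifulSoup(content, "html.parser")

# Locate the label cell, step up to its <tr>, then collect every <td> in that row.
row = soup.find("td", string="Giraffe").parent
cells = row.find_all("td")

# The first cell is the label itself; slice it off to keep only the values.
values = [td.text for td in cells[1:]]
print(values)  # ['14', '7']
```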
Or, you could use find_next_siblings (also known as fetchNextSiblings):
for td in soup.find(text='Giraffe').parent.find_next_siblings():
    print(td.text)
yields
14
7
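For comparison, this sketch shows why the question's code stops at 14: findNext (find_next in underscore naming) returns only the first match after the starting point, while find_all_next returns every later match. Note that on a larger page find_all_next would keep going into later rows, which is why the parent and sibling approaches are scoped to the current row.

```python
from bs4 import BeautifulSoup

content = '<table><tr><td>Giraffe</td><td>14</td><td>7</td></tr></table>'
soup = BeautifulSoup(content, "html.parser")
label = soup.find(string="Giraffe")  # a NavigableString inside the first <td>

# find_next() stops at the first <td> that follows `label` in the document.
one = label.find_next("td")
# find_all_next() keeps going and returns every later <td> in the document.
many = label.find_all_next("td")

print(one.text)                  # 14
print([td.text for td in many])  # ['14', '7']
```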
Explanation:
Note that soup.find(text='Giraffe') returns a NavigableString.
In [30]: soup.find(text='Giraffe')
Out[30]: u'Giraffe'
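A quick way to confirm this yourself (a sketch run under Python 3, where the string prints without the u prefix): the match is a text node, and its .parent is the tag that holds it.

```python
from bs4 import BeautifulSoup, NavigableString

soup = BeautifulSoup('<table><tr><td id="freistoesse">Giraffe</td></tr></table>', "html.parser")
hit = soup.find(string="Giraffe")

print(isinstance(hit, NavigableString))  # True: a text node, not a tag
print(hit.parent.name)                   # td: the tag that contains the text
```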
To get the associated td tag, use
In [31]: soup.find('td', text='Giraffe')
Out[31]: <td id="freistoesse">Giraffe</td>
or
In [32]: soup.find(text='Giraffe').parent
Out[32]: <td id="freistoesse">Giraffe</td>
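Both lookups are interchangeable: they return the same Tag object from the parse tree, not two copies. A small check, using a trimmed version of the question's row:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<table><tr><td id="freistoesse">Giraffe</td><td>14</td></tr></table>', "html.parser")

by_tag = soup.find("td", string="Giraffe")      # search for the tag directly
by_parent = soup.find(string="Giraffe").parent  # find the text, then step up

# Both routes land on the very same Tag object in the parse tree.
print(by_tag is by_parent)  # True
```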
Once you have the td tag, you could use find_next_siblings:
In [35]: soup.find(text='Giraffe').parent.find_next_siblings()
Out[35]: [<td>14</td>, <td>7</td>]
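If you need the values as numbers rather than tags, a possible follow-up is to pass "td" to find_next_siblings (so only <td> siblings are returned) and convert each cell's text. This assumes every following cell holds an integer; int() raises ValueError otherwise.

```python
from bs4 import BeautifulSoup

content = '<table><tr><td id="freistoesse">Giraffe</td><td>14</td><td>7</td></tr></table>'
soup = BeautifulSoup(content, "html.parser")

label_cell = soup.find(string="Giraffe").parent
# Restrict to <td> siblings and convert the stripped cell text to integers.
numbers = [int(td.get_text(strip=True)) for td in label_cell.find_next_siblings("td")]
print(numbers)  # [14, 7]
```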
PS. BeautifulSoup 4 added method names that use underscores instead of CamelCase. They do the same thing but conform to the PEP 8 style guide recommendations. Thus, prefer find_next_siblings over fetchNextSiblings.