I have am trying to parse a webpage that looks like this with Python->Beautiful Soup:
I am trying to extract the contents of the highlighted td div. Currently I can get all the divs by
alltd = soup.findAll('td') for td in alltd: print td
But I am trying to narrow the scope of that to search the tds in the class "tablebox" which still will probably return 30+ but is more managable a number than 300+.
How can I extract the contents of the highlighted td in picture above?
BeautifulSoup has a limited support for CSS selectors, but covers most commonly used ones. Use select() method to find multiple elements and select_one() to find a single element.
To get href with Python BeautifulSoup, we can use the find_all method. to create soup object with BeautifulSoup class called with the html string. Then we find the a elements with the href attribute returned by calling find_all with 'a' and href set to True .
It is useful to know that whatever elements BeautifulSoup finds within one element still have the same type as that parent element - that is, various methods can be called.
So this is somewhat working code for your example:
soup = BeautifulSoup(html) divTag = soup.find_all("div", {"class": "tablebox"}) for tag in divTag: tdTags = tag.find_all("td", {"class": "align-right"}) for tag in tdTags: print tag.text
This will print all the text of all the td
tags with the class of "align-right" that have a parent div
with the class of "tablebox".
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With