I am struggling with the syntax required to grab some hrefs in a td. The table, tr and td elements don't have any classes or IDs.
If I wanted to grab the anchor in this example, what would I need?
<tr><td><a>...
Thanks
Step-by-step approach. Step 1: Import the BeautifulSoup module for parsing and the requests module for fetching the website. Step 2: Request the URL with the requests.get() method.
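For instance, a minimal sketch of those two steps, assuming the modern requests and bs4 packages and a placeholder URL:

    import requests
    from bs4 import BeautifulSoup

    # Step 1: the imports above pull in the parsing and HTTP modules.
    # Step 2: fetch the page; the URL below is a placeholder, not a real endpoint.
    url = "https://example.com/some-table-page"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")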
find returns the first element that matches the search; find_all scans the entire document and returns a list of all matches.
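A quick illustration of the difference (a sketch in bs4 syntax; the older BeautifulSoup 3 names are find and findAll):

    from bs4 import BeautifulSoup

    html = "<tr><td><a href='one'>1</a></td><td><a href='two'>2</a></td></tr>"
    soup = BeautifulSoup(html, "html.parser")

    first = soup.find("a")             # first matching tag only
    every = soup.find_all("a")         # list of every matching tag
    print(first["href"])               # one
    print([a["href"] for a in every])  # ['one', 'two']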
It is not a real HTML parser but uses regular expressions to dive through tag soup. It is therefore more forgiving in some cases and worse in others. It is not uncommon that lxml/libxml2 parses and fixes broken HTML better, but BeautifulSoup has superior support for encoding detection.
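To see the difference in practice, you can feed the same broken markup to both libraries and compare how each repairs it (a sketch assuming lxml and bs4 are both installed):

    import lxml.html
    from bs4 import BeautifulSoup

    broken = "<table><tr><td><a href='foo'>link"  # unclosed tags

    # lxml/libxml2 builds a tree and closes the dangling tags itself
    tree = lxml.html.fromstring(broken)
    print(lxml.html.tostring(tree))

    # BeautifulSoup also accepts the tag soup, but may repair it differently
    soup = BeautifulSoup(broken, "html.parser")
    print(soup.prettify())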
As per the docs, you first make a parse tree:
    import BeautifulSoup
    html = "<html><body><tr><td><a href='foo'/></td></tr></body></html>"
    soup = BeautifulSoup.BeautifulSoup(html)
and then you search in it, for example for <a> tags whose immediate parent is a <td>:
    for ana in soup.findAll('a'):
        if ana.parent.name == 'td':
            print ana["href"]
Something like this?
    from BeautifulSoup import BeautifulSoup
    soup = BeautifulSoup(html)
    anchors = [td.find('a') for td in soup.findAll('td')]
That should find the first "a" inside each "td" in the HTML you provide. You can tweak td.find to be more specific, or use findAll if you have several links inside each td.
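For example, a sketch in the same old BeautifulSoup style as above that flattens every link from every td into one list:

    from BeautifulSoup import BeautifulSoup
    soup = BeautifulSoup(html)
    # one entry per <a>, across all <td> elements
    anchors = [a for td in soup.findAll('td') for a in td.findAll('a')]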
UPDATE: re Daniele's comment, if you want to make sure you don't have any Nones in the list, then you could modify the list comprehension thus:
    from BeautifulSoup import BeautifulSoup
    soup = BeautifulSoup(html)
    anchors = [a for a in (td.find('a') for td in soup.findAll('td')) if a]
Which basically just adds a check to see if you have an actual element returned by td.find('a').