To get href with Python BeautifulSoup, we can use the find_all method. to create soup object with BeautifulSoup class called with the html string. Then we find the a elements with the href attribute returned by calling find_all with 'a' and href set to True .
Two ways to find all the anchor tags or href entries on the webpage are: soup. find_all() SoupStrainer class.
You can use find_all in the following way to find every a element that has an href attribute, and print each one:
from BeautifulSoup import BeautifulSoup
html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''
soup = BeautifulSoup(html)
for a in soup.find_all('a', href=True):
print "Found the URL:", a['href']
The output would be:
Found the URL: some_url
Found the URL: another_url
Note that if you're using an older version of BeautifulSoup (before version 4) the name of this method is findAll. In version 4, BeautifulSoup's method names were changed to be PEP 8 compliant, so you should use find_all instead.
If you want all tags with an href, you can omit the name parameter:
href_tags = soup.find_all(href=True)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With