To get href with Python BeautifulSoup, we can use the find_all method. to create soup object with BeautifulSoup class called with the html string. Then we find the a elements with the href attribute returned by calling find_all with 'a' and href set to True .
Two ways to find all the anchor tags or href entries on the webpage are: soup. find_all() SoupStrainer class.
You can use find_all
in the following way to find every a
element that has an href
attribute, and print each one:
from BeautifulSoup import BeautifulSoup
html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''
soup = BeautifulSoup(html)
for a in soup.find_all('a', href=True):
print "Found the URL:", a['href']
The output would be:
Found the URL: some_url
Found the URL: another_url
Note that if you're using an older version of BeautifulSoup (before version 4) the name of this method is findAll
. In version 4, BeautifulSoup's method names were changed to be PEP 8 compliant, so you should use find_all
instead.
If you want all tags with an href
, you can omit the name
parameter:
href_tags = soup.find_all(href=True)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With