I have the following HTML DOM:
<div class="meta-info meta-info-wide"> <div class="title">Разработчик</div> <div class="content contains-text-link">
<a class="dev-link" href="http://www.jourist.com&sa=D&usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg" rel="nofollow" target="_blank">Перейти на веб-сайт</a>
<a class="dev-link" href="mailto:[email protected]" rel="nofollow" target="_blank">Написать: [email protected]</a>
<div class="content physical-address">Diagonalstraße 41
20537 Hamburg</div> </div> </div>
I need to get all links (URLs) with the class dev-link inside the block div.meta-info-wide.
I tried this obvious way, but it does not work:
divTag = soup.find_all("div", {"class":"meta-info-wide"})
print(len(divTag))
for tag in divTag:
    tdTags = tag.find_all("a", {"class":"dev-link"})
    for tag in tdTags:
        print(tag.text)
Try the following:
import bs4
html = """
<div class="meta-info meta-info-wide"> <div class="title">Разработчик</div> <div class="content contains-text-link">
<a class="dev-link" href="http://www.jourist.com&sa=D&usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg" rel="nofollow" target="_blank">Перейти на веб-сайт</a>
<a class="dev-link" href="mailto:[email protected]" rel="nofollow" target="_blank">Написать: [email protected]</a>
<div class="content physical-address">Diagonalstraße 41
20537 Hamburg</div> </div> </div>"""
soup = bs4.BeautifulSoup(html, "html.parser")
for div in soup.find_all("div", {"class":"meta-info-wide"}):
    for link in div.select("a.dev-link"):
        print(link['href'])
This gives you:
http://www.jourist.com&sa=D&usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg
mailto:[email protected]
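On Python 3 the nested loop can also be collapsed into one descendant CSS selector. This is a minimal sketch, assuming a recent BeautifulSoup 4 release (whose select() is backed by the soupsieve package) is installed; the HTML below is a trimmed copy of the question's snippet with the address block left out, and the name dev_links is only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (address block omitted).
html = """
<div class="meta-info meta-info-wide">
  <div class="title">Разработчик</div>
  <div class="content contains-text-link">
    <a class="dev-link" href="http://www.jourist.com&sa=D&usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg" rel="nofollow" target="_blank">Перейти на веб-сайт</a>
    <a class="dev-link" href="mailto:[email protected]" rel="nofollow" target="_blank">Написать: [email protected]</a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# One descendant selector: every a.dev-link nested anywhere inside div.meta-info-wide.
# The URL is read from the href attribute; .text would give the visible link text instead.
dev_links = [a["href"] for a in soup.select("div.meta-info-wide a.dev-link")]

for url in dev_links:
    print(url)
```

This should print the same two URLs as above. One possible advantage of the single selector is that each matching link is returned once, whereas the nested loop would report a link twice if one div.meta-info-wide were ever nested inside another.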
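The select() method returns all a tags that have the class dev-link. A CSS selector like this is the recommended approach when an element has to match two or more CSS classes, because find_all() with a class string such as "meta-info meta-info-wide" only matches that exact attribute value, with the classes in that order.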
Tested with BeautifulSoup 4.5.1, Python 2.7.12
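To illustrate the multiple-class point, here is a small Python 3 sketch, assuming a recent BeautifulSoup 4 release with soupsieve installed; the one-line HTML is a made-up reduction of the question's markup, and the variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up reduction of the question's markup: one div carrying two classes.
html = '<div class="meta-info meta-info-wide"></div>'
soup = BeautifulSoup(html, "html.parser")

# A single class name matches any element whose class list contains it.
by_one_class = soup.find_all("div", class_="meta-info-wide")

# A multi-class string is compared against the exact attribute value,
# so it only matches when the classes are written in the same order.
exact_string = soup.find_all("div", class_="meta-info meta-info-wide")
reversed_string = soup.find_all("div", class_="meta-info-wide meta-info")

# A CSS selector requires both classes to be present, in any order.
both_classes = soup.select("div.meta-info-wide.meta-info")

print(len(by_one_class), len(exact_string), len(reversed_string), len(both_classes))
```

Here the reversed class string finds nothing, while the CSS selector still finds the div.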