 

BeautifulSoup: how to get all links inside a block with a certain class?


I have the following HTML DOM:

    <div class="meta-info meta-info-wide">
      <div class="title">Разработчик</div>
      <div class="content contains-text-link">
        <a class="dev-link" href="http://www.jourist.com&amp;sa=D&amp;usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg" rel="nofollow" target="_blank">Перейти на веб-сайт</a>
        <a class="dev-link" href="mailto:[email protected]" rel="nofollow" target="_blank">Написать: [email protected]</a>
        <div class="content physical-address">Diagonalstraße 41 20537 Hamburg</div>
      </div>
    </div>

I need to get all links (URLs) with the class dev-link inside the div.meta-info-wide block.

I tried the obvious approach, but it does not work:

    divTag = soup.find_all("div", {"class": "meta-info-wide"})
    print(len(divTag))

    for tag in divTag:
        tdTags = tag.find_all("a", {"class": "dev-link"})
        for tag in tdTags:
            print(tag.text)
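The question asks for URLs, but tag.text returns each link's visible anchor text rather than its address, which lives in the href attribute. A minimal sketch of the same find_all() approach that collects href values instead, assuming the page has already been loaded into a string; the markup below is an abbreviated copy of the question's HTML, and mailto:dev@example.com is a hypothetical placeholder for the obfuscated address.

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the question's markup; the mailto address is a hypothetical placeholder.
html = """
<div class="meta-info meta-info-wide">
  <div class="title">Разработчик</div>
  <div class="content contains-text-link">
    <a class="dev-link" href="http://www.jourist.com&amp;sa=D&amp;usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg">Перейти на веб-сайт</a>
    <a class="dev-link" href="mailto:dev@example.com">Написать</a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

urls = []
# class_ matches a tag if ANY one of its CSS classes equals the given value,
# so "meta-info-wide" matches class="meta-info meta-info-wide".
for div in soup.find_all("div", class_="meta-info-wide"):
    for link in div.find_all("a", class_="dev-link"):
        # .get() returns None instead of raising KeyError when href is missing
        href = link.get("href")
        if href:
            urls.append(href)

for url in urls:
    print(url)
```

Using link.get("href") rather than link["href"] is a design choice for scraped pages, where an anchor without an href would otherwise raise KeyError.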
asked Dec 20 '16 08:12 by Hamama


1 Answer

Try the following:

import bs4

html = """
<div class="meta-info meta-info-wide"> <div class="title">Разработчик</div> <div class="content contains-text-link">
<a class="dev-link" href="http://www.jourist.com&amp;sa=D&amp;usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg" rel="nofollow" target="_blank">Перейти на веб-сайт</a>
<a class="dev-link" href="mailto:[email protected]" rel="nofollow" target="_blank">Написать: [email protected]</a>
<div class="content physical-address">Diagonalstraße 41 20537 Hamburg</div> </div> </div>"""

# "html.parser" is Python's built-in parser, so no extra dependency is needed
soup = bs4.BeautifulSoup(html, "html.parser")

# find_all() matches a div if any one of its classes is "meta-info-wide"
for div in soup.find_all("div", {"class": "meta-info-wide"}):
    # select() takes a CSS selector: every <a> with class dev-link inside this div
    for link in div.select("a.dev-link"):
        # print the URL stored in the href attribute, not the link text
        print(link['href'])

This gives you:

http://www.jourist.com&sa=D&usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg
mailto:[email protected] 
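The first printed URL contains & where the source markup has &amp;: the parser decodes HTML character references in attribute values, so link['href'] already holds the literal URL, and serializing the tag again re-escapes it. A small check of that behaviour, using a hypothetical href on example.com:

```python
from bs4 import BeautifulSoup

# Hypothetical link whose href uses the &amp; entity, like the question's markup.
snippet = '<a class="dev-link" href="http://example.com/?a=1&amp;b=2">site</a>'
soup = BeautifulSoup(snippet, "html.parser")

# The attribute value is decoded: &amp; becomes a plain &
href = soup.a["href"]
print(href)  # http://example.com/?a=1&b=2

# Converting the tag back to HTML escapes the ampersand again
print(str(soup.a))
```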

select() takes a CSS selector; here "a.dev-link" returns every a tag with the class dev-link inside the current div. The Beautiful Soup documentation recommends CSS selectors when you need to match tags that carry two or more CSS classes.
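The outer find_all() loop can also be folded into a single descendant selector, and a compound selector such as div.meta-info-wide.meta-info requires both classes on the same element regardless of their order in the class attribute. A short sketch on a trimmed copy of the markup; the second div and the other.example URL are hypothetical, added only to show that links outside div.meta-info-wide are skipped.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup plus a hypothetical second block.
html = """
<div class="meta-info meta-info-wide">
  <div class="content contains-text-link">
    <a class="dev-link" href="http://www.jourist.com">site</a>
  </div>
</div>
<div class="meta-info">
  <a class="dev-link" href="http://other.example">other</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant selector: a.dev-link anywhere inside a div with class meta-info-wide
scoped = [a["href"] for a in soup.select("div.meta-info-wide a.dev-link")]

# Compound selector: the div must carry BOTH classes; attribute order does not matter
both = soup.select("div.meta-info-wide.meta-info")

print(scoped)     # ['http://www.jourist.com']
print(len(both))  # 1
```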

Tested with BeautifulSoup 4.5.1, Python 2.7.12

answered Sep 26 '22 16:09 by Martin Evans