
Python BeautifulSoup: get all href values in the children of a div

I am new to Python and I've been trying to get the links and their inner text from this HTML:

<div class="someclass">
  <ul class="listing">
    <li>
      <a href="http://link1.com" title="">title1</a>
    </li>
    <li>
      <a href="http://link2.com" title="">title2</a>
    </li>
    <li>
      <a href="http://link3.com" title="">title3</a>
    </li>
    <li>
      <a href="http://link4.com" title="">title4</a>
    </li>
  </ul>
</div>

I want all of the links from the href attributes (for example http://link1.com) together with their inner text (for example title1), and nothing else.

I tried this code:

div = soup.find_all('ul',{'class':'listing'})
for li in div:
    all_li = li.find_all('li')
    for link in all_li.find_all('a'):
        print(link.get('href'))

but no luck. Can someone help me?

Aymen Derradji asked Mar 19 '16 21:03



1 Answer

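The problem is in your second for loop: li.find_all('li') returns a list of tags (a ResultSet), and you then call find_all('a') on that list instead of on each <li> element, which raises an AttributeError. Iterate over the <li> elements and use find() on each one: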

>>> for ul in soup.find_all('ul', class_='listing'):
...     for li in ul.find_all('li'):
...         a = li.find('a')
...         print(a['href'], a.get_text())
... 
http://link1.com title1
http://link2.com title2
http://link3.com title3
http://link4.com title4
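
To make the failure concrete, here is a minimal, self-contained sketch of why the original loop breaks. The shortened html string and the names all_li, failed and pairs are only for illustration, and the built-in "html.parser" backend is an assumption since the question does not say which parser was used.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (illustrative only).
html = """
<div class="someclass">
  <ul class="listing">
    <li><a href="http://link1.com" title="">title1</a></li>
    <li><a href="http://link2.com" title="">title2</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

ul = soup.find("ul", class_="listing")
all_li = ul.find_all("li")  # a ResultSet: a list of Tag objects, not a Tag

# Calling find_all() on the list itself fails, as in the question's code.
try:
    all_li.find_all("a")
    failed = False
except AttributeError:
    failed = True

# Calling find() on each element of the list works.
pairs = []
for li in all_li:
    a = li.find("a")
    pairs.append((a["href"], a.get_text()))

print(failed)
print(pairs)
```

The point is that find_all() always hands back a list, so any further search has to be done on the items inside it.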

You can also use a CSS selector instead of nested for loops:

>>> for a in soup.select('.listing li a'):
...     print(a['href'], a.get_text(strip=True))
... 
http://link1.com title1
http://link2.com title2
http://link3.com title3
http://link4.com title4
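
If you would rather collect the results than print them, the selector approach can be wrapped in a small function. This is only a sketch: extract_links and HTML are hypothetical names, the "html.parser" backend is an assumption, and the fifth list item without an href is not in the question; it is added to show that the a[href] selector skips anchors that have no link.

```python
from bs4 import BeautifulSoup

# Markup from the question, plus one hypothetical <li> whose <a> has no href.
HTML = """
<div class="someclass">
  <ul class="listing">
    <li><a href="http://link1.com" title="">title1</a></li>
    <li><a href="http://link2.com" title="">title2</a></li>
    <li><a href="http://link3.com" title="">title3</a></li>
    <li><a href="http://link4.com" title="">title4</a></li>
    <li><a title="">no link here</a></li>
  </ul>
</div>
"""


def extract_links(html):
    """Return (href, text) pairs for every linked <a> inside ul.listing."""
    soup = BeautifulSoup(html, "html.parser")
    # a[href] matches only anchors that actually carry an href attribute,
    # so a["href"] below cannot raise KeyError.
    anchors = soup.select("ul.listing li a[href]")
    return [(a["href"], a.get_text(strip=True)) for a in anchors]


links = extract_links(HTML)
for href, text in links:
    print(href, text)
```

Returning a list of tuples keeps the parsing separate from whatever you do with the links afterwards, such as writing them to a file or following them.
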
styvane answered Oct 20 '22 15:10