I am new to Python, and I've been trying to get the links and their inner text from this HTML code:
<div class="someclass">
<ul class="listing">
<li>
<a href="http://link1.com" title="">title1</a>
</li>
<li>
<a href="http://link2.com" title="">title2</a>
</li>
<li>
<a href="http://link3.com" title="">title3</a>
</li>
<li>
<a href="http://link4.com" title="">title4</a>
</li>
</ul>
</div>
I want all of the links from the href attributes (http://link1.com, http://link2.com, and so on)
and the inner text of each link (title1, title2, and so on).
I tried this code:
div = soup.find_all('ul',{'class':'listing'})
for li in div:
    all_li = li.find_all('li')
    for link in all_li.find_all('a'):
        print(link.get('href'))
but it doesn't work. Can someone help me?
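The problem is in your second for loop. all_li = li.find_all('li') returns a list (a ResultSet) of <li> tags, not a single tag, so all_li.find_all('a') fails because a list has no find_all() method.
Loop over the <li> tags instead and call find() on each one to get its single <a>: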
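To make the difference between find_all() and find() concrete, here is a minimal sketch using two items from the question's HTML. It assumes the built-in "html.parser" backend (the question doesn't say which parser is used) and shows that find_all() returns a list-like ResultSet while find() returns one Tag:

```python
from bs4 import BeautifulSoup, Tag

# Two items from the question's HTML are enough to show the difference.
html = """
<ul class="listing">
<li><a href="http://link1.com" title="">title1</a></li>
<li><a href="http://link2.com" title="">title2</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul", class_="listing")

all_li = ul.find_all("li")  # ResultSet: a list subclass holding every match
first_li = ul.find("li")    # Tag: only the first match (None if nothing matches)
print(isinstance(all_li, list), isinstance(first_li, Tag))  # True True

# This is what the question's inner loop does; a ResultSet has no find_all().
try:
    all_li.find_all("a")
except AttributeError:
    print("find_all() is not available on a ResultSet")

# Working version: iterate over the list and call find() on each Tag.
hrefs = [li.find("a")["href"] for li in all_li]
print(hrefs)  # ['http://link1.com', 'http://link2.com']
```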
>>> for ul in soup.find_all('ul', class_='listing'):
...     for li in ul.find_all('li'):
...         a = li.find('a')
...         print(a['href'], a.get_text())
...
http://link1.com title1
http://link2.com title2
http://link3.com title3
http://link4.com title4
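As a self-contained version of the loop above, here is a sketch that builds the soup from the question's HTML and collects (href, text) pairs instead of printing them. The helper name extract_links is hypothetical, the "html.parser" backend is an assumption, and skipping <li> items that have no <a> with an href is an added guard (in the interactive loop, find() returning None would raise a TypeError):

```python
from bs4 import BeautifulSoup

# The HTML from the question.
html = """
<div class="someclass">
<ul class="listing">
<li>
<a href="http://link1.com" title="">title1</a>
</li>
<li>
<a href="http://link2.com" title="">title2</a>
</li>
<li>
<a href="http://link3.com" title="">title3</a>
</li>
<li>
<a href="http://link4.com" title="">title4</a>
</li>
</ul>
</div>
"""


def extract_links(markup):
    """Hypothetical helper: return (href, text) for the <a> in each li of ul.listing."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for ul in soup.find_all("ul", class_="listing"):
        for li in ul.find_all("li"):
            a = li.find("a")  # first <a> inside this <li>, or None
            if a is None or not a.has_attr("href"):
                continue  # skip items without a link instead of crashing
            results.append((a["href"], a.get_text(strip=True)))
    return results


for href, text in extract_links(html):
    print(href, text)
```

Using get_text(strip=True) trims any whitespace around the link text, which matters if the markup puts the text on its own line.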
You can also use a CSS selector instead of a nested for loop:
>>> for a in soup.select('.listing li a'):
...     print(a['href'], a.get_text(strip=True))
...
http://link1.com title1
http://link2.com title2
http://link3.com title3
http://link4.com title4
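A slightly stricter selector can also filter out anchors without an href. This is a sketch, not the answer's exact selector: "ul.listing > li > a[href]" matches only <a> tags that carry an href and sit directly inside an <li> of ul.listing. The extra "no link here" item is an assumption added only to show the filter, and "html.parser" is again an assumed backend:

```python
from bs4 import BeautifulSoup

# The question's four items, compacted, plus one item without a link.
html = """
<div class="someclass">
<ul class="listing">
<li><a href="http://link1.com" title="">title1</a></li>
<li><a href="http://link2.com" title="">title2</a></li>
<li><a href="http://link3.com" title="">title3</a></li>
<li><a href="http://link4.com" title="">title4</a></li>
<li>no link here</li>
</ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns every match; [href] drops anchors missing that attribute.
links = [
    (a["href"], a.get_text(strip=True))
    for a in soup.select("ul.listing > li > a[href]")
]
for href, text in links:
    print(href, text)
```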