BeautifulSoup, findAll after findAll?

I'm pretty new to Python and mainly need it for getting information from websites. Here I tried to get the short headlines from the bottom of the website, but I can't quite get them.

from bfs4 import BeautifulSoup
import requests

url = "http://some-website"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

nachrichten = soup.findAll('ul', {'class':'list'})

Now I need another findAll to get all the links (the a tags) from the variable "nachrichten", but how can I do this?

MusicPlay3r asked Dec 31 '25 14:12


2 Answers

Use a CSS selector with select if you want all the links in a single list:

anchors = soup.select('ul.list a')

If you want individual lists:

anchors = [ul.find_all('a') for ul in soup.find_all('ul', {'class':'list'})]

Also, if you want the hrefs, you can make sure you only find the anchors that have an href attribute and extract its value:

hrefs = [a["href"] for a in soup.select('ul.list a[href]')]

With find_all, set href=True to get the same filtering, i.e. ul.find_all('a', href=True).
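
As a rough sketch of how these variants differ, the snippet below runs them against a small inline HTML string. The markup and the helper name collect_links are made up for illustration and are not taken from the real page; only select, find_all, and the href filter come from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<ul class="list">
  <li><a href="/a">First</a></li>
  <li><a>No link target</a></li>
</ul>
<ul class="list">
  <li><a href="/b">Second</a></li>
</ul>
<ul class="other">
  <li><a href="/c">Ignored</a></li>
</ul>
"""


def collect_links(html):
    """Return (flat anchors, anchors grouped per ul, href values)."""
    soup = BeautifulSoup(html, "html.parser")
    # One flat list of every <a> inside any <ul class="list">.
    flat = soup.select("ul.list a")
    # One list of anchors per matching <ul>.
    per_list = [ul.find_all("a") for ul in soup.find_all("ul", {"class": "list"})]
    # Only anchors that carry an href attribute, reduced to the attribute value.
    hrefs = [a["href"] for a in soup.select("ul.list a[href]")]
    return flat, per_list, hrefs


flat, per_list, hrefs = collect_links(SAMPLE_HTML)
print(len(flat))
print([len(group) for group in per_list])
print(hrefs)
```

The flat list is handy when you only care about the links themselves, while the grouped form keeps track of which ul each link came from.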

Padraic Cunningham answered Jan 03 '26 03:01


from bs4 import BeautifulSoup
import requests

url = "http://www.n-tv.de/ticker/"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

# findAll returns a list of <ul> tags, so search each one in turn
nachrichten = soup.findAll('ul', {'class':'list'})
links = []
for ul in nachrichten:
    # collect every <a> inside this <ul>
    links.extend(ul.findAll('a'))
print(len(links))

Hope this solves your problem. I think the import should be bs4; I have never heard of bfs4.
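
A minimal sketch of why the loop is needed, assuming a made-up two-list HTML string in place of the real page: the result of the first search is a list of tags, and that list itself cannot be searched, but each tag inside it can.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page is not reproduced here.
html = (
    '<ul class="list"><li><a href="/x">X</a></li></ul>'
    '<ul class="list"><li><a href="/y">Y</a></li></ul>'
)
soup = BeautifulSoup(html, "html.parser")

nachrichten = soup.find_all("ul", {"class": "list"})

# Calling find_all on the result list itself raises AttributeError ...
try:
    nachrichten.find_all("a")
    chained_ok = True
except AttributeError:
    chained_ok = False

# ... but every element of the list is a Tag, which does support find_all.
links = [a for ul in nachrichten for a in ul.find_all("a")]
print(chained_ok, [a["href"] for a in links])
```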

Sandeep answered Jan 03 '26 03:01


