I am trying to scrape the people who have birthdays from this Wikipedia page
Here is the existing code:
hdr = {'User-Agent': 'Mozilla/5.0'}
site = "http://en.wikipedia.org/wiki/"+"january"+"_"+"1"
req = urllib2.Request(site,headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
print soup
This all works fine and I get the entire HTML page, but I want specific data, and I don't know how to access that with Beautiful Soup without an id to use. The <ul> tag does not have an id and neither do the <li> tags. Plus, I can't just ask for every <li> tag because there are other lists on the page. Is there a specific way to call a given list? (I can't just use a fix for this one page because I plan on iterating through all the dates and getting every pages birthday, and I can't guarentee that every page is the exact same layout as this one).
The idea is to get the span with Births id, find parent's next sibling (which is ul) and iterate over it's li elements. Here's a complete example using requests (it's not relevant though):
from bs4 import BeautifulSoup as Soup, Tag
import requests
response = requests.get("http://en.wikipedia.org/wiki/January_1")
soup = Soup(response.content)
births_span = soup.find("span", {"id": "Births"})
births_ul = births_span.parent.find_next_sibling()
for item in births_ul.findAll('li'):
if isinstance(item, Tag):
print item.text
prints:
871 – Zwentibold, Frankish son of Arnulf of Carinthia (d. 900)
1431 – Pope Alexander VI (d. 1503)
1449 – Lorenzo de' Medici, Italian politician (d. 1492)
1467 – Sigismund I the Old, Polish king (d. 1548)
1484 – Huldrych Zwingli, Swiss pastor and theologian (d. 1531)
1511 – Henry, Duke of Cornwall (d. 1511)
1516 – Margaret Leijonhufvud, Swedish wife of Gustav I of Sweden (d. 1551)
...
Hope that helps.
Find the Births section:
section = soup.find('span', id='Births').parent
And then find the next unordered list:
births = section.find_next('ul').find_all('li')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With