I'm trying to scrape a page using BeautifulSoup:
import urllib2
from bs4 import BeautifulSoup
url='http://www.xpn.org/playlists/xpn-playlist'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
for link in soup.find_all("li", class_="song"):
    print link
The problem is that the text I would like to return is not enclosed in its own HTML tag:
<li class="song"> <a href="/default.htm" onclick="return clickreturnvalue()
" onmouseout="delayhidemenu()" onmouseover="dropdownmenu(this, event, menu1,
'100px','Death Vessel','Mandan Dink','Stay Close')">Buy</a>
Chuck Ragan - Rotterdam - Folkadelphia Session</li>
What I want to return:
Chuck Ragan - Rotterdam - Folkadelphia Session
Bonus Points: The data returned is of the format Artist/Song/Album. What would be the proper data structure to use to store and manipulate this info?
Try something like:
for link in soup.find_all("li", class_="song"):
    print link.text
Output:
Buy Chuck Ragan - Rotterdam - Folkadelphia Session
If you want to remove the leading Buy, you can use a slice like this:
for link in soup.find_all("li", class_="song"):
    print link.text.strip()[5:]
The output is:
Chuck Ragan - Rotterdam - Folkadelphia Session
If you'd like to save these strings in a list:
[i.strip() for i in link.text.strip()[5:].split('-')]
Output:
['Chuck Ragan', 'Rotterdam', 'Folkadelphia Session']
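To answer the bonus question: since each entry is just artist/song/album, a list of namedtuples (or dicts) works well. This is only a sketch building on the list comprehension above, not something from the original answers, and it assumes every song li splits cleanly into three parts:

from collections import namedtuple

Track = namedtuple('Track', ['artist', 'song', 'album'])

tracks = []
for link in soup.find_all("li", class_="song"):
    # reuse the slice/split from above; skip anything that doesn't give 3 parts
    parts = [i.strip() for i in link.text.strip()[5:].split('-')]
    if len(parts) == 3:
        tracks.append(Track(*parts))

for t in tracks:
    print t.artist, '/', t.song, '/', t.album

This keeps each track's fields accessible by name (t.artist, t.song, t.album) instead of by index.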
For more info, you can check the documentation.
Here is another way (assuming the li has 3 children; if not, change [2] to [1]):
>>> html = '''<li class="song"> <a href="/default.htm" onclick="return clickreturnvalue()
... " onmouseout="delayhidemenu()" onmouseover="dropdownmenu(this, event, menu1,
... '100px','Death Vessel','Mandan Dink','Stay Close')">Buy</a>
... Chuck Ragan - Rotterdam - Folkadelphia Session</li>'''
>>> from bs4 import BeautifulSoup as bs
>>> soup = bs(html, 'html.parser')
>>> all_li = soup.find_all('li', class_='song')
>>> for li in all_li:
...     text = list(li.children)[2]
...     artist, song, album = text.strip().split(' - ')
...     print artist, song, album
Chuck Ragan Rotterdam Folkadelphia Session
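Note that the examples above are Python 2 (urllib2 and the print statement). On Python 3, a rough equivalent of the same approach would look something like this (a sketch, assuming the page still uses li elements with class "song" and the track text directly follows the Buy link):

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'http://www.xpn.org/playlists/xpn-playlist'
soup = BeautifulSoup(urlopen(url).read(), 'html.parser')

for li in soup.find_all('li', class_='song'):
    # the track text is the sibling node right after the "Buy" link
    a = li.find('a')
    if a and a.next_sibling:
        print(a.next_sibling.strip())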