I'm trying to parse a list of video game titles from a shopping site. However, the item list is all stored inside a single div tag, so I only want to parse that part of the page.
This section of the documentation supposedly explains how to parse only part of the document, but I can't work it out. My code:
from BeautifulSoup import BeautifulSoup
import urllib
import re
url = "Some Shopping Site"
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
for a in soup.findAll('a',{'title':re.compile('.+') }):
    print a.string
At present it prints the string inside any a tag that has a non-empty title attribute, but it is also printing the items in the sidebar that are the "specials". If I can take only the product list div, I will kill two birds with one stone.
Many thanks.
Oh boy, am I silly. I was searching for tags with attribute id = products, but it should have been product_list.
Here's the final code if anyone comes searching.
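from BeautifulSoup import BeautifulSoup, SoupStrainer
import urllib
import re
import time

start = time.clock()  # start time, handy for timing the parse
url = "http://someplace.com"
html = urllib.urlopen(url).read()
# Only build the tree for the div that holds the product list
product = SoupStrainer('div',{'id': 'products_list'})
soup = BeautifulSoup(html,parseOnlyThese=product)
# Print the text of every link with a non-empty title attribute
for a in soup.findAll('a',{'title':re.compile('.+') }):
    print a.string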
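For anyone on Python 3 and the newer bs4 package, here is a rough sketch of the same idea. It assumes bs4 is installed, that the keyword is now parse_only rather than parseOnlyThese, and that the built-in "html.parser" backend is good enough for the page. The sample HTML and the product_titles helper are made up for illustration; only the div id comes from the code above.

```python
import re

from bs4 import BeautifulSoup, SoupStrainer

# Made-up sample page: a sidebar with "specials" plus the product list div.
SAMPLE_HTML = """
<html><body>
<div id="sidebar"><a href="/s1" title="Special Offer">Special Offer</a></div>
<div id="products_list">
  <a href="/g1" title="Game One">Game One</a>
  <a href="/g2" title="Game Two">Game Two</a>
</div>
</body></html>
"""


def product_titles(html, container_id="products_list"):
    """Return the link text found inside the div with the given id.

    Hypothetical helper: the name and the default id are assumptions.
    """
    # Build the tree only for the matching div, skipping the rest of the page.
    only_products = SoupStrainer("div", id=container_id)
    soup = BeautifulSoup(html, "html.parser", parse_only=only_products)
    # Keep links whose title attribute is non-empty, as in the original loop.
    links = soup.find_all("a", title=re.compile(".+"))
    return [str(a.string) for a in links]


if __name__ == "__main__":
    for title in product_titles(SAMPLE_HTML):
        print(title)
```

With a real page you would fetch the HTML first (for example with urllib.request.urlopen) and pass it to product_titles; the sidebar links are never parsed, so they cannot show up in the result.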