I have the following image HTML and I am trying to parse the information in its alt attribute. Currently I am able to extract the images successfully.
HTML (what I currently parse):
<img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG?set_id=89040003C1" itemprop="image" />
I construct the image name from what I parse:
Current Code
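# Imports assumed (Python 2, BeautifulSoup 3); the original post omitted them.
import os
import urlparse
from urllib import urlopen, urlretrieve
from BeautifulSoup import BeautifulSoup as bs

def main(url, output_folder="~/images"):
    """Download the images at url"""
    soup = bs(urlopen(url))
    parsed = list(urlparse.urlparse(url))
    count = 0
    for image in soup.findAll("img"):
        print image
        count += 1
        print count
        print "Image: %(src)s" % image
        # Resolve relative src values against the page URL.
        image_url = urlparse.urljoin(url, image['src'])
        # Build a file name from the last path segment of src.
        filename = image["src"].split("/")[-1].split("?")[0].replace("$",'').replace(".JPG",".jpg").replace("~~_26",str(count)).lstrip("(")
        parsed[2] = image["src"]
        outpath = os.path.join(output_folder, filename)
        urlretrieve(image_url, outpath)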
What I would like to extract is:
alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver"
I also want to use the alt text as the file name when I save the image.
Inside your for loop, you can get it with the following call, which returns an empty string if the tag has no alt attribute:
image.get('alt', '')
This is explained in BeautifulSoup's documentation ("The attributes of Tags").
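Alt text usually contains spaces and may contain characters that are awkward or invalid in file names, so it probably needs cleaning before it is passed to os.path.join. Below is a minimal sketch of one way to do that, using only the standard library. The helper name alt_to_filename, the ".jpg" default extension, and the 100-character limit are assumptions made for illustration, not part of BeautifulSoup or of the original code.

```python
import re

def alt_to_filename(alt, index, ext=".jpg", max_len=100):
    """Turn an img alt text into a file name (hypothetical helper)."""
    # Replace each run of characters other than letters, digits,
    # '.' and '-' with a single underscore, then trim the ends.
    name = re.sub(r"[^A-Za-z0-9.-]+", "_", alt).strip("_.")
    # Fall back to a counter-based name when alt is missing or empty.
    if not name:
        name = "image_%d" % index
    # Cap the length (assumed limit) and append the extension.
    return name[:max_len] + ext

print(alt_to_filename("Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver", 1))
# prints: Sony_Cyber-shot_DSC-W570_16.1_MP_Digital_Camera_-_Silver.jpg
```

In the loop from the question, the filename line could then become filename = alt_to_filename(image.get('alt', ''), count). The counter fallback keeps images without alt text from all being saved under the same name; images that share identical alt text would still overwrite each other unless the counter is added to the name as well.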