Image scraping program in Python not functioning as intended

Question

My code only returns an empty string, and I have no idea why.

import urllib2

def getImage(url):
    page = urllib2.urlopen(url)
    page = page.read() #Gives HTML to parse

    start = page.find('<a img=')
    end = page.find('>', start)

    img = page[start:end]

    return img

Even if it worked, it would only return the first image it finds, so it's not a very good image scraper. For now, though, my primary goal is just to find a single image, and I can't even do that.

tehmisvh · Accepted Answer

Consider using BeautifulSoup to parse your HTML:

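from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, Python 2
import urllib

url = 'http://www.google.com'
html = urllib.urlopen(url).read()  # fetch the raw HTML
soup = BeautifulSoup(html)

# Print the src attribute of every <img> tag that has one
for img in soup.findAll('img', src=True):
    print img['src']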
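
If installing a third-party library is not an option, the same idea can be sketched with the standard library instead. Note that this swaps BeautifulSoup for Python 3's html.parser, and that ImgSrcCollector, image_sources and the sample string are made-up names for illustration, not part of the answer above. It works on an HTML string you have already downloaded:

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lowercases
        # tag and attribute names before calling this method.
        if tag == 'img':
            src = dict(attrs).get('src')
            if src is not None:
                self.sources.append(src)


def image_sources(html):
    """Return the src values of all <img> tags in the HTML string."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


# Hypothetical sample input; in practice this would be the downloaded page.
sample = '<p>hi</p><img src="a.gif" alt="A"><IMG SRC="b.png">'
print(image_sources(sample))
```

Unlike a plain string search, this handles upper-case tags and skips images that have no src attribute.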

bohney · Answer

You should use a library for this, and there are several out there, but to answer your question by changing the code you showed us:

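Your problem is that you are searching for '<a img=', but images don't use the <a ...> tag. They use the <img ...> tag. Because that string almost certainly never appears in the page, find() returns -1 and the slice comes out empty. Here is an example of an image tag: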

<img src="smiley.gif" alt="Smiley face" height="42" width="42">

What you should do is change your start = page.find('<a img=') line to start = page.find('<img ') like so:

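import urllib2

def getImage(url):
    page = urllib2.urlopen(url)
    page = page.read() #Gives HTML to parse

    start = page.find('<img ')
    if start == -1:  # find() returns -1 when there is no match
        return ''
    end = page.find('>', start)

    img = page[start:end+1]  # +1 so the closing '>' is included
    return img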
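
Since the question mentions that this only returns the first image, here is a rough sketch of how the same find() approach could be looped to collect every tag. It is only an illustration: find_img_tags and the sample string are hypothetical names, it takes an HTML string that has already been downloaded, and it is case-sensitive, so it will miss tags written as <IMG ...>:

```python
def find_img_tags(page):
    """Return every '<img ...>' tag found in the HTML string `page`."""
    tags = []
    pos = 0
    while True:
        start = page.find('<img ', pos)
        if start == -1:          # no more matches
            break
        end = page.find('>', start)
        if end == -1:            # unterminated tag: stop instead of slicing wrongly
            break
        tags.append(page[start:end + 1])
        pos = end + 1            # resume the search just past this tag
    return tags


# Hypothetical sample input standing in for a downloaded page.
sample = '<p>hi</p><img src="a.gif" alt="A"><a href="#">x</a><img src="b.png">'
print(find_img_tags(sample))
```

This is fine for a quick experiment, but a real parser is more robust against odd markup.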

Tags:

python

image

user1753520
