My code only returns an empty string, and I have no idea why.
import urllib2
def getImage(url):
    page = urllib2.urlopen(url)
    page = page.read() #Gives HTML to parse
    start = page.find('<a img=')
    end = page.find('>', start)
    img = page[start:end]
    return img
It would only return the first image it finds, so it's not a very good image scraper; that said, my primary goal right now is just to be able to find an image. I'm unable to.
Consider using BeautifulSoup to parse your HTML:
from BeautifulSoup import BeautifulSoup
import urllib
url = 'http://www.google.com'
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
for img in soup.findAll('img'):
    print img['src']
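If installing a third-party parser isn't an option, the same idea can be sketched with the standard library's html.parser module in Python 3. This is only a minimal illustration: the class name ImgSrcCollector and the sample markup are made up for the example, and a real page would be fetched and decoded to text first.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs.
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


# Made-up sample markup, used here instead of a live page.
sample_html = '<p><a href="/x"><img src="a.png"></a><IMG SRC="b.gif" alt="b"></p>'

collector = ImgSrcCollector()
collector.feed(sample_html)
print(collector.sources)
```

Because the parser normalizes case and handles attributes in any order, it picks up tags that a plain substring search could miss.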
You should use a library for this, and there are several out there, but to answer your question by changing the code you showed us:
Your problem is that you are searching for '<a img=', but images don't use the <a ...> tag. They use the <img ...> tag. Since that string never appears in the page, find() returns -1, and the slice built from it comes out empty. Here is an example of an image tag:
<img src="smiley.gif" alt="Smiley face" height="42" width="42">
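To see where the empty string comes from, the question's three string operations can be run against a small in-memory page. This is a sketch with a made-up page string rather than a real download, written so that it also runs on Python 3.

```python
# Made-up sample page; a real one would come from urlopen(url).read().
page = '<html><body><img src="smiley.gif" alt="Smiley face"></body></html>'

start = page.find('<a img=')   # substring is absent, so find() returns -1
end = page.find('>', start)    # a start of -1 means "search from the last character"
img = page[start:end]          # nothing lies between the last character and end

print(start, end, repr(img))
```

Whether the second find() locates a '>' at the very end of the page or returns -1 as well, the slice begins at the last character and stops at or before it, so the result is always the empty string.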
What you should do is change your start = page.find('<a img=') line to start = page.find('<img '), and slice up to end+1 so that the closing > is included, like so:
import urllib2
def getImage(url):
    page = urllib2.urlopen(url)
    page = page.read() #Gives HTML to parse
    start = page.find('<img ') #Index of the first image tag, or -1 if none
    end = page.find('>', start)
    img = page[start:end+1] #end+1 keeps the closing '>'
    return img
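Since the question mentions that only the first image is returned, here is one way the same find-based approach could be looped to collect every tag. It is a rough sketch on an in-memory string: the function name find_img_tags and the sample markup are invented for illustration, and it stays case-sensitive and naive about a '>' inside attribute values.

```python
def find_img_tags(page):
    """Return every '<img ...>' substring in page, in document order."""
    tags = []
    start = page.find('<img ')
    while start != -1:
        end = page.find('>', start)
        if end == -1:
            # Unterminated tag: stop instead of slicing with a bad index.
            break
        tags.append(page[start:end + 1])
        # Resume the search just past the tag that was found.
        start = page.find('<img ', end + 1)
    return tags


# Made-up sample markup, used here instead of a downloaded page.
sample = '<p><img src="a.png"> text <img src="b.gif" alt="b"></p>'
print(find_img_tags(sample))
```

Checking for -1 before slicing is what keeps this version from silently returning empty strings when no tag is present.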