Python: download all files from an internet address?

I want to download all the files from an internet page, or actually all of the image files. The 'urllib' module seems to be what I need. There is a method to download a file if you know its filename, but I don't know the filenames in advance.

urllib.urlretrieve('http://www.example.com/page', 'myfile.jpg')

Is there a method to download all the files from the page and maybe return a list?

asked Oct 01 '11 by Brock123

1 Answer

Here's a little example to get you started with using BeautifulSoup for this kind of exercise: you give the script a URL, and it will print out the URLs of the images referenced from that page, taken from the src attribute of every img tag whose src ends in jpg or png:

import sys, urllib, re, urlparse
from BeautifulSoup import BeautifulSoup

if not len(sys.argv) == 2:
    print >> sys.stderr, "Usage: %s <URL>" % (sys.argv[0],)
    sys.exit(1)

url = sys.argv[1]

f = urllib.urlopen(url)
soup = BeautifulSoup(f)
# Find every img tag whose src ends in jpg or png (case-insensitively),
# and resolve that src against the page URL to get an absolute URL.
for i in soup.findAll('img', attrs={'src': re.compile('(?i)(jpg|png)$')}):
    full_url = urlparse.urljoin(url, i['src'])
    print "image URL: ", full_url

Then you can use urllib.urlretrieve to download each of the images pointed to by full_url, but at that stage you have to decide how to name them and what to do with the downloaded images, which isn't specified in your question.
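For instance, a minimal sketch of that last step (assuming you are happy to name each saved file after the last path component of its URL and to put everything in a local "images" directory; the download_image helper is just for illustration, not part of urllib) could look like this:

import os, urllib, urlparse

def download_image(full_url, dest_dir='images'):
    # Create the destination directory on first use.
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    # Name the local file after the last path component of the URL.
    filename = os.path.basename(urlparse.urlsplit(full_url).path)
    local_path = os.path.join(dest_dir, filename)
    urllib.urlretrieve(full_url, local_path)
    return local_path

You could call download_image(full_url) inside the loop above instead of (or as well as) the print statement. Note that two images with the same filename would overwrite each other, so for anything beyond a single page you may want a different naming scheme.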

answered Oct 20 '22 by Mark Longair