 

Downloading an image using Python Mechanize

I'm trying to write a Python script to download an image and set it as my wallpaper. Unfortunately, the Mechanize documentation is quite poor. My script follows the link correctly, but I'm having a hard time actually saving the image to my computer. From what I've read, the .retrieve() method should do the job, but how do I specify the path the file should be downloaded to? Here is what I have:

import mechanize

def followLink(browser, fixedLink):
    browser.open(fixedLink)
    # Try the resolutions from largest to smallest. follow_link() raises
    # mechanize.LinkNotFoundError when no link matches, so move on to the next.
    for resolution in (r'1600x1200', r'1400x1050', r'1280x960'):
        try:
            return browser.follow_link(url_regex=resolution)
        except mechanize.LinkNotFoundError:
            continue
    return None  # no link at any of these sizes
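
mechanize responses are file-like objects with a read() method, so one way to save the image once follow_link has reached it is to write those bytes to a path of your choosing. A minimal sketch, assuming the followed link points directly at the image file (not at an HTML page that embeds it); save_response is a hypothetical helper, not part of mechanize.

```python
import os


def save_response(response, path):
    """Write the body of a file-like response (anything with .read()) to path.

    Hypothetical helper: with mechanize you might pass it the response
    returned by browser.follow_link(...), assuming that link is the image.
    """
    directory = os.path.dirname(path)
    if directory:
        # Create the target folder if it does not exist yet
        os.makedirs(directory, exist_ok=True)
    with open(path, 'wb') as out:  # binary mode: images are bytes
        out.write(response.read())
    return path
```

The destination is just the path argument, so it can point anywhere you have write access, such as the file your desktop uses as its wallpaper.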
asked Dec 02 '22 23:12 by XVirtusX


2 Answers

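import mechanize, os
from bs4 import BeautifulSoup  # BeautifulSoup 4: pip install beautifulsoup4

url = 'http://example.com/gallery'  # placeholder: the page to scrape
dir = '/path/to/images'             # placeholder: where the images are saved

browser = mechanize.Browser()
html = browser.open(url)
soup = BeautifulSoup(html, 'html.parser')
image_tags = soup.find_all('img', src=True)  # skip <img> tags without a src
for image in image_tags:
    # Drop the scheme ("http://", "https://"); lstrip('http://') would strip
    # any leading h, t, p, : or / characters, not just the prefix
    filename = image['src'].split('://', 1)[-1]
    filename = os.path.join(dir, filename.replace('/', '_'))
    data = browser.open(image['src']).read()
    browser.back()  # go back to the page being scraped
    with open(filename, 'wb') as save:  # binary mode; closes the file for us
        save.write(data)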

This downloads all the images on a web page. For parsing HTML you're better off with BeautifulSoup or lxml. Downloading is just reading the data and then writing it to a local file. Assign your own value to dir: it is the directory where your images will be saved.
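
If you would rather not build the filename by hand, the standard library's urllib.parse can handle the URL. A minimal sketch (url_to_filename and save_bytes are hypothetical helper names): it resolves a relative src against the page URL, keeps only host and path, and flattens slashes the same way the answer does. Parsing also avoids a str.lstrip pitfall: lstrip('http://') removes any leading h, t, p, : or / characters rather than the literal prefix, so 'http://photos.example.com/a.jpg' would become 'otos.example.com/a.jpg'.

```python
import os
from urllib.parse import urljoin, urlsplit


def url_to_filename(src, directory, base_url=None):
    """Map an <img src> value to a flat local path inside directory.

    Relative src values are resolved against base_url (the page URL); the
    scheme and query string are dropped, and '/' becomes '_'.
    """
    if base_url is not None:
        src = urljoin(base_url, src)
    parts = urlsplit(src)
    name = (parts.netloc + parts.path).replace('/', '_')
    return os.path.join(directory, name)


def save_bytes(data, path):
    """Write raw bytes to path; the with block closes the file even on error."""
    with open(path, 'wb') as f:
        f.write(data)
```

One trade-off of this design: because the query string is dropped, two URLs that differ only in their query would map to the same file.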

answered Dec 09 '22 16:12 by zhangyangyu


Not sure why this solution hasn't come up yet, but you can also use mechanize.Browser.retrieve, which takes the URL and, as its second argument, the local filename to save to. Perhaps it only exists in newer versions of mechanize, which would explain why nobody has mentioned it?

Anyway, if you wanted to shorten the answer by zhangyangyu, you could do this:

import mechanize, os
from bs4 import BeautifulSoup

browser = mechanize.Browser()
html = browser.open(url)
soup = BeautifulSoup(html, 'html.parser')
image_tags = soup.find_all('img', src=True)
for image in image_tags:
    filename = image['src'].split('://', 1)[-1]  # drop "http://" etc.
    filename = os.path.join(dir, filename.replace('/', '_'))
    browser.retrieve(image['src'], filename)  # second argument: local path to save to
    browser.back()
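
mechanize's retrieve looks to be modeled on the standard library's urllib.request.urlretrieve(url, filename), which likewise takes the destination path as its second argument and returns a (filename, headers) tuple. As a sketch that runs without mechanize installed, here is the same idea with the stdlib function; download_to is a hypothetical helper, and the Python docs describe urlretrieve as a legacy interface that might be deprecated in the future.

```python
import os
from urllib.request import urlretrieve


def download_to(url, directory, filename):
    """Download url to directory/filename and return the saved path.

    Stdlib stand-in for browser.retrieve(url, filename): in both calls the
    second argument decides where the file lands on disk.
    """
    os.makedirs(directory, exist_ok=True)  # make sure the target folder exists
    path = os.path.join(directory, filename)
    saved_path, _headers = urlretrieve(url, path)
    return saved_path
```

urlretrieve also accepts file:// URLs, so the helper can be tried offline by "downloading" a local file.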

Also keep in mind that you'll likely want to wrap the download in a try/except block like this one:

import mechanize, os
from bs4 import BeautifulSoup

browser = mechanize.Browser()
html = browser.open(url)
soup = BeautifulSoup(html, 'html.parser')
image_tags = soup.find_all('img', src=True)
for image in image_tags:
    filename = image['src'].split('://', 1)[-1]  # drop "http://" etc.
    filename = os.path.join(dir, filename.replace('/', '_'))
    try:
        browser.retrieve(image['src'], filename)
        browser.back()
    except (mechanize.HTTPError, mechanize.URLError) as e:
        # Use e.code and e.read() with HTTPError
        # Use e.reason.args with URLError
        pass

Of course you'll want to adjust this to your needs. Perhaps you want it to bomb out if it encounters an issue. It totally depends on what you want to achieve.
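
One way to make the "skip or bomb out" choice explicit is a strict flag. This sketch uses the standard library's urllib.error exception classes so it runs without mechanize; with mechanize you would catch mechanize.HTTPError and mechanize.URLError as in the answer. download_all and its fetch parameter are hypothetical names: fetch stands for whatever downloads a single URL, for example a small function that calls browser.retrieve.

```python
from urllib.error import HTTPError, URLError


def download_all(urls, fetch, strict=False):
    """Call fetch(url) for each URL and report which ones failed.

    With strict=True the first error is re-raised ("bomb out"); otherwise it
    is recorded and the loop continues. Returns (succeeded, failed), where
    failed maps each URL to a short description of its error.
    """
    succeeded, failed = [], {}
    for url in urls:
        try:
            fetch(url)
        except HTTPError as e:  # must come first: HTTPError subclasses URLError
            if strict:
                raise
            failed[url] = 'HTTP %d' % e.code
        except URLError as e:
            if strict:
                raise
            failed[url] = 'URL error: %s' % (e.reason,)
        else:
            succeeded.append(url)
    return succeeded, failed
```

Returning the failures instead of silently passing makes it easy to retry them later or log them.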

answered Dec 09 '22 16:12 by 0xC0000022L