
wget: How do I specify both --directory-prefix AND --output-document

Tags: python, wget

When I use either the -P or -O alone with wget, everything works as advertised.

$: wget -P "test" http://www.google.com/intl/en_com/images/srpr/logo3w.png
Saving to: `test/logo3w.png'  

.

$: wget -O "google.png" http://www.google.com/intl/en_com/images/srpr/logo3w.png
2012-01-23 21:47:33 (1.20 MB/s) - `google.png' saved [7007/7007]

However, combining the two causes wget to ignore -P.

$: wget -P "test" -O "google.png" http://www.google.com/intl/en_com/images/srpr/logo3w.png
2012-01-23 21:47:51 (5.87 MB/s) - `google.png' saved [7007/7007]

I've set variables for both the directory (taken from the last segment of the URL) and the filename (generated by a counting loop), so that http://www.google.com/aaa/bbb/ccc yields file = /directory/filename, or, for the first item, /ccc/000.jpg

When I substitute this into the code:
Popen(['wget', '-O', file, theImg], stdout=PIPE, stderr=STDOUT)
wget silently fails (on each iteration of the loop).

When I turn on debugging (-d) and logging (-a log.log), each iteration prints:
DEBUG output created by Wget 1.13.4 on darwin10.8.0.

When I remove the -O and file, the operation proceeds normally.

My question is: Is there a way to
A) Specify both -P AND -O in wget (preferred) or
B) Insert a string to -O containing /-characters that doesn't cause it to fail?

Any help would be appreciated.

Josh Whittington asked Jan 24 '12 06:01




2 Answers

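If you use the third-party Python wget package (a separate thing from the wget command-line tool), this is the documentation of wget.download(..):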

def download(url, out=None, bar=bar_adaptive):
    """High level function, which downloads URL into tmp file in current
    directory and then renames it to filename autodetected from either URL
    or HTTP headers.

    :param bar: function to track download progress (visualize etc.)
    :param out: output filename or directory
    :return:    filename where URL is downloaded to
    """
    ...

Use the following call to download a file into a specific directory (which must already exist) under a custom filename:

wget.download(url, path_to_output_file)

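urllib.urlretrieve (urllib.request.urlretrieve in Python 3) takes the same kind of arguments and needs no third-party package. Note that it does not create a missing output directory either, so the directory still has to exist before the call:

urllib.urlretrieve(url, path_to_output_file)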
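
One way to handle the missing-directory case explicitly is to create the parent directory yourself before downloading. Here is a minimal Python 3 sketch; the helper name download_to and its fetch parameter are made up for illustration (fetch just makes it possible to swap out the real downloader), and it assumes a Python version where os.makedirs accepts exist_ok:

```python
import os
from urllib.request import urlretrieve


def download_to(url, out_path, fetch=urlretrieve):
    """Download url to out_path, creating the parent directory first.

    download_to and fetch are hypothetical names for this sketch;
    fetch defaults to the standard library's urlretrieve.
    """
    parent = os.path.dirname(out_path)
    if parent:
        # No error if the directory is already there.
        os.makedirs(parent, exist_ok=True)
    fetch(url, out_path)
    return out_path
```

For example, download_to(theImg, os.path.join("ccc", "000.jpg")) would create ccc/ if needed and then save the image as ccc/000.jpg.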
Jaydev answered Oct 12 '22 11:10


Since -O makes wget ignore -P, you should just pass the whole path, e.g. dir/000.jpg, to -O of wget:

import subprocess
import os.path

subprocess.Popen(['wget', '-O', os.path.join(directory, filename), theImg])

It's not completely clear from your question whether you were already doing something similar to this, but if you were and it still failed, I can think of two reasons:

  • The argument to -O contains a leading /, so wget tries to write to a directory directly under / (root). That directory most likely doesn't exist, and a normal user doesn't have permission to write there.

  • The directory you're telling wget to write to doesn't exist. wget -O does not create it for you. You can make sure it exists by creating it first using os.mkdir (or os.makedirs for nested paths) in the Python standard library.

You can also try removing the arguments stdout= and stderr= from the Popen call so you can see the errors directly, or print them using Python.
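
Putting the points above together, here is a rough Python 3 sketch. The helper names output_path_for and fetch_with_wget are made up for illustration, the .jpg extension and three-digit counter are taken from the example in the question, and it assumes wget is on your PATH:

```python
import os
import subprocess
from urllib.parse import urlparse


def output_path_for(page_url, index, ext=".jpg"):
    """Build a relative path like 'ccc/000.jpg' from the last URL segment.

    Hypothetical helper: no leading '/', so nothing is written under root.
    """
    last_segment = urlparse(page_url).path.rstrip("/").split("/")[-1]
    return os.path.join(last_segment, "{:03d}{}".format(index, ext))


def fetch_with_wget(img_url, out_path):
    """Run 'wget -O out_path img_url' and print wget's output on failure."""
    parent = os.path.dirname(out_path)
    if parent:
        # wget -O will not create the directory, so do it here.
        os.makedirs(parent, exist_ok=True)
    result = subprocess.run(
        ["wget", "-O", out_path, img_url],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        universal_newlines=True,   # decode the output as text
    )
    if result.returncode != 0:
        # Show what wget complained about instead of failing silently.
        print(result.stdout)
    return result.returncode
```

With the URL from the question, output_path_for("http://www.google.com/aaa/bbb/ccc", 0) gives ccc/000.jpg on a POSIX system, which you can then hand to fetch_with_wget(theImg, ...) inside your loop.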

Rob Wouters answered Oct 12 '22 11:10