When I use either -P or -O alone with wget, everything works as advertised.
$: wget -P "test" http://www.google.com/intl/en_com/images/srpr/logo3w.png
Saving to: `test/logo3w.png'
.
$: wget -O "google.png" http://www.google.com/intl/en_com/images/srpr/logo3w.png
2012-01-23 21:47:33 (1.20 MB/s) - `google.png' saved [7007/7007]
However, combining the two causes wget to ignore -P.
$: wget -P "test" -O "google.png" http://www.google.com/intl/en_com/images/srpr/logo3w.png
2012-01-23 21:47:51 (5.87 MB/s) - `google.png' saved [7007/7007]
I've set a variable for both the directory (generated from the last chunk of the URL) and the filename (generated through a counting loop), such that http://www.google.com/aaa/bbb/ccc yields file = /directory/filename or, for item 1, /ccc/000.jpg.
When substituting this into the code:
Popen(['wget', '-O', file, theImg], stdout=PIPE, stderr=STDOUT)
wget silently fails (on each iteration of the loop).
When I turn on debugging (-d) and logging (-a log.log), each iteration prints:
DEBUG output created by Wget 1.13.4 on darwin10.8.0.
When I remove the -O and file, the operation proceeds normally.
My question is: is there a way to
A) specify both -P AND -O in wget (preferred), or
B) pass a string containing / characters to -O without causing it to fail?
Any help would be appreciated.
The first way to achieve our goal with wget is to use the options --no-host-directories (-nH) and --cut-dirs. The -nH option disables the creation of directories prefixed by the hostname. The --cut-dirs option, on the other hand, specifies the number of directory components to ignore.
Downloading a file to a specific directory
When downloading a file, wget stores it in the current directory by default. You can change that by using the -P option to specify the name of the directory where you want to save the file.
By default, invoking wget with -r http://fly.srk.fer.hr/ creates a structure of directories beginning with fly.srk.fer.hr/. The -nH (--no-host-directories) option disables this behavior.
Save with a different file name
By default, a downloaded file is saved under the last name component of the URL. To save the file under a different name, use the -O option. Syntax: wget -O <fileName> <URL>
Documentation of wget.download(..):
def download(url, out=None, bar=bar_adaptive):
    """High level function, which downloads URL into tmp file in current
    directory and then renames it to filename autodetected from either URL
    or HTTP headers.

    :param bar: function to track download progress (visualize etc.)
    :param out: output filename or directory
    :return: filename where URL is downloaded to
    """
...
Use the following call to download a file to a specific (already existing) directory with a custom filename:
wget.download(url, path_to_output_file)
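A standard-library alternative that takes the same arguments is urllib.urlretrieve (urllib.request.urlretrieve in Python 3). Note that it does not create a missing output directory either, so create the directory first (for example with os.makedirs):
urllib.urlretrieve(url, path_to_output_file)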
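As far as I know, urlretrieve does not create missing parent directories on its own, so a small wrapper is one way to get that behaviour. This is a minimal sketch assuming Python 3, where the function lives in urllib.request; fetch_to is a hypothetical helper name, not part of any library.

```python
import os
from urllib.request import urlretrieve


def fetch_to(url, output_path):
    """Download url to output_path, creating parent directories first.

    fetch_to is a hypothetical helper written for illustration.
    """
    parent = os.path.dirname(output_path)
    if parent:
        # exist_ok=True makes this a no-op when the directory already exists
        os.makedirs(parent, exist_ok=True)
    urlretrieve(url, output_path)
    return output_path
```

With this in place, a call such as fetch_to(theImg, "ccc/000.jpg") should create ccc/ if needed before saving the file.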
You should just pass dir/000.jpg to -O of wget:
import subprocess
import os.path
subprocess.Popen(['wget', '-O', os.path.join(directory, filename), theImg])
It's not completely clear from your question whether you were already doing something similar to this, but if you were and it still failed, I can think of two reasons:
1. The argument to -O contains a leading /, making wget fail because it doesn't have permission to create directories in / (root).
2. The directory you're telling wget to write to doesn't exist. You can make sure it exists by creating it first using os.mkdir (or os.makedirs for nested directories) from the Python standard library.
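Both failure modes above can be sidestepped by building a relative path and creating the directory before wget runs. The following is a rough sketch assuming Python 3; build_output_path, base_dir and ext are hypothetical names chosen for illustration, and the directory name is taken from the last segment of the page URL as described in the question.

```python
import os
from urllib.parse import urlparse


def build_output_path(page_url, index, base_dir=".", ext=".jpg"):
    """Return base_dir/<last URL segment>/<zero-padded index><ext>.

    The target directory is created if it does not exist yet, and the
    result has no leading "/" unless base_dir itself is absolute.
    """
    segment = urlparse(page_url).path.rstrip("/").split("/")[-1]
    directory = os.path.join(base_dir, segment)
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, "{:03d}{}".format(index, ext))


# Possible use with the Popen call from the question:
# subprocess.Popen(['wget', '-O', build_output_path(page_url, 0), theImg])
```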
You can also try removing the stdout= and stderr= arguments from the Popen call so you can see the errors directly, or print them using Python.
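If you would rather keep capturing the output, one option is to collect it and check the exit status yourself, so failures are no longer silent. This is a minimal sketch assuming Python 3.5 or later (for subprocess.run); run_capturing is a hypothetical helper name.

```python
import subprocess


def run_capturing(cmd):
    """Run cmd with stderr merged into stdout; return (exit code, output text)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode the output to str instead of bytes
    )
    return proc.returncode, proc.stdout


# Possible use with wget (out_path and theImg as in the question):
# code, output = run_capturing(['wget', '-O', out_path, theImg])
# if code != 0:
#     print(output)
```

A non-zero exit code from wget indicates an error, and the captured text should contain the message explaining why.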