Python Wget: Check for duplicate files and skip if it exists?

Question

So I'm downloading files with the Python wget module, and I want to check whether a file exists before I download it. I know the CLI version has an option for this. Here is what I'm trying to do:

import wget

# check if the file exists
# if not, download it
wget.download(url, path)

The wget module downloads the file without my having to name it. This is important because I don't want to rename files that already have a name.

If there is an alternative file-downloading method that allows checking for existing files, please tell me! Thanks!

Giorgos Myrianthous · Accepted Answer

wget.download() doesn't have any such option. The following workaround should do the trick for you:

import subprocess

url = "https://url/to/index.html"
path = "/path/to/save/your/files"
# -nc (--no-clobber): don't download a file that already exists locally
# -P: directory prefix the file is saved under
subprocess.run(["wget", "-nc", "-P", path, url])

If the file is already there, wget skips it and prints a message like this:

File ‘index.html’ already there; not retrieving.

EDIT: If you are running this on Windows, you may also need to include shell=True:

subprocess.run(["wget", "-nc", "-P", path, url], shell=True)

John Gordon · Answer

I don't see that the Python module has that option.

You could try to guess the filename that will be used (typically it is the part of the URL after the last slash).
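A minimal sketch of the filename-guessing idea, assuming the module names the file after the last path segment of the URL (a name suggested by the server could differ, which would break the guess). The helpers guess_filename and already_downloaded are hypothetical names, not part of the wget module:

```python
import os
from urllib.parse import urlsplit


def guess_filename(url):
    """Guess the saved filename: the last path segment of the URL.

    Assumption: the downloader uses this segment as the filename.
    Returns None when the path ends in "/" (nothing to guess from).
    """
    name = os.path.basename(urlsplit(url).path)  # query string is ignored
    return name or None


def already_downloaded(url, directory):
    """True if a file with the guessed name already exists in `directory`."""
    name = guess_filename(url)
    return name is not None and os.path.isfile(os.path.join(directory, name))
```

With these helpers, you would call wget.download(url, path) only when already_downloaded(url, path) returns False.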

Or you could download the file to a new temporary directory and then check if that filename exists in your main directory.
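The temporary-directory idea might look like the sketch below. It is an illustration under assumptions: download_unless_exists is a hypothetical helper, and the download callable is assumed to save into the directory it is given and return the saved path (wget.download(url, out) appears to return the filename it wrote). Note that the file is still fetched every time; this only avoids keeping a duplicate copy.

```python
import os
import shutil
import tempfile


def download_unless_exists(url, directory, download):
    """Fetch url into a fresh temporary directory, then move the file into
    `directory` only if no file with the same name is already there.

    Assumption: download(url, out_dir) saves the file inside out_dir and
    returns its path (wget.download(url, out_dir) is assumed to do this).
    Returns the final path, or None if the file was a duplicate.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = download(url, tmp)
        target = os.path.join(directory, os.path.basename(tmp_path))
        if os.path.exists(target):
            # Duplicate: the temporary copy is deleted with the temp directory.
            return None
        shutil.move(tmp_path, target)
        return target
```

For example: download_unless_exists(url, path, wget.download).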

nathancy · Answer

Judging from the source code, the wget.download() function doesn't seem to accept extra options such as -nc (skip files that already exist) or -N (download only if the remote file is newer). Only the CLI version seems to support these.

The function:

def download(url, out=None, bar=bar_adaptive):
    ...

You can only choose the URL, the output location (out, which can be a file name or a directory), and the progress bar (bar).

Tags: python, wget, aoeu
