
Python Wget: Check for duplicate files and skip if it exists?

Tags:

python

wget

I'm downloading files with the Python wget module and I want to check whether a file already exists before I download it. I know the CLI version has an option for this. What I want is something like:

# check if file exists
# if not, download
wget.download(url, path)

wget downloads the file without my having to name it. This is important because I don't want to rename files that already have a name.

If there is an alternative download method that allows checking for existing files, please tell me! Thanks!

aoeu asked Apr 04 '19 20:04


3 Answers

wget.download() doesn't have any such option. As a workaround, you can call the wget command-line tool through subprocess and pass -nc (--no-clobber), which skips files that already exist:

import subprocess

url = "https://url/to/index.html"
path = "/path/to/save/your/files"
subprocess.run(["wget", "-r", "-nc", "-P", path, url])

If the file is already there, you will get the following message:

File ‘index.html’ already there; not retrieving.

EDIT: If you are running this on Windows, you may also need to pass shell=True:

subprocess.run(["wget", "-r", "-nc", "-P", path, url], shell=True)

Giorgos Myrianthous answered Oct 29 '22 21:10


I don't see that option in the Python module.

You could try to guess the filename that will be used (typically it will be the part of the url after the last slash character).
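A minimal sketch of that guess, using only the standard library. The helper names guess_filename and already_downloaded are made up for illustration, and the guess is only an approximation: a server can suggest a different name through its response headers, in which case the check below would miss the existing file.

```python
import os
from urllib.parse import urlsplit, unquote


def guess_filename(url):
    """Return the part of the URL path after the last slash, or None if empty."""
    path = urlsplit(url).path          # drops the query string and fragment
    name = os.path.basename(unquote(path))
    return name or None


def already_downloaded(url, directory):
    """True if a file with the guessed name already exists in `directory`."""
    name = guess_filename(url)
    return name is not None and os.path.isfile(os.path.join(directory, name))
```

With these in place, the download call from the question can be guarded with "if not already_downloaded(url, path): wget.download(url, path)".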

Or you could download the file to a new temporary directory and then check if that filename exists in your main directory.
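The temporary-directory idea could look roughly like this. The function name fetch_unless_present is hypothetical, and the download argument is assumed to be any callable that takes (url, out_dir) and returns the path of the file it saved. Note that this approach still transfers the file every time; it only avoids creating a duplicate in the main directory.

```python
import os
import shutil
import tempfile


def fetch_unless_present(url, dest_dir, download):
    """Download into a scratch directory; keep the file only if dest_dir lacks it.

    Returns the final path, or None if a file with that name already existed.
    """
    with tempfile.TemporaryDirectory() as scratch:
        saved = download(url, scratch)            # downloader picks the name
        target = os.path.join(dest_dir, os.path.basename(saved))
        if os.path.exists(target):
            return None                           # scratch copy is discarded on exit
        shutil.move(saved, target)
        return target
```

Assuming wget.download returns the path of the saved file, it could be plugged in as download=lambda u, d: wget.download(u, out=d).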

John Gordon answered Oct 29 '22 20:10


Judging from the source code, wget.download() doesn't accept additional parameters such as -nc or -N for skipping a download when the file already exists. Only the CLI version seems to support this.

The function:

def download(url, out=None, bar=bar_adaptive):
    ...

You can only choose the URL, the output location (out), and the progress bar (bar).

nathancy answered Oct 29 '22 21:10