
Fastest way to download thousand files using python? [closed]

I need to download a thousand CSV files, each between 20 KB and 350 KB. Here is my code so far:

I'm using urllib.request.urlretrieve in a plain loop, roughly like the sketch below (the URL list is a placeholder for my real one). Downloading all thousand files, about 250 MB in total, takes over an hour.
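```python
import urllib.request

# Placeholder URL list -- the real list comes from elsewhere in my code.
urls = ["http://example.com/data/file{}.csv".format(i) for i in range(1000)]

for i, url in enumerate(urls):
    # One connection is opened and torn down per file.
    urllib.request.urlretrieve(url, "file{}.csv".format(i))
```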

So my question is:

How can I download a thousand CSV files in less than an hour?

Thank you!

Michael asked Dec 07 '13


2 Answers

Most likely the reason it takes so long is the per-file overhead: opening a connection, making the request, getting the file, and closing the connection again.

A thousand files in an hour works out to 3.6 seconds per file, which is high, though the site you are downloading from may simply be slow.

The first thing to do is to use HTTP/1.1 keep-alive, so one connection stays open for all the files. The easiest way to do that is to use the Requests library with a session.
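A minimal sketch of that approach, assuming the URLs are already collected in a list (the example URLs and filenames are placeholders):

```python
import requests

# Placeholder URL list; substitute the real thousand CSV URLs.
urls = ["http://example.com/data/file{}.csv".format(i) for i in range(1000)]

# A Session reuses the underlying TCP connection (keep-alive)
# instead of reconnecting for every file.
session = requests.Session()
for i, url in enumerate(urls):
    response = session.get(url)
    response.raise_for_status()  # fail loudly on HTTP errors
    with open("file{}.csv".format(i), "wb") as f:
        f.write(response.content)
```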

If this isn't fast enough, then you need to run several downloads in parallel, with either multiprocessing or threads.
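For example, a sketch of the threaded variant using concurrent.futures; the worker count is a guess to tune to what the server tolerates, and the URL list is again a placeholder:

```python
from concurrent.futures import ThreadPoolExecutor
import requests

urls = ["http://example.com/data/file{}.csv".format(i) for i in range(1000)]

def fetch(item):
    # Download one file; item is an (index, url) pair.
    i, url = item
    response = requests.get(url)
    response.raise_for_status()
    with open("file{}.csv".format(i), "wb") as f:
        f.write(response.content)

# 10 workers is an assumption; raise or lower it based on the server.
with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(fetch, enumerate(urls)))
```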

Lennart Regebro answered Nov 04 '22


You should try using multithreading to download many files in parallel. Have a look at the multiprocessing module, and especially its worker pools.
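A minimal sketch of the worker-pool idea, using ThreadPool from multiprocessing.pool (threads rather than processes, since downloads are I/O-bound); the URL list and pool size are placeholders:

```python
import urllib.request
from multiprocessing.pool import ThreadPool

urls = ["http://example.com/data/file{}.csv".format(i) for i in range(1000)]

def download(item):
    # Each worker downloads one file; item is an (index, url) pair.
    i, url = item
    urllib.request.urlretrieve(url, "file{}.csv".format(i))

pool = ThreadPool(processes=10)  # a pool of 10 worker threads
pool.map(download, list(enumerate(urls)))
pool.close()
pool.join()
```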

Juri Robl answered Nov 04 '22