I need to download a thousand CSV files, each between 20 KB and 350 KB. Here is my code so far:
I'm using urllib.request.urlretrieve. With it, downloading all thousand files (250 MB in total) takes over an hour.
So my question is:
How can I download a thousand CSV files in less than an hour?
Thank you!
Most likely it takes so long because, for every single file, the code has to open a connection, make the request, get the file, and close the connection again.
A thousand files in an hour is 3.6 seconds per file, which is high, but the site you are downloading from may be slow.
The first thing to do is to use HTTP/1.1 persistent connections (Keep-Alive), so that one connection stays open for all the files. The easiest way to do that is to use the Requests library with a session.
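As a rough sketch of the session approach (not the asker's actual code, which isn't shown here): the helper below pulls every URL through one shared session, so the underlying connection can be reused instead of reopened for each file. The function names, the `csv_files` directory, and the example.com URLs are all made up for illustration, and it assumes each URL ends in a usable file name.

```python
import os
from urllib.parse import urlsplit


def download_all(session, urls, dest_dir):
    """Fetch each URL through one shared session and save it under dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url in urls:
        # The session keeps the connection alive between requests to the same host.
        response = session.get(url, timeout=30)
        response.raise_for_status()
        # Assumption: the last path segment of the URL is the file name.
        filename = os.path.basename(urlsplit(url).path)
        path = os.path.join(dest_dir, filename)
        with open(path, "wb") as f:
            f.write(response.content)
        paths.append(path)
    return paths


def main():
    import requests  # third-party: pip install requests

    # Hypothetical URLs; replace with the real list of CSV files.
    urls = [f"https://example.com/data/file_{i}.csv" for i in range(1000)]
    with requests.Session() as session:
        download_all(session, urls, "csv_files")
```

Calling `main()` runs the whole download. The files are small, so reading `response.content` in one go should be fine; for much larger files, streaming the body would be the safer choice.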
If this isn't fast enough, then you need to do several parallel downloads with either multiprocessing or threads.
You should try downloading many files in parallel. Have a look at the multiprocessing module, especially its worker pools, which come in both a process-based and a thread-based flavour.
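Here is a minimal sketch of the parallel approach. Note that it uses `concurrent.futures.ThreadPoolExecutor` from the standard library rather than a `multiprocessing` pool; threads are usually enough here because the work is waiting on the network, not on the CPU. The function names and the default of 8 workers are assumptions for illustration, and it again assumes each URL ends in a usable file name.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def download_one(url, dest_dir):
    """Download a single URL into dest_dir and return the local path."""
    # Assumption: the last path segment of the URL is the file name.
    filename = os.path.basename(urlsplit(url).path)
    path = os.path.join(dest_dir, filename)
    urlretrieve(url, path)
    return path


def download_parallel(urls, dest_dir, max_workers=8):
    """Download all URLs using a pool of worker threads."""
    os.makedirs(dest_dir, exist_ok=True)
    fetch = partial(download_one, dest_dir=dest_dir)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as urls and
        # re-raises the first download error, if any.
        return list(pool.map(fetch, urls))
```

You would call it as `download_parallel(urls, "csv_files")`. It's worth starting with a small number of workers and increasing it gradually, since too many simultaneous requests may get you throttled by the server.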