I am writing Python code to copy a bunch of big files/folders from one location to another on the desktop (no network, everything is local). I am using the shutil module for that.
The problem is that it takes a long time, so I want to speed up the copy process. I tried using the threading and multiprocessing modules, but to my surprise both take more time than the sequential code.
One more observation: the time required increases as the number of processes increases, even for the same set of folders. For example, suppose I have the following directory structure
/a/a1, /a/a2, /b/b1 and /b/b2
If I create 2 processes to copy folders a and b, it takes, say, 2 minutes. If I instead create 4 processes to copy a/a1, a/a2, b/b1 and b/b2, it takes around 4 minutes. I tried this only with multiprocessing, not threading.
I am not sure what is going on. Has anyone had a similar problem? Are there any best practices you can share for using multiprocessing/threading in Python?
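For reference, a minimal sketch of roughly what my multiprocessing attempt looks like (the source/destination paths and one-process-per-folder layout are illustrative, not my exact code):

```python
import shutil
from multiprocessing import Process

def copy_folder(src, dst):
    # Copy an entire directory tree; dst must not already exist
    shutil.copytree(src, dst)

if __name__ == "__main__":
    # Hypothetical source -> destination pairs matching the example layout
    jobs = [
        ("/a/a1", "/dest/a/a1"),
        ("/a/a2", "/dest/a/a2"),
        ("/b/b1", "/dest/b/b1"),
        ("/b/b2", "/dest/b/b2"),
    ]
    processes = [Process(target=copy_folder, args=(s, d)) for s, d in jobs]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```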
Thanks Abhijit
Your problem is likely IO-bound, so more computational parallelization won't help. As you've seen, you're probably making the problem worse: the disk now has to jump back and forth between IO requests issued by the various threads/processes instead of streaming one file at a time. There are practical limits to how fast you can move data on a computer, especially where a disk is involved.
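An easy way to convince yourself of this is to time a plain sequential copy of the same folders and compare it against your parallel version. A minimal sketch, using the folder names from your example and an illustrative destination path:

```python
import shutil
import time

# Folders from the question; the /dest prefix is just an example target
folders = ["/a/a1", "/a/a2", "/b/b1", "/b/b2"]

start = time.time()
for src in folders:
    # Sequential copy: one directory tree at a time
    shutil.copytree(src, "/dest" + src)
print("sequential copy took %.1f s" % (time.time() - start))
```

If the sequential version is as fast as (or faster than) the parallel one, the bottleneck is the disk, not the CPU, and adding workers only adds overhead.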