 

Python multiprocess/multithreading use for concurrent file copy operation

I am writing Python code to copy a bunch of big files/folders from one location to another on my desktop (no network, everything is local). I am using the shutil module for that.

But the problem is that it takes a long time, so I want to speed up this copy process. I tried using the threading and multiprocessing modules, but to my surprise both take longer than the sequential code.

One more observation: the time required increases with the number of processes for the same set of folders. What I mean is, suppose I have the following directory structure

    /a/a1, /a/a2, /b/b1 and /b/b2

If I create 2 processes to copy folders a and b, it takes, say, 2 minutes. If I instead create 4 processes to copy folders a/a1, a/a2, b/b1 and b/b2, it takes around 4 minutes. I tried this only with multiprocessing, not with threading.
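To make this concrete, here is a minimal sketch of the kind of multiprocessing copy I am describing (the paths and worker count are illustrative, not my exact code):

    import shutil
    import time
    from multiprocessing import Pool

    # Illustrative source/destination pairs, not my real paths.
    SRC_DST = [
        ("/a/a1", "/dest/a/a1"),
        ("/a/a2", "/dest/a/a2"),
        ("/b/b1", "/dest/b/b1"),
        ("/b/b2", "/dest/b/b2"),
    ]

    def copy_tree(pair):
        src, dst = pair
        shutil.copytree(src, dst)

    if __name__ == "__main__":
        start = time.time()
        # One worker process per folder -- this is the setup that
        # gets slower as the number of processes grows.
        with Pool(processes=len(SRC_DST)) as pool:
            pool.map(copy_tree, SRC_DST)
        print("Copied %d folders in %.1fs" % (len(SRC_DST), time.time() - start))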

I am not sure what is going on. Has anyone had a similar problem? Are there any best practices you can share for using multiprocessing/threading in Python?

Thanks, Abhijit

asked Feb 24 '23 by Abhijit Sawant


1 Answer

Your problem is likely IO-bound, so adding more computational parallelism won't help. As you've seen, it probably makes things worse, because the IO requests now have to jump back and forth between the various threads/processes: on a single disk, concurrent copies force the drive to seek between files instead of streaming each one in order. There are practical limits to how fast you can move data on a computer, especially where a disk is involved.
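If you want to confirm this, time the plain sequential version against your parallel attempts. A minimal sketch (the folder list is a placeholder for your real paths):

    import shutil
    import time

    # Placeholder paths -- substitute your real folders.
    FOLDERS = [("/a", "/dest/a"), ("/b", "/dest/b")]

    start = time.time()
    for src, dst in FOLDERS:
        # One copy at a time: the disk streams each file in order
        # instead of seeking between several concurrent copies.
        shutil.copytree(src, dst)
    print("Sequential copy took %.1fs" % (time.time() - start))

If the source and destination live on different physical disks, overlapping one reader with one writer can sometimes help, but on a single local disk the sequential copy is usually close to the hardware limit.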

answered Feb 25 '23 by Joe