
High memory usage only when multiprocessing

I am trying to use Python's multiprocessing library to gain some performance; specifically, I am using its map function. For some reason, when I swap it out for its single-process counterpart, memory usage stays low, but the multiprocessing version of map sends my memory usage through the roof. For the record, what I am doing can easily use a lot of memory, but what difference between the two could cause such a stark contrast?
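A minimal sketch of the two setups being compared might look like the following. It assumes the builtin `map` is the single-process counterpart and `multiprocessing.Pool.map` the parallel one; `work`, `run_serial` and `run_parallel` are hypothetical names, and `work` is only a stand-in for the real memory-hungry task.

```python
from multiprocessing import Pool


def work(n):
    # Hypothetical stand-in for the real, memory-hungry task:
    # builds a list of n integers and returns its sum.
    data = list(range(n))
    return sum(data)


def run_serial(inputs):
    # Single-process version: the builtin map runs everything
    # in the current interpreter.
    return list(map(work, inputs))


def run_parallel(inputs, processes=4):
    # Multiprocess version: each input is pickled and sent to a worker
    # process, and each result is pickled and sent back to the parent.
    with Pool(processes=processes) as pool:
        return pool.map(work, inputs)


if __name__ == "__main__":
    # The guard is required with the "spawn" and "forkserver" start
    # methods, which re-import this module in every worker.
    inputs = [100_000] * 8
    assert run_serial(inputs) == run_parallel(inputs)
```

Both versions compute the same results; the difference lies in how the inputs and results travel between processes.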

Sandro asked Oct 15 '22 06:10


1 Answer

You realize that multiprocessing does not use threads, yes? I say this because you mention a "single threaded counterpart".
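One way to see that the workers are separate processes rather than threads is to have each task report its OS process ID. This is a small sketch; `worker_pid` and `distinct_worker_pids` are hypothetical helpers.

```python
import os
from multiprocessing import Pool


def worker_pid(_item):
    # Report the ID of the OS process that handled this item.
    return os.getpid()


def distinct_worker_pids(n_items=8, processes=2):
    # Collect the set of process IDs that served the map() call.
    with Pool(processes=processes) as pool:
        return set(pool.map(worker_pid, range(n_items)))


if __name__ == "__main__":
    pids = distinct_worker_pids()
    print("parent:", os.getpid(), "workers:", sorted(pids))
    # Each worker is a separate process, so none of them is the parent.
    assert os.getpid() not in pids
```

Because each worker is its own interpreter with its own memory, nothing is shared implicitly the way it would be between threads.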

Are you sending a lot of data through multiprocessing's map? A likely cause is the serialization that multiprocessing has to do in many cases. multiprocessing uses pickle, which builds a serialized copy of the data alongside the original (plus some bookkeeping while it runs), and each worker then rebuilds its own copy from those bytes, so the same data can exist several times over. (In some cases, specifically on systems with fork(), data that already exists when the worker processes are created is inherited by them without serialization, but whenever new data has to be sent to an existing process, it must be pickled.)
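The serialization round trip can be illustrated without any processes at all. This sketch uses a hypothetical `payload` and simply mimics what happens to each argument sent to a worker: it is pickled to bytes and then rebuilt as an independent object.

```python
import pickle

# Hypothetical payload standing in for whatever is passed to map().
payload = [list(range(1_000)) for _ in range(100)]

# Sending an argument to a worker means serializing it to bytes...
blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

# ...and rebuilding an independent object from those bytes on the
# receiving side. In this single-process demo the original, the bytes
# and the rebuilt copy all exist at the same time; with multiprocessing
# the rebuilt copy lives in the worker instead.
rebuilt = pickle.loads(blob)

print(f"pickled size: {len(blob):,} bytes")
print("equal:", rebuilt == payload, "same object:", rebuilt is payload)
```

The rebuilt object compares equal but shares no memory with the original, which is why large arguments multiply memory use across workers.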

Since with multiprocessing all of the actual work is done in separate processes, the memory of your main process should not be affected by the operations you perform. Total memory use does go up considerably, however, because each worker process holds its own copy of the data you sent across. On systems with copy-on-write (CoW) memory, that copy is sometimes shared CoW memory (in the same cases where serialization is avoided), but CPython writes to objects even when it only reads them, for example to update their reference counts, so the shared pages quickly get written to and thus copied.
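The fork-and-inherit case can be sketched as follows, assuming a POSIX system where the "fork" start method is available. The context is requested explicitly because fork is not the default start method on every platform. `BIG`, `row_sum` and `parallel_row_sums` are hypothetical names: the large data is created before the pool and only small indices are passed through map.

```python
import multiprocessing as mp

# Hypothetical large dataset created BEFORE the pool. With the "fork"
# start method, worker processes inherit it from the parent's memory
# (copy-on-write) instead of receiving it through pickle.
BIG = [list(range(1_000)) for _ in range(200)]


def row_sum(i):
    # Only the small integer i is pickled and sent to the worker; BIG is
    # read from the memory the worker inherited at fork time. Reading it
    # still updates reference counts on most of the objects it touches,
    # which writes to the shared pages and makes the OS copy them.
    return sum(BIG[i])


def parallel_row_sums(processes=2):
    # "fork" is only available on POSIX systems (not on Windows).
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(row_sum, range(len(BIG)))


if __name__ == "__main__":
    sums = parallel_row_sums()
    assert sums == [sum(row) for row in BIG]
```

Passing an index instead of the row itself keeps each pickled task tiny; the remaining cost is the CoW pages each worker ends up duplicating as it touches the data.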

Thomas Wouters answered Oct 18 '22 12:10