High memory usage only when multiprocessing

Question

I am trying to use Python's multiprocessing library to gain some performance, specifically its map function. When I swap it out for its single-process counterpart, memory usage stays low, but the multiprocessing version of map sends my memory usage through the roof. For the record, what I am doing can easily consume a lot of memory, but what difference between the two would cause such a stark contrast?

Thomas Wouters · Accepted Answer

You realize that multiprocessing does not use threads, yes? I say this because you mention a "single threaded counterpart".

Are you sending a lot of data through multiprocessing's map? A likely cause is the serialization multiprocessing has to do in many cases. multiprocessing uses pickle, which typically takes up more memory than the data it is pickling. (In some cases, specifically on systems with fork() where the worker processes are created after the data already exists, it can avoid the serialization because the children inherit the data. Whenever it needs to send new data to an existing process, it cannot.)
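To get a feel for the serialization cost, here is a minimal sketch that pickles a stand-in payload by hand, roughly what has to happen to each argument sent to a worker. The `data` list is a hypothetical placeholder for whatever you pass to map, and the sizes you see will depend on your data and Python version.

```python
import pickle
import sys

# Hypothetical payload standing in for the arguments passed to map().
data = [float(i) for i in range(100_000)]

# Serializing produces a second, independent copy as a bytes object.
payload = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

list_bytes = sys.getsizeof(data)   # the list container only, not the floats
pickle_bytes = len(payload)        # size of the serialized copy

print(f"list container:  {list_bytes} bytes")
print(f"pickled payload: {pickle_bytes} bytes")

# The receiving side rebuilds a third copy from the bytes.
restored = pickle.loads(payload)
```

While the transfer is in flight, the original object, the pickled bytes, and the rebuilt copy can all be alive at once, which is one plausible source of the spike.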

Since with multiprocessing all of the actual work is done in separate processes, the memory of your main process should not be affected by the operations you perform. The total memory use does go up by quite a bit, however, because each worker process has a copy of the data you sent across. On systems that have copy-on-write (CoW), this is sometimes shared memory (in the same cases where serialization is avoided), but CPython stores reference counts inside the objects themselves, so even reading the data writes to those pages, and they quickly end up copied.
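One way to keep the data from being pickled at all is to build it before the pool exists and send only small descriptors to the workers. The following is a sketch under assumptions, not a drop-in fix: it requires a POSIX system where the "fork" start method is available, and `BIG_DATA`, `sum_slice`, and `parallel_sum` are hypothetical names chosen for illustration.

```python
import multiprocessing as mp

# Hypothetical large dataset, created before the pool so that forked
# workers inherit it instead of receiving a pickled copy.
BIG_DATA = list(range(200_000))


def sum_slice(bounds):
    """Sum BIG_DATA[start:stop]; only the two integers are pickled."""
    start, stop = bounds
    return sum(BIG_DATA[start:stop])


def parallel_sum(n_workers=2, step=50_000):
    # Assumption: "fork" exists on this platform (it does not on Windows).
    ctx = mp.get_context("fork")
    tasks = [(i, min(i + step, len(BIG_DATA)))
             for i in range(0, len(BIG_DATA), step)]
    with ctx.Pool(n_workers) as pool:
        return sum(pool.map(sum_slice, tasks))


if __name__ == "__main__":
    print(parallel_sum())
```

This avoids the serialization, but the inherited pages may still be copied over time once the workers touch the objects, so total memory can still grow.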

Tags: python, multiprocessing

Sandro
