Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory not freed after Python's multiprocessing Pool is finished

When having used Python's multiprocessing Pool.map(), I do not get back my memory. Over 1GB of memory is still occupied, although the function with the Pool is exited, everything is closed, and I even try to delete the variable of the Pool and explicitly call the garbage collector.

When, in the code shown below, un-commenting the two lines above the pool.map() (and commenting the pool.map() line) everything looks OK, but as soon as using multiprocessing the memory seems not to get freed again after leaving the function.

Because in the real world code several other functions using multiprocessing are called, this then even stacks up, consuming all the memory.
(Unfortunately I can not produce a minimal example for the minor second case, with stacking up the memory, but as soon as the main problem is solved this second one should be gone also.)

This is Python 3.7.3 on Linux and any help for at least explaining or even solving this issue is very welcome.

Minimal example code:

import gc
from time import sleep
from memory_profiler import profile
import numpy as np

def waitat(where, t):
    # print and wait, gives chance to see live memory usage in some task manager program
    print(where)
    sleep(t)

@profile
def parallel_convert_all_to_hsv(imgs: np.ndarray) -> np.ndarray:
    from skimage.color import rgb2hsv
    import multiprocessing as mp
    print("going parallel")
    pool = mp.Pool()
    try:
        # images_converted = [] # there is no memory problem when using commented lines below, instead of pool.map(…) line
        # for img in imgs:
        #     images_converted.append(rgb2hsv(img))
        images_converted = pool.map(rgb2hsv, imgs)
    except KeyboardInterrupt:
        pool.terminate()
    waitat("after pool.map",5)

    pool.close()
    pool.join()

    waitat("before del pool",5)
    pool = None
    del pool    # memory should now be freed here?
    mp = None
    rgb2hsv = None

    waitat("after del pool",5)
    print("copying over")
    res = np.array(images_converted)
    waitat("before del image_hsv in function",5)
    images_converted = None
    del images_converted
    return res

@profile
def doit():
    print("create random images")
    max_images = 700
    images = np.random.rand(max_images, 300, 300,3)

    waitat("before going parallel",5)
    images_converted = parallel_convert_all_to_hsv(images)
    print("images_converted has %i bytes" % images_converted.nbytes)
    # how to clean up Pool's memory at latest here?

    waitat("before deleting original images",5)
    images = None
    del images
    waitat("memory should be as before going parallel + %i bytes" % images_converted.nbytes ,10)
    images_converted = None
    del images_converted
    waitat("nearly end, memory should be as before" ,15)
    gc.collect(2)
    waitat("end, memory should be as before" ,15)    

doit()

Output with using Memory Profiler, showing the problem:

$ python3 -m memory_profiler pool-mem-probs.py
create random images
before going parallel
going parallel
after pool.map
before del pool
after del pool
copying over
before del image_hsv in function
Filename: pool-mem-probs.py

Line #    Mem usage    Increment   Line Contents
================================================
    11   1481.2 MiB   1481.2 MiB   @profile
    12                             def parallel_convert_all_to_hsv(imgs: np.ndarray) -> np.ndarray:
    13   1487.2 MiB      6.0 MiB       from skimage.color import rgb2hsv
    14   1487.2 MiB      0.0 MiB       import multiprocessing as mp
    15   1487.2 MiB      0.0 MiB       print("going parallel")
    16   1488.6 MiB      1.4 MiB       pool = mp.Pool()
    17   1488.6 MiB      0.0 MiB       try:
    18                                     # images_converted = []  # there is no memory problem when using commented lines below, instead of pool.map(…) line
    19                                     # for img in imgs:
    20                                     #     images_converted.append(rgb2hsv(img))
    21   2930.9 MiB   1442.3 MiB           images_converted = pool.map(rgb2hsv, imgs)
    22                                 except KeyboardInterrupt:
    23                                     pool.terminate()
    24   2930.9 MiB      0.0 MiB       waitat("after pool.map",5)
    25                                 
    26   2930.9 MiB      0.0 MiB       pool.close()
    27   2931.0 MiB      0.1 MiB       pool.join()
    28                                 
    29   2931.0 MiB      0.0 MiB       waitat("before del pool",5)
    30   2931.0 MiB      0.0 MiB       pool = None
    31   2931.0 MiB      0.0 MiB       del pool    # memory should now be freed here?
    32   2931.0 MiB      0.0 MiB       mp = None
    33   2931.0 MiB      0.0 MiB       rgb2hsv = None
    34                                 
    35   2931.0 MiB      0.0 MiB       waitat("after del pool",5)
    36   2931.0 MiB      0.0 MiB       print("copying over")
    37   4373.0 MiB   1441.9 MiB       res = np.array(images_converted)
    38   4373.0 MiB      0.0 MiB       waitat("before del image_hsv in function",5)
    39   4016.6 MiB      0.0 MiB       images_converted = None
    40   4016.6 MiB      0.0 MiB       del images_converted
    41   4016.6 MiB      0.0 MiB       return res


images_converted has 1512000000 bytes
before deleting original images
memory should be as before going parallel + 1512000000 bytes
nearly end, memory should be as before
end, memory should be as before
Filename: pool-mem-probs.py

Line #    Mem usage    Increment   Line Contents
================================================
    43     39.1 MiB     39.1 MiB   @profile
    44                             def doit():
    45     39.1 MiB      0.0 MiB       print("create random images")
    46     39.1 MiB      0.0 MiB       max_images = 700
    47   1481.2 MiB   1442.1 MiB       images = np.random.rand(max_images, 300, 300,3)
    48                             
    49   1481.2 MiB      0.0 MiB       waitat("before going parallel",5)
    50   4016.6 MiB   2535.4 MiB       images_converted = parallel_convert_all_to_hsv(images)
    51   4016.6 MiB      0.0 MiB       print("images_converted has %i bytes" % images_converted.nbytes)
    52                                 # how to clean up Pool's memory at latest here?
    53                             
    54   4016.6 MiB      0.0 MiB       waitat("before deleting original images",5)
    55   2574.6 MiB      0.0 MiB       images = None
    56   2574.6 MiB      0.0 MiB       del images
    57   2574.6 MiB      0.0 MiB       waitat("memory should be as before going parallel + %i bytes" % images_converted.nbytes ,10)
    58   1132.7 MiB      0.0 MiB       images_converted = None
    59   1132.7 MiB      0.0 MiB       del images_converted
    60   1132.7 MiB      0.0 MiB       waitat("nearly end, memory should be as before" ,15)
    61   1132.7 MiB      0.0 MiB       gc.collect(2)
    62   1132.7 MiB      0.0 MiB       waitat("end, memory should be as before" ,15)    

Output of non-parallel code (where the problem does not occur):

$ python3 -m memory_profiler pool-mem-probs.py
create random images
before going parallel
going parallel
after pool.map
before del pool
after del pool
copying over
before del image_hsv in function
Filename: pool-mem-probs.py

Line #    Mem usage    Increment   Line Contents
================================================
    11   1481.3 MiB   1481.3 MiB   @profile
    12                             def parallel_convert_all_to_hsv(imgs: np.ndarray) -> np.ndarray:
    13   1488.1 MiB      6.8 MiB       from skimage.color import rgb2hsv
    14   1488.1 MiB      0.0 MiB       import multiprocessing as mp
    15   1488.1 MiB      0.0 MiB       print("going parallel")
    16   1488.7 MiB      0.6 MiB       pool = mp.Pool()
    17   1488.7 MiB      0.0 MiB       try:
    18   1488.7 MiB      0.0 MiB           images_converted = []    # there is no memory problem when using commented lines below, instead of pool.map(…) line
    19   2932.6 MiB      0.0 MiB           for img in imgs:
    20   2932.6 MiB      2.2 MiB               images_converted.append(rgb2hsv(img))
    21                                     # images_converted = pool.map(rgb2hsv, imgs)
    22                                 except KeyboardInterrupt:
    23                                     pool.terminate()
    24   2932.6 MiB      0.0 MiB       waitat("after pool.map",5)
    25                                 
    26   2932.6 MiB      0.0 MiB       pool.close()
    27   2932.8 MiB      0.2 MiB       pool.join()
    28                                 
    29   2932.8 MiB      0.0 MiB       waitat("before del pool",5)
    30   2932.8 MiB      0.0 MiB       pool = None
    31   2932.8 MiB      0.0 MiB       del pool    # memory should now be freed here?
    32   2932.8 MiB      0.0 MiB       mp = None
    33   2932.8 MiB      0.0 MiB       rgb2hsv = None
    34                                 
    35   2932.8 MiB      0.0 MiB       waitat("after del pool",5)
    36   2932.8 MiB      0.0 MiB       print("copying over")
    37   4373.3 MiB   1440.5 MiB       res = np.array(images_converted)
    38   4373.3 MiB      0.0 MiB       waitat("before del image_hsv in function",5)
    39   2929.6 MiB      0.0 MiB       images_converted = None
    40   2929.6 MiB      0.0 MiB       del images_converted
    41   2929.6 MiB      0.0 MiB       return res


images_converted has 1512000000 bytes
before deleting original images
memory should be as before going parallel + 1512000000 bytes
nearly end, memory should be as before
end, memory should be as before
Filename: pool-mem-probs.py

Line #    Mem usage    Increment   Line Contents
================================================
    43     39.2 MiB     39.2 MiB   @profile
    44                             def doit():
    45     39.2 MiB      0.0 MiB       print("create random images")
    46     39.2 MiB      0.0 MiB       max_images = 700
    47   1481.3 MiB   1442.1 MiB       images = np.random.rand(max_images, 300, 300,3)
    48                             
    49   1481.3 MiB      0.0 MiB       waitat("before going parallel",5)
    50   2929.6 MiB   1448.3 MiB       images_converted = parallel_convert_all_to_hsv(images)
    51   2929.6 MiB      0.0 MiB       print("images_converted has %i bytes" % images_converted.nbytes)
    52                                 # how to clean up Pool's memory at latest here?
    53                             
    54   2929.6 MiB      0.0 MiB       waitat("before deleting original images",5)
    55   1487.7 MiB      0.0 MiB       images = None
    56   1487.7 MiB      0.0 MiB       del images
    57   1487.7 MiB      0.0 MiB       waitat("memory should be as before going parallel + %i bytes" % images_converted.nbytes ,10)
    58     45.7 MiB      0.0 MiB       images_converted = None
    59     45.7 MiB      0.0 MiB       del images_converted
    60     45.7 MiB      0.0 MiB       waitat("nearly end, memory should be as before" ,15)
    61     45.7 MiB      0.0 MiB       gc.collect(2)
    62     45.7 MiB      0.0 MiB       waitat("end, memory should be as before" ,15)    
like image 362
Jaleks Avatar asked Jan 07 '20 13:01

Jaleks


People also ask

How do you exit multiprocessing in Python?

Call kill() on Process The method is called on the multiprocessing. Process instance for the process that you wish to terminate.

Do Python processes share memory?

Shared memory can be a very efficient way of handling data in a program that uses concurrency. Python's mmap uses shared memory to efficiently share large amounts of data between multiple Python processes, threads, and tasks that are happening concurrently.

How lock in multiprocessing Python?

Python provides a mutual exclusion lock for use with processes via the multiprocessing. Lock class. An instance of the lock can be created and then acquired by processes before accessing a critical section, and released after the critical section. Only one process can have the lock at any time.

What does multiprocessing pool do in Python?

Using Pool. The Pool class in multiprocessing can handle an enormous number of processes. It allows you to run multiple jobs per process (due to its ability to queue the jobs). The memory is allocated only to the executing processes, unlike the Process class, which allocates memory to all the processes.

What happens if I don't set the memory pool for processes?

If you don't, then the process is reusued over and over again by the pool so the memory is never released. When set, the process will be allowed to die and a new one created in it's place.

How to clear memory in Python?

We will learn how to clear memory for a variable, list, and array using two different methods. The two different methods are del and gc.collect (). del and gc.collect () are the two different methods to delete the memory in python. The clear memory method is helpful to prevent the overflow of memory.

Why doesn’t Python release memory back to the operating system?

if you mean why doesn’t Python release memory back to the operating system; it isn’t Python. Most operating systems don’t return allocated memory back from a running program into the system available pool.

What happens if you don't clear the memory of a process?

If you don't, then the process is reusued over and over again by the pool so the memory is never released. When set, the process will be allowed to die and a new one created in it's place. That will effectively clean up the memory.


1 Answers

The generation threshold may be getting in the way, take a look at gc.get_threshold()

try including

gc.disable()
like image 96
ShpielMeister Avatar answered Nov 15 '22 20:11

ShpielMeister