Python 3.8 shared_memory resource_tracker producing unexpected warnings at application close

  • I am using a multiprocessing.Pool which calls a function in 1 or more subprocesses to produce a large chunk of data.
  • The worker process creates a multiprocessing.shared_memory.SharedMemory object and uses the default name assigned by shared_memory.
  • The worker returns the string name of the SharedMemory object to the main process.
  • In the main process the SharedMemory object is linked to, consumed, and then unlinked & closed.

At shutdown I'm seeing warnings from resource_tracker:

/usr/local/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 10 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/usr/local/lib/python3.8/multiprocessing/resource_tracker.py:229: UserWarning: resource_tracker: '/psm_e27e5f9e': [Errno 2] No such file or directory: '/psm_e27e5f9e'
  warnings.warn('resource_tracker: %r: %s' % (name, e))
/usr/local/lib/python3.8/multiprocessing/resource_tracker.py:229: UserWarning: resource_tracker: '/psm_2cf099ac': [Errno 2] No such file or directory: '/psm_2cf099ac'
<8 more similar messages omitted>

Since I unlinked the shared memory objects in my main process I'm confused about what's happening here. I suspect these messages are occurring in the subprocess (in this example I tested with a process pool of size 1).

Here is a minimum reproducible example:

import multiprocessing
import multiprocessing.shared_memory as shared_memory

def create_shm():
    shm = shared_memory.SharedMemory(create=True, size=30000000)
    shm.close()
    return shm.name

def main():
    pool = multiprocessing.Pool(processes=4)
    tasks = [pool.apply_async(create_shm) for _ in range(200)]

    for task in tasks:
        name = task.get()
        print('Getting {}'.format(name))
        shm = shared_memory.SharedMemory(name=name, create=False)
        shm.close()
        shm.unlink()

    pool.terminate()
    pool.join()

if __name__ == '__main__':
    main()

I have found that this example runs fine on my own laptop (Linux Mint 19.3), but on two different server machines (unknown OS configurations, both different from each other) it does exhibit the problem. In all cases I'm running the code from a Docker container, so the Python/software configuration is identical; the only difference is the Linux kernel/host OS.

I notice this documentation on start methods might be relevant: https://docs.python.org/3.8/library/multiprocessing.html#contexts-and-start-methods

I also notice that the number of "leaked shared_memory objects" varies from run to run. Since I unlink in the main process and then immediately exit, perhaps the resource_tracker (which I believe is a separate process) simply hasn't received an update before the main process exits. I don't understand the role of the resource_tracker well enough to be sure of that, though.

Related topics:

  • https://bugs.python.org/issue39959
asked Jul 06 '20 by David Parks

1 Answer

In theory, and based on the current implementation of SharedMemory, the warnings are to be expected. The main reason is that every shared memory object you create is tracked twice: first when it's produced by one of the processes in the Pool, and again when it's consumed by the main process. This is because the current implementation of the SharedMemory constructor registers the shared memory object with the resource tracker regardless of whether the create argument is True or False.

So, when you call shm.unlink() in the main process, what you are doing is deleting the shared memory object entirely before its producer (some process in the Pool) gets around to cleaning it up. As a result, when the pool gets destroyed, each of its members (if they ever got a task) has to clean up after itself. The first warning about leaked resources probably refers to the shared memory objects actually created by processes in the Pool that never got unlinked by those same processes. And, the No such file or directory warnings are due to the fact that the main process has unlinked the files associated with the shared memory objects before the processes in the Pool are destroyed.

The solution provided in the linked bug report would likely prevent consuming processes from having to spawn additional resource trackers, but it does not quite prevent the issue that arises when a consuming process decides to delete a shared memory object that it did not create. This is because the process that produced the shared memory object will still have to do some clean up, i.e. some unlinking, before it exits or is destroyed.
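
If you mainly want the warnings gone in the meantime, one commonly suggested workaround is to have the producing process tell its resource tracker to forget about the segment, so that only the consuming (main) process tracks and unlinks it. The sketch below is a hack rather than a supported API: multiprocessing.resource_tracker is undocumented and _name is a private attribute of SharedMemory (on POSIX it carries the leading slash the segment was registered under), so verify the behaviour on your target systems:

import multiprocessing.shared_memory as shared_memory
from multiprocessing import resource_tracker

def create_shm():
    shm = shared_memory.SharedMemory(create=True, size=30000000)
    shm.close()
    # Hack: stop this process's resource tracker from tracking the
    # segment; the main process will attach to it, unlink it, and own
    # the cleanup instead. `_name` is private and includes the leading
    # slash the segment was registered with.
    resource_tracker.unregister(shm._name, "shared_memory")
    return shm.name

With create_shm defined like this, the rest of the script stays the same: the unlink in the main process removes the only remaining registration, so no tracker should have anything left to complain about at shutdown.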

The fact that you are not seeing those warnings on your laptop is quite puzzling. It may well come down to a combination of OS scheduling, unflushed buffers in the child processes, and the start method used when creating the process pool.

For comparison: when I use fork as the start method on my machine, I get the warnings; with spawn and forkserver I see none. I added argument parsing to your code to make it easy to test the different start methods:

#!/usr/bin/env python3
# shm_test_script.py
"""
Use --start_method or -s to pick a process start method when creating a process Pool.
Use --tasks or -t to control how many shared memory objects should be created.
Use --pool_size or -p to control the number of child processes in the pool.
"""
import argparse
import multiprocessing
import multiprocessing.shared_memory as shared_memory


def create_shm():
    shm = shared_memory.SharedMemory(create=True, size=30000000)
    shm.close()
    return shm.name


def main(tasks, start_method, pool_size):
    multiprocessing.set_start_method(start_method, force=True)
    pool = multiprocessing.Pool(processes=pool_size)
    tasks = [pool.apply_async(create_shm) for _ in range(tasks)]

    for task in tasks:
        name = task.get()
        print('Getting {}'.format(name))
        shm = shared_memory.SharedMemory(name=name, create=False)
        shm.close()
        shm.unlink()
    pool.terminate()
    pool.join()


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument(
        '--start_method', '-s',
        help='The multiprocessing start method to use. Default: %(default)s',
        default=multiprocessing.get_start_method(),
        choices=multiprocessing.get_all_start_methods()
    )
    parser.add_argument(
        '--pool_size', '-p',
        help='The number of processes in the pool. Default: %(default)s',
        type=int,
        default=multiprocessing.cpu_count()
    )
    parser.add_argument(
        '--tasks', '-t',
        help='Number of shared memory objects to create. Default: %(default)s',
        default=200,
        type=int
    )
    args = parser.parse_args()
    main(args.tasks, args.start_method, args.pool_size)

Given that fork is the only method that ends up displaying the warnings (for me, at least), maybe there is actually something to the following statement about it:

The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.

It's not surprising that the warnings from child processes persist/propagate if all resources of the parent are inherited by the child processes.

If you're feeling particularly adventurous, you can edit multiprocessing/resource_tracker.py and update the warnings.warn lines by adding os.getpid() to the printed strings. For instance, changing any warning that starts with "resource_tracker:" to "resource_tracker %d:" % os.getpid() should be sufficient. If you do this, you will notice that the warnings come from various processes that are neither the pool's child processes nor the main process itself.
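
For reference, the two edited calls might look roughly like this (paraphrased from CPython 3.8's resource_tracker.py, so the exact wording may differ slightly between patch releases; os is already imported in that module):

# Shutdown summary (around the line 216 shown in the traceback above):
warnings.warn('resource_tracker %d: There appear to be %d '
              'leaked %s objects to clean up at shutdown'
              % (os.getpid(), len(rtype_cache), rtype))

# Per-resource cleanup failure (around line 229):
warnings.warn('resource_tracker %d: %r: %s' % (os.getpid(), name, e))

Because these lines run inside the resource tracker's own main loop, each warning then carries the PID of the tracker process that emitted it.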

With those changes made, the following should help confirm that there are as many complaining resource trackers as there are processes in your Pool, and that their PIDs differ from both the main process and the child processes:

chmod +x shm_test_script.py
./shm_test_script.py -p 10 -t 50 -s fork > log 2> err
awk -F ':' 'length($4) > 1 { print $4 }' err | sort | uniq -c

That should display ten lines, each prefixed with the number of complaints from the corresponding resource tracker. Each line should also contain a PID that differs from those of the main and child processes.

To recap, each child process gets its own resource tracker if it receives any work. Since the shared memory objects are never unlinked in the child processes, those trackers attempt to clean the resources up when the child processes are destroyed, which is when the warnings are emitted.

I hope this helps answer some, if not all, of your questions.

answered Nov 10 '22 by Abdou