
Understanding shared_memory in Python 3.8

I'm trying to understand how parts of shared_memory work.

Looking at the source, it looks like the module uses shm_open() on UNIX and CreateFileMapping/OpenFileMapping on Windows, combined with mmap.

I understand from here that, to avoid full serialization/deserialization by pickle, one needs to implement __setstate__() and __getstate__() explicitly for the shared datatype.

I do not see any such implementation in shared_memory.py.

How does shared_memory circumvent the pickle treatment?

Also, on a Windows machine, this alone seems to survive across interpreters:

from mmap import mmap

shared_size = 12
shared_label = "my_mem"

shm = mmap(-1, shared_size, tagname=shared_label)  # tagname is Windows-only; keep a reference so the mapping stays open

Why, then, are CreateFileMapping/OpenFileMapping needed here?

Jay asked Jul 3 '19 at 20:07




1 Answer

How does shared_memory circumvent the pickle treatment?

I think you are confusing shared ctypes and shared objects between processes.

First, you don't have to use the sharing mechanisms provided by multiprocessing to get shared objects: you can wrap basic primitives such as mmap (or its Windows equivalent) yourself, or get fancier with any API your OS/kernel provides.
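As a minimal sketch of wrapping a basic primitive yourself, assuming a POSIX system (it relies on os.fork, which Windows lacks): an anonymous mmap is shared by default on Unix, so a forked child writes into the same pages the parent reads. No multiprocessing machinery and no pickling are involved. The function name is hypothetical.

```python
import mmap
import os


def write_in_child_read_in_parent() -> bytes:
    """POSIX-only sketch: share an anonymous mmap with a forked child."""
    # fileno=-1 requests an anonymous mapping; on Unix the default flags are
    # MAP_SHARED, so the pages stay shared with any child created by fork().
    buf = mmap.mmap(-1, 16)
    pid = os.fork()
    if pid == 0:
        # Child: write directly into the shared pages, then exit immediately
        # without running any of the parent's cleanup code.
        buf[:5] = b"hello"
        os._exit(0)
    # Parent: wait for the child to finish, then read what it wrote.
    os.waitpid(pid, 0)
    data = bytes(buf[:5])
    buf.close()
    return data
```

Calling write_in_child_read_in_parent() in the parent returns the bytes the child wrote.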

Next, the second link you mention, about how copying is done and how __getstate__ defines pickling behavior, only applies if you choose to use the sharedctypes module API. You are not forced to pickle anything to share memory between two processes.

In fact, sharedctypes is backed by anonymous shared memory; see: https://github.com/python/cpython/blob/master/Lib/multiprocessing/heap.py#L31

Both implementations rely on an mmap-like primitive.
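A sketch of the sharedctypes side, assuming a POSIX system so the "fork" start method is available: a RawArray lives in multiprocessing's mmap-backed shared heap, so values a child process writes are visible to the parent without copying. The helper names are hypothetical.

```python
import multiprocessing as mp
from multiprocessing.sharedctypes import RawArray


def fill_squares(arr) -> None:
    # Runs in the child; writes land in the shared heap block, not in a copy.
    for i in range(len(arr)):
        arr[i] = i * i


def squares_from_child(n: int = 5) -> list:
    # RawArray allocates its storage from multiprocessing's shared heap,
    # which is built on an mmap'd arena.
    arr = RawArray("i", n)
    # Assumption: the "fork" start method (POSIX only), so the child simply
    # inherits the array instead of receiving it through pickling.
    ctx = mp.get_context("fork")
    p = ctx.Process(target=fill_squares, args=(arr,))
    p.start()
    p.join()
    return list(arr)
```

squares_from_child(5) returns [0, 1, 4, 9, 16], filled in entirely by the child.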

Anyway, if you try to copy something using sharedctypes, you will hit:

  • https://github.com/python/cpython/blob/master/Lib/multiprocessing/sharedctypes.py#L98
  • https://github.com/python/cpython/blob/master/Lib/multiprocessing/sharedctypes.py#L39
  • https://github.com/python/cpython/blob/master/Lib/multiprocessing/sharedctypes.py#L135

These code paths use ForkingPickler, which relies on pickle, so ultimately you'll end up calling __getstate__ somewhere.
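To see that pickling path in action, here is a small sketch (helper name hypothetical, behavior as observed in CPython 3.8+): creating a RawArray registers a reducer for its ctypes type with ForkingPickler, and that reducer refuses to run unless a child process is currently being spawned.

```python
from multiprocessing.reduction import ForkingPickler
from multiprocessing.sharedctypes import RawArray


def try_pickling_shared_ctype() -> str:
    arr = RawArray("i", 3)
    try:
        # The reducer registered for this ctypes array type calls
        # assert_spawning(), which raises outside of process spawning.
        ForkingPickler.dumps(arr)
    except RuntimeError as exc:
        return str(exc)
    return "pickled"
```

The returned message says these objects "should only be shared between processes through inheritance".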

But it's not relevant with shared_memory, because shared_memory is not really a ctype-like object.

There are other ways to share objects between processes, such as the resource sharer/tracker API: https://github.com/python/cpython/blob/master/Lib/multiprocessing/resource_sharer.py, which relies on pickle serialization/deserialization.

But you don't share shared memory through shared memory, right?

When you use: https://github.com/python/cpython/blob/master/Lib/multiprocessing/shared_memory.py

You create a block of memory with a unique name, and every process must know that name beforehand; otherwise it cannot attach to the block.
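A minimal sketch of that workflow, assuming a POSIX system for the "fork" start method: the creator makes a named block, the child receives nothing but the name string and attaches with SharedMemory(name=...), and only the creator unlinks the block at the end. The helper names are hypothetical.

```python
import multiprocessing as mp
from multiprocessing import shared_memory


def child_writes(name: str) -> None:
    # The child only gets the block's name (a plain str) and attaches by it.
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[:5] = b"hello"
    shm.close()  # detach from this process; do not unlink here


def share_by_name() -> bytes:
    # create=True with no name: a new block with a generated unique name.
    shm = shared_memory.SharedMemory(create=True, size=16)
    try:
        # Assumption: "fork" start method (POSIX only). Only the name string
        # is passed to the child, never the SharedMemory object itself.
        ctx = mp.get_context("fork")
        p = ctx.Process(target=child_writes, args=(shm.name,))
        p.start()
        p.join()
        return bytes(shm.buf[:5])
    finally:
        shm.close()
        shm.unlink()  # the creator removes the block once everyone is done
```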

Basically, the analogy is:

You and a group of friends share a secret base whose location only you know. You go off on errands and are away from each other, but you can all meet again at that location.

For this to work, you must all know the location before you split up. If you don't have it beforehand, there is no guarantee you will find the meeting place.

The same goes for shared_memory: you only need its name to open it. You don't share or transfer the shared_memory object between processes; each process attaches to the same block using its unique name.

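So why would you pickle it? You can, and it is in fact built in: SharedMemory defines __reduce__, which pickles only the block's name (together with create=False and the size), so unpickling attaches to the existing block instead of copying its contents. It is just as easy to send the unique name to all your processes yourself, through another shared memory channel or any other means.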
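A minimal single-process check of what pickling a SharedMemory actually transmits, assuming CPython 3.8+: the payload stays tiny for a 1 MB block, it contains the block's name, and unpickling attaches to the same memory. The function name is hypothetical.

```python
import pickle
from multiprocessing import shared_memory


def pickle_round_trip() -> tuple:
    shm = shared_memory.SharedMemory(create=True, size=1_000_000)
    try:
        shm.buf[0] = 7
        payload = pickle.dumps(shm)
        # Unpickling re-attaches to the existing block by name.
        clone = pickle.loads(payload)
        try:
            clone.buf[1] = 9  # written through the clone...
            return (
                len(payload),                 # far smaller than 1 MB
                shm.name.encode() in payload,  # the name is what travels
                clone.buf[0],                 # ...sees the original's write
                shm.buf[1],                   # ...and vice versa
            )
        finally:
            clone.close()
    finally:
        shm.close()
        shm.unlink()
```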

No circumvention is required here. ShareableList is just one application of the SharedMemory class, as you can see here: https://github.com/python/cpython/blob/master/Lib/multiprocessing/shared_memory.py#L314
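ShareableList follows the same pattern: a second handle attaches through the name of the list's underlying SharedMemory block (its .shm.name). A single-process sketch with a hypothetical helper name:

```python
from multiprocessing import shared_memory


def shareable_list_by_name() -> list:
    # ShareableList packs its items into a SharedMemory block it creates.
    original = shared_memory.ShareableList([1, 2.5, "abc"])
    try:
        # A second handle attaches using only the underlying block's name.
        view = shared_memory.ShareableList(name=original.shm.name)
        try:
            view[0] = 42  # write through one handle...
            # ...and read every item back through the other.
            return [original[i] for i in range(len(original))]
        finally:
            view.shm.close()
    finally:
        original.shm.close()
        original.shm.unlink()
```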

It requires something akin to a unique name. You can also create the block without choosing a name (SharedMemory then generates a random one) and transmit that name later through another channel (write it to a temporary file, send it to some API, whatever).
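A sketch of the temporary-file variant, with a hypothetical helper name: the block gets a generated name, the name is published through a temp file, and whoever reads the file can attach. Here the same process plays both roles for brevity.

```python
import os
import tempfile
from multiprocessing import shared_memory


def attach_via_temp_file() -> bytes:
    # name=None (the default): SharedMemory picks a random unused name.
    shm = shared_memory.SharedMemory(create=True, size=8)
    fd, path = tempfile.mkstemp(suffix=".shmname")
    try:
        # Publish the generated name through an unrelated channel.
        with os.fdopen(fd, "w") as f:
            f.write(shm.name)
        shm.buf[:2] = b"ok"

        # A reader of the file attaches using the name it finds there.
        with open(path) as f:
            other = shared_memory.SharedMemory(name=f.read())
        try:
            return bytes(other.buf[:2])
        finally:
            other.close()
    finally:
        os.remove(path)
        shm.close()
        shm.unlink()
```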

Why, then, are CreateFileMapping/OpenFileMapping needed here?

Because it depends on your Python interpreter. Here you are probably using CPython, which does the following:

https://github.com/python/cpython/blob/master/Modules/mmapmodule.c#L1440

CPython's mmap already calls CreateFileMapping under the hood, so calling CreateFileMapping yourself and then attaching would just duplicate work CPython has already done.
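A sketch of why the question's snippet works on Windows, assuming CPython there: with fileno=-1 and a tagname, mmap creates a named mapping, and a second mmap with the same tagname gets the existing mapping object back. It raises on other platforms because tagname is Windows-only; the function name is hypothetical.

```python
import mmap
import sys


def same_tagname_same_memory() -> bytes:
    """Windows-only sketch: two mmap objects opened with the same tagname."""
    if sys.platform != "win32":
        raise OSError("tagname is a Windows-only mmap parameter")
    # With fileno=-1, CPython creates a named mapping; a second request with
    # the same name returns the already existing mapping object.
    first = mmap.mmap(-1, 12, tagname="my_mem")
    second = mmap.mmap(-1, 12, tagname="my_mem")
    try:
        first[:2] = b"hi"
        return bytes(second[:2])  # both views see the same memory
    finally:
        second.close()
        first.close()
```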

But what about other interpreters? Do all of them do what is necessary to make mmap work on non-POSIX platforms? Perhaps that was the developer's rationale.

Anyway, it is not surprising that mmap would work out of the box.

Raito answered Sep 22 '22 at 22:09
