
Multithreaded Access to Python bytearray

It seems that, since access to NumPy array data doesn't require calls into the Python interpreter, C extensions can manipulate these arrays after releasing the GIL, as discussed, for instance, in this thread.

The built-in Python type bytearray supports the buffer protocol, whose Py_buffer structure has the member

void *buf

A pointer to the start of the logical structure described by the buffer fields. [...] For contiguous arrays, the value points to the beginning of the memory block.

My question is, can a C extension manipulate this buf after releasing the GIL (Py_BEGIN_ALLOW_THREADS) since accessing it no longer requires calls to the Python C API? Or does the nature of the Python garbage collector forbid this, since the bytearray, and its buf, might be moved during execution?

jpavel asked Oct 22 '22 03:10



1 Answer

To clarify the short answer written as a comment: you can access the *buf data without holding the GIL, provided you are sure that the Py_buffer struct is "owned" by the thread for as long as it runs without the GIL.
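As a rough illustration of why the pointer stays valid while the buffer is held, here is a minimal Python sketch. It assumes that memoryview can stand in for the Py_buffer a C extension would fill with PyObject_GetBuffer(); the variable names are made up for the example. While such an export exists, CPython refuses to resize the bytearray, so the memory behind buf is not reallocated.

```python
ba = bytearray(b"hello world")

# memoryview(ba) acquires a buffer export; this is the Python-level
# stand-in (an assumption of this sketch) for PyObject_GetBuffer() in C.
view = memoryview(ba)

try:
    # Growing the bytearray could reallocate its memory, so it is
    # refused while an export is outstanding.
    ba.extend(b"!")
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

# Releasing the view is the counterpart of PyBuffer_Release() in C.
view.release()

# With no exports left, resizing works again.
ba.extend(b"!")
```

In a C extension the same pattern would presumably be: get the buffer, release the GIL, work on buf, re-acquire the GIL, then release the buffer.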

For the sake of completeness, I should add that this opens the door to (very remote) risks of crashes: if the GIL-less thread reads the data at *buf while another, GIL-holding thread runs Python code that changes the same data (bytearray[index] = x), then the GIL-less thread can see the data change unexpectedly under its feet. The opposite is true too, and even more annoying (but still theoretical): if the GIL-less thread changes the data at *buf, then other GIL-holding threads running Python code might see strange results, or maybe even crash, when doing complex read operations like bytearray.split().
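The shared-memory hazard described above can be shown deterministically in a single thread. This is only a sketch under the assumption that a memoryview plays the role of the Py_buffer held by the GIL-less thread; it does not reproduce an actual race. It shows that holding the buffer blocks resizing but not in-place writes, in either direction.

```python
ba = bytearray(b"abcd")

# Stand-in for the Py_buffer held by the thread running without the GIL.
view = memoryview(ba)
snapshot_before = bytes(view)

# In-place item assignment is still allowed while the export exists,
# so the holder of the buffer sees the data change underneath it.
ba[0] = ord("X")
snapshot_after = bytes(view)

# Writes through the buffer are likewise visible to code using the bytearray.
view[1] = ord("Y")

view.release()
```

If both sides really do need to touch the same bytes concurrently, some form of explicit synchronization (for example a lock taken by every writer) would be needed; the buffer protocol itself does not provide it.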

Armin Rigo answered Oct 26 '22 23:10
