I am writing a C extension for Python, which should release the Global Interpreter Lock while it operates on data. I think I have understood the mechanism of the GIL fairly well, but one question remains: Can I access data in a Python object while the thread does not own the GIL? For example, I want to read data from a (big) NumPy array in the C function while I still want to allow other threads to do other things on the other CPU cores. The C function should
Py_BEGIN_ALLOW_THREADS
Py_END_ALLOW_THREADS
Is this safe? Of course, other threads are not supposed to change the variables which the C function uses. But maybe there is one hidden source for errors: could the Python interpreter move an object, eg. by some sort of garbage collection, while the C function works on it in a separate thread?
To illustrate the question with a minimal example, consider the (minimal but complete) code below. Compile it (on Linux) with
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -fPIC -I/usr/lib/pymodules/python2.7/numpy/core/include -I/usr/include/python2.7 -c gilexample.c -o gilexample.o
gcc -pthread -shared gilexample.o -o gilexample.so
and test it in Python with
import gilexample
gilexample.sum([1,2,3])
Is the code between Py_BEGIN_ALLOW_THREADS
and Py_END_ALLOW_THREADS
safe? It accesses the contents of a Python object, and I do not want to duplicate the (possibly large) array in memory.
#include <Python.h>
#include <numpy/arrayobject.h>
// The relevant function
static PyObject * sum(PyObject * const self, PyObject * const args) {
PyObject * X;
PyArg_ParseTuple(args, "O", &X);
PyObject const * const X_double = PyArray_FROM_OTF(X, NPY_DOUBLE, NPY_ALIGNED);
npy_intp const size = PyArray_SIZE(X_double);
double * const data = (double *) PyArray_DATA(X_double);
double sum = 0;
Py_BEGIN_ALLOW_THREADS // IS THIS SAFE?
npy_intp i;
for (i=0; i<size; i++)
sum += data[i];
Py_END_ALLOW_THREADS
Py_DECREF(X_double);
return PyFloat_FromDouble(sum);
}
// Python interface code
// List the C methods that this extension provides.
static PyMethodDef gilexampleMethods[] = {
{"sum", sum, METH_VARARGS},
{NULL, NULL, 0, NULL} /* Sentinel - marks the end of this structure */
};
// Tell Python about these methods.
PyMODINIT_FUNC initgilexample(void) {
(void) Py_InitModule("gilexample", gilexampleMethods);
import_array(); // Must be present for NumPy.
}
The Python Global Interpreter Lock or GIL, in simple words, is a mutex (or a lock) that allows only one thread to hold the control of the Python interpreter. This means that only one thread can be in a state of execution at any point in time.
A global interpreter lock (GIL) is a mechanism used in computer-language interpreters to synchronize the execution of threads so that only one native thread (per process) can execute at a time. An interpreter that uses GIL always allows exactly one thread to execute at a time, even if run on a multi-core processor.
According to this scipy cookbook, if you are using numpy to do array operations then Python will release the GIL, meaning that if you write your code in a numpy style, much of the calculations will be done in a few array operations, providing you with a speedup by using multiple threads.
The Global Interpreter Lock is a mechanism used in computer language interpreters to synchronize the execution of threads so that only one thread can execute at a time. An interpreter which uses GIL will always allow exactly one thread and one thread only to execute at a time, even if run on a multi-core processor.
Is this safe?
Strictly, no. I think you should move the calls to PyArray_SIZE
and PyArray_DATA
outside the GIL-less block; if you do that, you'll be operating on C data only. You might also want to increment the reference count on the object before going into the GIL-less block and decrement it afterwards.
After your edits, it should be safe. Don't forget to decrement the reference count afterwards.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With