
Global Interpreter Lock and access to data (e.g. for NumPy arrays)

I am writing a C extension for Python that should release the Global Interpreter Lock (GIL) while it operates on data. I think I understand the mechanism of the GIL fairly well, but one question remains: can I access data in a Python object while the thread does not hold the GIL? For example, I want to read data from a (big) NumPy array in the C function while still allowing other threads to run on the other CPU cores. The C function should:

  • release the GIL with Py_BEGIN_ALLOW_THREADS
  • read and work on the data without using Python functions
  • even write data to previously constructed NumPy arrays
  • reacquire the GIL with Py_END_ALLOW_THREADS

Is this safe? Of course, other threads are not supposed to change the variables that the C function uses. But there may be a hidden source of errors: could the Python interpreter move an object, e.g. through some sort of garbage collection, while the C function works on it in a separate thread?

To illustrate the question, consider the minimal but complete code below. Compile it (on Linux) with

gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -fPIC -I/usr/lib/pymodules/python2.7/numpy/core/include -I/usr/include/python2.7 -c gilexample.c -o gilexample.o
gcc -pthread -shared gilexample.o -o gilexample.so

and test it in Python with

import gilexample
gilexample.sum([1,2,3])

Is the code between Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS safe? It accesses the contents of a Python object, and I do not want to duplicate the (possibly large) array in memory.

#include <Python.h>
#include <numpy/arrayobject.h>

// The relevant function
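static PyObject * sum(PyObject * const self, PyObject * const args) {
  PyObject * X;
  PyObject * X_double;
  npy_intp size, i;
  double * data;
  double sum = 0;

  // Propagate the error if the argument could not be parsed.
  if (!PyArg_ParseTuple(args, "O", &X))
    return NULL;
  // NPY_IN_ARRAY requests aligned, C-contiguous data, so data[i] is valid.
  X_double = PyArray_FROM_OTF(X, NPY_DOUBLE, NPY_IN_ARRAY);
  if (X_double == NULL)
    return NULL;
  size = PyArray_SIZE(X_double);
  data = (double *) PyArray_DATA(X_double);

  Py_BEGIN_ALLOW_THREADS // IS THIS SAFE?

  for (i=0; i<size; i++)
    sum += data[i];

  Py_END_ALLOW_THREADS

  // X_double is a new reference; release it once the GIL is held again.
  Py_DECREF(X_double);
  return PyFloat_FromDouble(sum);
}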

// Python interface code
// List the C methods that this extension provides.
static PyMethodDef gilexampleMethods[] = {
  {"sum", sum, METH_VARARGS},
  {NULL, NULL, 0, NULL}     /* Sentinel - marks the end of this structure */
};

// Tell Python about these methods.
PyMODINIT_FUNC initgilexample(void)  {
  (void) Py_InitModule("gilexample", gilexampleMethods);
  import_array();  // Must be present for NumPy.
}

Daniel asked Jan 11 '12 18:01




1 Answer

Is this safe?

Strictly, no. I think you should move the calls to PyArray_SIZE and PyArray_DATA outside the GIL-less block; if you do that, you'll be operating on C data only. You might also want to increment the reference count on the object before going into the GIL-less block and decrement it afterwards.

After your edits, it should be safe. Don't forget to decrement the reference count afterwards.
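
One way to make the "C data only" rule easy to check is to put the GIL-free work into a helper that receives nothing but a raw pointer and a length. The sketch below is only an illustration: the name sum_doubles is hypothetical and not part of the code in the question, and it assumes the buffer is contiguous and stays alive (i.e. a reference to the array is held) for the duration of the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in the original code): sums n doubles from a
 * plain, contiguous C buffer. It calls nothing from the Python C API and
 * touches no PyObject, so it is the kind of code that can run between
 * Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS. */
double sum_doubles(const double *data, size_t n) {
  double total = 0.0;
  size_t i;

  /* A NULL buffer is only acceptable when there is nothing to read. */
  assert(data != NULL || n == 0);

  for (i = 0; i < n; i++)
    total += data[i];
  return total;
}
```

In the extension function, the pointer and length would come from PyArray_DATA and PyArray_SIZE while the GIL is still held, and only the call to the helper would sit inside the GIL-less block.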

Fred Foo answered Sep 18 '22 23:09