
Using a dictionary in Cython, especially inside nogil

Tags:

python

cython

gil

I have a dictionary:

my_dict = {'a':[1,2,3], 'b':[4,5], 'c':[7,1,2]}

I want to use this dictionary inside a Cython nogil function, so I tried to declare it as

cdef dict cy_dict = my_dict

Up to this stage everything is fine.

Now I need to iterate over the keys of my_dict and, if a value is a list or tuple, iterate over it. In Python this is quite easy:

for key in my_dict:
    if isinstance(my_dict[key], (list, tuple)):
        # iterate over the items of the list or tuple
        for value in my_dict[key]:
            pass  # do some operation on value

But I want to implement the same thing in Cython, and inside nogil at that. Since Python objects are not allowed inside nogil, I am stuck here.

with nogil:
    # the same loop, implemented in Cython

Can anyone please help me out?

Asked by Seja Nair on Aug 28 '15 at 08:08



1 Answer

You can't use a Python dict without the GIL, because everything you could do with it involves manipulating Python objects. Your most sensible option is to accept that you need the GIL. There's a less sensible option too, involving C++ maps, but it may be hard to apply to your specific case.

You can use with gil: to reacquire the GIL. There is obviously an overhead here (the parts that hold the GIL can't be executed in parallel, and there may be a delay while the code waits for the GIL). However, if the dictionary manipulation is a small chunk of a larger piece of Cython code, this may not be too bad:

with nogil:
  # some large chunk of computationally intensive code goes here
  with gil:
    # your dictionary code
  # more computationally intensive stuff here

The other, less sensible, option is to use C++ maps (alongside other C++ standard library data types). Cython can wrap these and automatically convert them. To give a trivial example based on your example data:

from libcpp.map cimport map
from libcpp.string cimport string
from libcpp.vector cimport vector
from cython.operator cimport dereference, preincrement

def f():
    # On Python 3, Cython's default conversion to std::string expects bytes
    # keys (b'a'), unless the c_string_type / c_string_encoding directives
    # are set. On Python 2, b'a' is an ordinary str.
    my_dict = {b'a':[1,2,3], b'b':[4,5], b'c':[7,1,2]}
    # the following conversion has a computational cost to it
    # and must be done with the GIL. Depending on your design
    # you might be able to ensure it's only done once so that the
    # cost doesn't matter much
    cdef map[string,vector[int]] m = my_dict

    # cdef statements can't go inside nogil, but much of the work can
    cdef map[string,vector[int]].iterator end = m.end()
    cdef map[string,vector[int]].iterator it = m.begin()

    cdef int total_length = 0

    with nogil: # all this stuff can now go inside nogil
        while it != end:
            total_length += dereference(it).second.size()
            preincrement(it)

    # print() with parentheses works under both language levels
    print(total_length)

(You need to compile this with language='c++'.)
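As a sanity check on the compiled function, the same total can be computed in plain Python with the GIL held. This is only a reference sketch. The name reference_total_length is made up for illustration and is not part of Cython:

```python
def reference_total_length(d):
    """Sum the lengths of all values, mirroring the nogil loop over the map."""
    return sum(len(values) for values in d.values())


my_dict = {'a': [1, 2, 3], 'b': [4, 5], 'c': [7, 1, 2]}
print(reference_total_length(my_dict))  # 3 + 2 + 3 = 8
```

If f() prints a different number for the same data, the conversion into the map is the first place to look.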

The obvious disadvantage to this is that the data-types inside the dict must be known in advance (it can't be an arbitrary Python object). However, since you can't manipulate arbitrary Python objects inside a nogil block you're pretty restricted anyway.
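One way to live with the "types must be known in advance" restriction is to filter the dict in ordinary Python before handing it to Cython. The sketch below mirrors the isinstance check from the question. The helper name to_map_input, the UTF-8 encoding of the keys, and the choice to silently drop non-matching entries are all assumptions made for illustration:

```python
def to_map_input(d):
    """Keep only the entries that fit map[string, vector[int]].

    Hypothetical helper: drops values that are not a list/tuple of ints
    and encodes str keys to bytes (UTF-8 is an assumption here).
    """
    out = {}
    for key, value in d.items():
        if not isinstance(value, (list, tuple)):
            continue  # not a sequence, cannot become vector[int]
        if not all(isinstance(item, int) for item in value):
            continue  # contains non-integer items
        if isinstance(key, str):
            key = key.encode("utf-8")
        out[key] = list(value)
    return out


mixed = {'a': [1, 2, 3], 'b': (4, 5), 'd': 'text', 'e': [1.5]}
print(to_map_input(mixed))
```

This runs with the GIL, so it adds to the conversion cost mentioned above rather than removing it.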

Addendum, six years later: I don't recommend "use C++ objects everywhere" as a general approach. The Cython-C++ interface is a bit clunky and you can spend a lot of time working around it. The Python containers are actually better than you think. Everyone tends to forget about the cost of converting their C++ objects to/from Python objects. People rarely consider whether they really need to release the GIL, or whether they just read an article on the internet somewhere saying that the GIL is bad.

It's good for some tasks, but think carefully before blindly replacing all your lists with vector, dicts with map, etc. As a rule, if your C++ types live entirely within your function it may be a good move (but think twice...). If they're being converted as input or output arguments then think a third time.

Answered by DavidW on Sep 28 '22 at 03:09