
Is collections.defaultdict thread-safe?

I have never worked with threading in Python, so I am asking this as a complete newcomer.

I am wondering whether defaultdict is thread-safe. Let me explain:

I have

d = defaultdict(list)

which creates a list for missing keys by default. Let's say multiple threads do this at the same time:

d['key'].append('value')

With two threads, I should end up with ['value', 'value']. However, if defaultdict is not thread-safe, thread 1 could yield to thread 2 after checking whether 'key' is in the dict but before running d['key'] = default_factory(). The two threads would then interleave: thread 2 would create the list in d['key'] and append 'value' to it.

When thread 1 runs again, it would continue from d['key'] = default_factory(), which would destroy the existing list and its value, and we would end up with ['value'] instead of ['value', 'value'].
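The feared interleaving can be replayed by hand in a single thread. This is only a sketch of a hypothetical check-then-set lookup written in Python, not how CPython implements defaultdict; naive_getitem and t1_saw_missing are made-up names for illustration.

```python
def naive_getitem(d, key, factory):
    # Hypothetical pure-Python model of a check-then-set lookup.
    if key not in d:          # step A: check
        d[key] = factory()    # step B: create
    return d[key]

d = {}

# "Thread 1" runs step A, sees that 'key' is missing, then is suspended.
t1_saw_missing = 'key' not in d

# "Thread 2" runs the whole operation and appends its value.
naive_getitem(d, 'key', list).append('value')

# "Thread 1" resumes at step B and replaces the list thread 2 created.
if t1_saw_missing:
    d['key'] = list()
d['key'].append('value')

print(d['key'])  # ['value']: the append made by "thread 2" was lost
```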

I looked at the CPython source code for defaultdict, but I could not find any locks or mutexes. I guess it is not thread-safe unless it is documented to be.

Some people on IRC last night said that Python has a GIL, so it is conceptually thread-safe. Others said threading should not be done in Python at all. I'm pretty confused. Ideas?

ahmet alp balkan asked Jul 16 '13 16:07




1 Answer

It is thread safe, in this specific case.

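To see why, it is important to understand when Python switches threads. CPython only allows a switch between threads between Python bytecode instructions. This is where the GIL comes in: every N bytecode instructions the lock is released and a thread switch can take place. (CPython 3.2 and later use a time-based switch interval rather than an instruction count, but the switch still happens between bytecode instructions.)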

The d['key'] code is handled by one bytecode (BINARY_SUBSCR) that triggers the .__getitem__() method to be called on the dictionary.
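One way to check this on your own interpreter is the standard dis module. A minimal sketch follows; the function name lookup is made up for illustration, and the opcode names differ between CPython versions, so a newer release may show a different subscript opcode than BINARY_SUBSCR.

```python
import dis

def lookup(d):
    # The subscript expression whose bytecode we want to inspect.
    return d['key']

# Collect the opcode names CPython generated for the function body.
opnames = [ins.opname for ins in dis.get_instructions(lookup)]
print(opnames)
```

Whatever the exact name, the subscript itself should appear as a single instruction between the loads and the return.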

A defaultdict, configured with list as the default value factory, and using string values as keys, handles the dict.__getitem__() method entirely in C, and the GIL is never unlocked, making dict[key] lookups thread safe.
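The case described above can be exercised with a small experiment. This is a sketch that assumes a standard GIL-enabled CPython build; the thread count, the number of appends, and the name worker are arbitrary choices. A passing run is consistent with the claim but does not prove it.

```python
import threading
from collections import defaultdict

d = defaultdict(list)  # C-level factory, string keys

def worker(n_appends):
    # Each iteration does a (possibly missing-key) lookup, then an append.
    for _ in range(n_appends):
        d['key'].append('value')

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# If no list was replaced and no append was lost, this equals 8 * 1000.
total = len(d['key'])
print(total)
```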

Note the qualification there: if you create a defaultdict instance with a different default-value factory, one that uses Python code (lambda: [1, 2, 3] for example), all bets are off, because the C code then calls back into Python code and the GIL can be released while the bytecode for the lambda function is executing. The same applies to the keys: when a key is an object that implements __hash__ or __eq__ in Python code, a thread switch can take place there. Finally, if the factory is written in C code that explicitly releases the GIL, a thread switch can take place and thread safety is out the window.
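When the factory or the keys do run Python code, one common way to get the guarantee back is to guard the lookup and the mutation with an explicit lock. This is a sketch of that general technique, not something the standard library does for you; lock, guarded_append, and worker are hypothetical names.

```python
import threading
from collections import defaultdict

# A factory written in Python: calling it executes bytecode, so a thread
# switch may happen in the middle of a missing-key lookup.
d = defaultdict(lambda: [1, 2, 3])
lock = threading.Lock()  # hypothetical module-level lock guarding d

def guarded_append(key, value):
    # Hold the lock across the lookup and the append so they act as one step.
    with lock:
        d[key].append(value)

def worker():
    for _ in range(1000):
        guarded_append('key', 'value')

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# 3 items from the factory plus 4 * 1000 appended values.
count = len(d['key'])
print(count)
```

The lock only helps if every access to d goes through it, so any other code that reads or writes d needs to take the same lock.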

Martijn Pieters answered Oct 14 '22 00:10