 

Modifying a Python dictionary from different threads

When it comes to threading, I know you have to make sure you're not editing a variable at the same time another thread is editing it, as your changes can be lost (when incrementing a counter, for example).

Does the same apply to dictionaries? Or is a dictionary a collection of variables?

If every thread were to lock the dictionary it would slow the program down significantly, while every thread only needs write access to its own little piece of the dictionary.

If it isn't possible, is there some sort of variable variable in Python, like in PHP?

skerit asked Dec 27 '10 22:12




1 Answer

Does the same apply to dictionaries? Or is a dictionary a collection of variables?

Let's be more general:

What does "atomic operation" mean?

From Wikipedia :

In concurrent programming, an operation (or set of operations) is atomic, linearizable, indivisible or uninterruptible if it appears to the rest of the system to occur instantaneously. Atomicity is a guarantee of isolation from concurrent processes.

Now what does this mean in Python?

This means that each bytecode instruction is atomic (at least for Python <3.2, before the new GIL).

Why is that?

Because CPython uses a Global Interpreter Lock (GIL). The interpreter uses this lock to make sure that only one thread runs in the interpreter at a time, and (before Python 3.2) uses a "check interval" (see sys.getcheckinterval()) to decide how many bytecode instructions to execute before switching between threads (100 by default).
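For readers on Python 3.2 or later, where the instruction-count check interval was replaced by a time-based switch interval, a minimal sketch of how to inspect the current setting might look like this. It assumes a CPython 3.2+ interpreter; the variable name is only illustrative.

```python
import sys

# Python 3.2+ replaced the instruction-count "check interval" with a
# time-based switch interval, expressed in seconds.
interval = sys.getswitchinterval()
print("GIL switch interval: %.3f s" % interval)
```

Because switching is time-based there, a thread can be preempted between any two bytecode instructions, not only after a fixed number of them.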

So now what does this mean?

It means that operations that can be represented by only one bytecode instruction are atomic. For example, incrementing a variable is not atomic, because it takes several bytecode instructions (load, in-place add, store), and a thread switch can happen between any two of them:

>>> import dis

>>> def f(a):
        a += 1

>>> dis.dis(f)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_CONST               1 (1)      <<<<<<<<<<<< Operation 1 Load
              6 INPLACE_ADD                         <<<<<<<<<<<< Operation 2 iadd
              7 STORE_FAST               0 (a)      <<<<<<<<<<<< Operation 3 store
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        
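The listing above comes from an old Python 2 interpreter, and opcode names and offsets differ between versions. As a rough sketch for checking what your own interpreter generates (assuming Python 3.4+, where dis.get_instructions is available; the function name increment is just an illustration):

```python
import dis

def increment(a):
    a += 1

# Collect the opcode names the current interpreter generates for the
# function body; the exact names vary between Python versions.
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)
```

Whatever the exact names are on your version, the increment still shows up as separate load, add, and store steps rather than as one instruction.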

So what about dictionaries?

Some operations are atomic; for example, each of these operations is atomic:

d[x] = y
d.update(d2)
d.keys()

See for yourself:

>>> def f(d):
        x = 1
        y = 1
        d[x] = y

>>> dis.dis(f)
  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               1 (x)

  3           6 LOAD_CONST               1 (1)
              9 STORE_FAST               2 (y)

  4          12 LOAD_FAST                2 (y)
             15 LOAD_FAST                0 (d)
             18 LOAD_FAST                1 (x)
             21 STORE_SUBSCR                      <<<<<<<<<<< One operation 
             22 LOAD_CONST               0 (None)
             25 RETURN_VALUE   

See the dis module documentation to understand what STORE_SUBSCR does.

But as you can see, that is not the whole truth, because these instructions:

             ...
  4          12 LOAD_FAST                2 (y)
             15 LOAD_FAST                0 (d)
             18 LOAD_FAST                1 (x)
             ...

can make the entire operation non-atomic. Why? Suppose the variable x can also be changed by another thread, or that another thread clears your dictionary. We can name many cases where this can go wrong, so it is complicated! Here Murphy's Law applies: "Anything that can go wrong, will go wrong".
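For the case in the question, where every thread only writes to its own piece of the dictionary, a minimal sketch could look like the following. It assumes CPython and keys of a built-in type (ints here), so that the final store is a single d[key] = value; the names results and worker are hypothetical.

```python
import threading

results = {}  # shared dict; each worker writes only its own key

def worker(worker_id):
    # Do the computation on local variables, then publish the result
    # with one d[key] = value store.
    total = sum(range(worker_id * 1000))
    results[worker_id] = total

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```

Each worker touches only local variables and its own key, so none of the interleavings described above can affect another worker's entry. As soon as threads read and update the same key, this no longer holds.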

So what now?

If you still want to share variables between threads, use a lock:

import threading

mylock = threading.RLock()

def atomic_operation():
    with mylock:
        # Only one thread at a time can run the code inside this block.
        print("operations are now atomic")
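
Applied to a dictionary, the same idea might look like the sketch below: several threads update the same key, and the whole read-modify-write is done while holding the lock. The names counts, counts_lock and record are made up for this illustration.

```python
import threading

counts = {}
counts_lock = threading.Lock()

def record(key, times):
    for _ in range(times):
        # Reading the old value and storing the new one are separate
        # steps, so hold the lock around the whole update.
        with counts_lock:
            counts[key] = counts.get(key, 0) + 1

threads = [threading.Thread(target=record, args=("hits", 10000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counts["hits"])
```

With the lock in place, the four threads always add up to 40000. Keeping the locked section this small limits how long threads wait for each other.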
mouad answered Sep 19 '22 17:09