 

Is a global counter thread-safe in Python?

Tags:

python

import threading
import time


counter = 0

def increase(name):
    global counter
    i = 0
    while i < 30:
        # this for loop is for consuming cpu
        for x in xrange(100000):
            1+1
        counter += 1
        print name + " " + str(counter)
        i += 1


if __name__ == '__main__':
    threads = []
    try:
        for i in xrange(100):
           name = "Thread-" + str(i)
           t = threading.Thread( target=increase, args=(name,) )
           t.start()
           threads.append(t)
    except:
          print "Error: unable to start thread"

    for t in threads:
        t.join()

Python version is 2.7.5.

I ran the above code several times, and the final result is always 3000.

This code is also the example from this blog post: http://effbot.org/zone/thread-synchronization.htm

But the blog also mentions:

In general, this approach only works if the shared resource consists of a single instance of a core data type, such as a string variable, a number, or a list or dictionary. Here are some thread-safe operations:

  • reading or replacing a single instance attribute
  • reading or replacing a single global variable
  • fetching an item from a list
  • modifying a list in place (e.g. adding an item using append)
  • fetching an item from a dictionary
  • modifying a dictionary in place (e.g. adding an item, or calling the clear method)

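For instance, list.append falls under "modifying a list in place": in CPython it executes as a single bytecode-level operation while holding the GIL, so appends from many threads are not lost. A minimal sketch (this atomicity is a CPython implementation detail, not a language guarantee):

```python
import threading

items = []  # shared list

def worker():
    for _ in range(1000):
        items.append(1)  # a single atomic operation in CPython

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(items))  # 10000: no appends are lost
```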
This confuses me: do we really need a lock to get the correct result with multi-threading in Python?

Update 1

My Linux distro is CentOS Linux release 7.2.1511, kernel version is 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux.

And my mac is version 10.11.5 (15F34), python version is 2.7.10.

When I run the program on my Mac, the result is as expected: the counter does not equal the expected value, because the global counter is not thread-safe.

But when I run the program on my Linux box, the counter always equals the expected value:

counter:3000, expected:3000
counter:3000, expected:3000
counter:3000, expected:3000
counter:3000, expected:3000
counter:3000, expected:3000

Do I miss something here which can cause the difference?

Update 2

Another observation is that the Linux box I used above has only one core. When I switch to another Linux box which has 4 cores, the result is as expected: the counter no longer matches the expected value.

According to my understanding of the Python GIL, it guarantees that the program will always run on a single core, no matter how many cores the platform has. But the GIL does not guarantee safety between different threads, right?

If this holds, why does the single-core machine give such a result?

Thanks.

Alex asked Jun 23 '16 11:06

1 Answer

It's not safe, even in CPython. Although the GIL protects a single opcode execution, a += is actually expanded to several instructions:

Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> counter = 0
>>> def inc():
...     global counter
...     counter += 1
... 
>>> dis.dis(inc)
  3           0 LOAD_GLOBAL              0 (counter)
              3 LOAD_CONST               1 (1)
              6 INPLACE_ADD         
              7 STORE_GLOBAL             0 (counter)
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        

The code here loads counter onto the stack, increments it, and stores it back; thus, there is a race condition between the LOAD_GLOBAL and the STORE_GLOBAL. Let's imagine that two threads running inc get preempted as follows:

Thread 1                Thread 2
LOAD_GLOBAL 0
LOAD_CONST 1
INPLACE_ADD
                        LOAD_GLOBAL 0
                        LOAD_CONST 1
                        INPLACE_ADD
                        STORE_GLOBAL 0
STORE_GLOBAL 0
LOAD_CONST 0
RETURN_VALUE
                        LOAD_CONST 0
                        RETURN_VALUE

Here the increment done by thread 2 is lost completely, since thread 1 overwrites counter with its stale incremented value.
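The same multi-opcode decomposition holds under Python 3; you can check it programmatically with dis.get_instructions (available since Python 3.4). The exact opcode list varies between interpreter versions, but the load and the store of counter are always separate bytecodes, so a thread switch can land between them:

```python
import dis

counter = 0

def inc():
    global counter
    counter += 1

# Collect the opcode names for the function body; LOAD_GLOBAL and
# STORE_GLOBAL appear as distinct instructions, hence the race window.
ops = [ins.opname for ins in dis.get_instructions(inc)]
print(ops)
```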

You can easily verify this yourself by removing most of the time-wasting work from your code, making the threads "race hard":

import threading
import time

counter = 0
loops_per_increment = 10000

def increment(name):
    global counter
    i = 0
    while i < loops_per_increment:
        counter += 1
        i += 1


if __name__ == '__main__':
    expected = 0
    threads = []
    try:
        for i in xrange(100):
           name = "Thread-" + str(i)
           t = threading.Thread( target=increment, args=(name,) )
           expected += loops_per_increment
           t.start()
           threads.append(t)
    except:
          print "Error: unable to start thread"

    for t in threads:
        t.join()
    print counter, "- expected:", expected

Here are some numbers I get on my 8-core machine (using the fish shell):

[mitalia@mitalia ~/scratch]$ for i in (seq 10)
                                 python inc.py 
                             end
47012 - expected: 1000000
65696 - expected: 1000000
51456 - expected: 1000000
44628 - expected: 1000000
52087 - expected: 1000000
50812 - expected: 1000000
53277 - expected: 1000000
49652 - expected: 1000000
73703 - expected: 1000000
53902 - expected: 1000000
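The fix, as the effbot article goes on to explain, is to guard the read-modify-write with a threading.Lock, so the three bytecodes execute as one critical section. A minimal sketch (written to run on both Python 2 and 3; thread and loop counts are arbitrary):

```python
import threading

counter = 0
lock = threading.Lock()
loops_per_increment = 10000

def increment():
    global counter
    for _ in range(loops_per_increment):
        with lock:          # serialize the load/add/store sequence
            counter += 1

threads = [threading.Thread(target=increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 100000: no increments are lost
```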
Matteo Italia answered Oct 16 '22 09:10