Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

why using multiple threading to get the sum is correct?

Tags:

python

my code is

import threading

counter = 0

def worker():
    global counter
    counter += 1

if __name__ == "__main__":
    threads = []
    for i in range(1000):
        t = threading.Thread(target = worker)
        threads.append(t)
        t.start()
    for t in threads:
        t.join()

    print counter

because I don't use lock to protect the shared resource,i.e. counter variable, I expect that the result is a number less than 1000, but the counter is always 1000, I don't know why. Does counter += 1 is a atomic operation in Python?

what operations in Python are atomic using GIL?

like image 555
remykits Avatar asked Feb 25 '13 01:02

remykits


People also ask

Why do we need multiple threads?

Multithreading allows the execution of multiple parts of a program at the same time. These parts are known as threads and are lightweight processes available within the process. So multithreading leads to maximum utilization of the CPU by multitasking.

Can multiple threads read the same data?

Multiple threads can also read data from the same FITS file simultaneously, as long as the file was opened independently by each thread. This relies on the operating system to correctly deal with reading the same file by multiple processes.

Can you have multiple threads?

Multithreading is a model of program execution that allows for multiple threads to be created within a process, executing independently but concurrently sharing process resources. Depending on the hardware, threads can run fully parallel if they are distributed to their own CPU core.


1 Answers

Don't count on x += 1 being thread-safe. Here is an example where it does not work (see Josiah Carlson's comment):

import threading
x = 0
def foo():
    global x
    for i in xrange(1000000):
        x += 1
threads = [threading.Thread(target=foo), threading.Thread(target=foo)]
for t in threads:
    t.daemon = True
    t.start()
for t in threads:
    t.join()
print(x)

If you disassemble foo:

In [80]: import dis

In [81]: dis.dis(foo)
  4           0 SETUP_LOOP              30 (to 33)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_CONST               1 (1000000)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                16 (to 32)
             16 STORE_FAST               0 (i)

  5          19 LOAD_GLOBAL              1 (x)
             22 LOAD_CONST               2 (1)
             25 INPLACE_ADD         
             26 STORE_GLOBAL             1 (x)
             29 JUMP_ABSOLUTE           13
        >>   32 POP_BLOCK           
        >>   33 LOAD_CONST               0 (None)
             36 RETURN_VALUE        

You see that there is a LOAD_GLOBAL to retrieve the value of x, there is an INPLACE_ADD, and then a STORE_GLOBAL.

If both threads LOAD_GLOBAL in succession, then they might both load the same value of x. Then they both increment to the same number, and store the same number. So the work of one thread overwrites the work of the other. This is not thread-safe.

As you can see, the final value of x would be 2000000 if the program were thread-safe, but instead you almost always get a number less than 2000000.


If you add a lock, you get the "expected" answer:

import threading
lock = threading.Lock()
x = 0
def foo():
    global x
    for i in xrange(1000000):
        with lock:
            x += 1
threads = [threading.Thread(target=foo), threading.Thread(target=foo)]
for t in threads:
    t.daemon = True
    t.start()
for t in threads:
    t.join()
print(x)

yields

2000000

I think the reason why the code you posted does not exhibit a problem:

for i in range(1000):
    t = threading.Thread(target = worker)
    threads.append(t)
    t.start()

is because your workers complete so darn quickly compared to the time it takes to spawn a new thread that in practice there is no competition between threads. In Josiah Carlson's example above, each thread spends a significant amount of time in foo which increases the chance of thread collision.

like image 124
unutbu Avatar answered Oct 17 '22 22:10

unutbu