I believe this is a stupid question, but I still can't find the answer. Actually, it's better to split it into two questions:
1) Am I right that we can have many threads, but because of the GIL only one thread is executing at any given moment?
2) If so, why do we still need locks? We use locks to avoid the case where two threads try to read/write some shared object, but because of the GIL two threads can't execute at the same moment, can they?
A lock allows you to force multiple threads to access a resource one at a time, rather than all of them trying to access the resource simultaneously.
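As a rough sketch of that idea (the balance variable and withdraw function are made-up names for illustration only), a lock can make a check-then-update sequence behave as one unit:

import threading

balance = 100
lock = threading.Lock()

def withdraw(amount):
    global balance
    # The check and the update must happen as one unit; while one thread
    # holds the lock, no other thread can change balance in between.
    with lock:
        if balance >= amount:
            balance -= amount

threads = [threading.Thread(target=withdraw, args=(60,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Without the lock, both threads could pass the check and balance could go negative.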
To prevent inconsistent changes, C extensions required thread-safe memory management, which the GIL provided. The GIL is simple to implement and was easy to add to Python. It also provides a performance increase for single-threaded programs, since only one lock needs to be managed.
The GIL provides an important simplifying model of object access (including refcount manipulation), because it ensures that only one thread of execution can mutate Python objects at a time. There are also important performance benefits of the GIL for single-threaded operation.
The GIL protects the Python internals. That means the interpreter's own data structures (for example, reference counts) cannot be corrupted by threads running concurrently.
But the GIL does not protect your own code. For example, if you have this code:
self.some_number += 1
That is going to read the value of self.some_number, calculate some_number + 1, and then write it back to self.some_number.
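You can actually see those separate steps in the bytecode. The sketch below uses the standard dis module (the Counter class and bump method are made up for this illustration); a thread switch may happen between any two of the printed instructions:

import dis

class Counter:
    def __init__(self):
        self.some_number = 0

    def bump(self):
        # Not atomic: this compiles to separate load, add and store instructions.
        self.some_number += 1

# Print the bytecode for bump, showing the distinct read/add/write steps.
dis.dis(Counter.bump)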
If you do that in two threads, the operations (read, add, write) of one thread may be interleaved with those of the other, so the result is wrong.
This could be the order of execution:

thread 1 reads self.some_number (0)
thread 2 reads self.some_number (0)
thread 1 calculates some_number+1 (1)
thread 2 calculates some_number+1 (1)
thread 1 writes 1 to self.some_number
thread 2 writes 1 to self.some_number

The result is self.some_number == 1, even though it was incremented twice.
You use locks to enforce this order of execution:
thread 1 reads self.some_number (0)
thread 1 calculates some_number+1 (1)
thread 1 writes 1 to self.some_number
thread 2 reads self.some_number (1)
thread 2 calculates some_number+1 (2)
thread 2 writes 2 to self.some_number

Now the result is self.some_number == 2, as expected.
import threading
import time

total = 0
lock = threading.Lock()

def increment_n_times(n):
    # Unsafe: the read-modify-write on total is not protected.
    global total
    for i in range(n):
        total += 1

def safe_increment_n_times(n):
    # Safe: the lock makes each increment an atomic unit.
    global total
    for i in range(n):
        lock.acquire()
        total += 1
        lock.release()

def increment_in_x_threads(x, func, n):
    # Runs func(n) in x threads in parallel and reports the result.
    threads = [threading.Thread(target=func, args=(n,)) for i in range(x)]
    global total
    total = 0
    begin = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    print('finished in {}s.\ntotal: {}\nexpected: {}\ndifference: {} ({} %)'
          .format(time.time()-begin, total, n*x, n*x-total, 100-total/n/x*100))
There are two functions that implement the increment: one uses a lock and the other does not. The function increment_in_x_threads runs the incrementing function in many threads in parallel.
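As a side note (not part of the original example), safe_increment_n_times could also use the lock as a context manager, which is equivalent but releases the lock even if an exception occurs:

def safe_increment_n_times(n):
    global total
    for i in range(n):
        # "with lock:" acquires the lock and always releases it on exit.
        with lock:
            total += 1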
Now running this with a big enough number of threads makes it almost certain that an error will occur:
print('unsafe:')
increment_in_x_threads(70, increment_n_times, 100000)
print('\nwith locks:')
increment_in_x_threads(70, safe_increment_n_times, 100000)
In my case, it printed:
unsafe:
finished in 0.9840562343597412s.
total: 4654584
expected: 7000000
difference: 2345416 (33.505942857142855 %)

with locks:
finished in 20.564176082611084s.
total: 7000000
expected: 7000000
difference: 0 (0.0 %)
So without locks there were many errors (about 33% of the increments were lost). On the other hand, with locks it was about 20 times slower.
Of course, both numbers are exaggerated because I used 70 threads, but this shows the general idea.
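If the locking overhead matters, one common compromise (a sketch, not part of the benchmark above) is to count in a local variable and take the lock only once per thread:

def batched_increment_n_times(n):
    global total
    local_count = 0
    for i in range(n):
        local_count += 1   # no lock needed, local_count is private to this thread
    # One short critical section per thread instead of one per increment.
    with lock:
        total += local_count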