I believe this is a stupid question, but I still can't find the answer. Actually, it's better to split it into two questions:
1) Am I right that we can have many threads, but because of the GIL only one thread is executing at any given moment?
2) If so, why do we still need locks? We use locks to avoid the case where two threads try to read/write some shared object, but because of the GIL two threads can't execute at the same moment, can they?
A lock allows you to force multiple threads to access a resource one at a time, rather than all of them trying to access the resource simultaneously.
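As a rough sketch of that idea (the balance variable and withdraw function are made-up names for illustration only), a lock can make a check-then-update sequence behave as one unit:

import threading

balance = 100
lock = threading.Lock()

def withdraw(amount):
    global balance
    # The check and the update must happen as one unit; while one thread
    # holds the lock, no other thread can change balance in between.
    with lock:
        if balance >= amount:
            balance -= amount

threads = [threading.Thread(target=withdraw, args=(60,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Without the lock, both threads could pass the check and balance could go negative.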
To prevent inconsistent changes, C extensions required thread-safe memory management, which the GIL provided. The GIL is simple to implement and was easy to add to Python. It also provides a performance increase for single-threaded programs, since only one lock needs to be managed.
The GIL provides an important simplifying model of object access (including refcount manipulation), because it ensures that only one thread of execution can mutate Python objects at a time. There are also important performance benefits of the GIL for single-threaded operation.
The GIL protects the Python internals. That means the interpreter's own data structures (for example, reference counts) cannot be corrupted by threads running concurrently.
But the GIL does not protect your own code. For example, if you have this code:
self.some_number += 1
That is going to read the value of self.some_number, calculate some_number + 1, and then write it back to self.some_number.
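You can actually see those separate steps in the bytecode. The sketch below uses the standard dis module (the Counter class and bump method are made up for this illustration); a thread switch may happen between any two of the printed instructions:

import dis

class Counter:
    def __init__(self):
        self.some_number = 0

    def bump(self):
        # Not atomic: this compiles to separate load, add and store instructions.
        self.some_number += 1

# Print the bytecode for bump, showing the distinct read/add/write steps.
dis.dis(Counter.bump)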
If you do that in two threads, the operations (read, add, write) of one thread may be interleaved with those of the other, so the result is wrong.
This could be the order of execution:

thread 1 reads self.some_number (0)
thread 2 reads self.some_number (0)
thread 1 calculates some_number+1 (1)
thread 2 calculates some_number+1 (1)
thread 1 writes 1 to self.some_number
thread 2 writes 1 to self.some_number

The result is self.some_number == 1, even though it was incremented twice.
You use locks to enforce this order of execution:
thread 1 reads self.some_number (0)
thread 1 calculates some_number+1 (1)
thread 1 writes 1 to self.some_number
thread 2 reads self.some_number (1)
thread 2 calculates some_number+1 (2)
thread 2 writes 2 to self.some_number

Now the result is self.some_number == 2, as expected.
import threading
import time

total = 0
lock = threading.Lock()

def increment_n_times(n):
    # Unsafe: the read-modify-write on total is not protected.
    global total
    for i in range(n):
        total += 1

def safe_increment_n_times(n):
    # Safe: the lock makes each increment an atomic unit.
    global total
    for i in range(n):
        lock.acquire()
        total += 1
        lock.release()

def increment_in_x_threads(x, func, n):
    # Runs func(n) in x threads in parallel and reports the result.
    threads = [threading.Thread(target=func, args=(n,)) for i in range(x)]
    global total
    total = 0
    begin = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    print('finished in {}s.\ntotal: {}\nexpected: {}\ndifference: {} ({} %)'
          .format(time.time()-begin, total, n*x, n*x-total, 100-total/n/x*100))
There are two functions that implement the increment: one uses a lock and the other does not. The function increment_in_x_threads runs the incrementing function in many threads in parallel.
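As a side note (not part of the original example), safe_increment_n_times could also use the lock as a context manager, which is equivalent but releases the lock even if an exception occurs:

def safe_increment_n_times(n):
    global total
    for i in range(n):
        # "with lock:" acquires the lock and always releases it on exit.
        with lock:
            total += 1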
Now running this with a big enough number of threads makes it almost certain that an error will occur:
print('unsafe:')
increment_in_x_threads(70, increment_n_times, 100000)
print('\nwith locks:')
increment_in_x_threads(70, safe_increment_n_times, 100000)
In my case, it printed:
unsafe:
finished in 0.9840562343597412s.
total: 4654584
expected: 7000000
difference: 2345416 (33.505942857142855 %)

with locks:
finished in 20.564176082611084s.
total: 7000000
expected: 7000000
difference: 0 (0.0 %)
So without locks there were many errors (about 33% of the increments were lost). On the other hand, with locks it was about 20 times slower.
Of course, both numbers are exaggerated because I used 70 threads, but this shows the general idea.
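If the locking overhead matters, one common compromise (a sketch, not part of the benchmark above) is to count in a local variable and take the lock only once per thread:

def batched_increment_n_times(n):
    global total
    local_count = 0
    for i in range(n):
        local_count += 1   # no lock needed, local_count is private to this thread
    # One short critical section per thread instead of one per increment.
    with lock:
        total += local_count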