Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Do I need a lock block to iterate on a shared list?

Tags:

python

I know that python lists are thread safe. However, let's suppose I have a shared list

object_list = []

Now, a thread periodically adds an object (a data structure) to this list,

object_list.add(aNewObject)

Another thread, when invoked, accesses this list in a for loop

for i in object_list:
    #do something with i

Do I need something like this:

myLock.acquire()
either add an element or access the list with the for
myLock.release()

Or I am fine anyway? I don't care about missing an add, but I am worried about what happens if a new element is added when the reading thread is iterating in the middle of the list...

like image 315
Phate Avatar asked Mar 15 '14 10:03

Phate


1 Answers

It depends on what guarantees you expect. First of all, the python language does not guarantee any thread-safety.

The CPython implementation incidentally offer a bit of thread-safety due to its GIL, but if you ever plan to run the code under any other python implementation you should just use locks all over the place whenever you access the resource.

Now, for a CPython specific consideration, if you know that the code will add elements only via append or extend, then the worst that can happen is that the for loop will iterate also over the new elements. This might or might not be something you want1:

>>> a = [0]
>>> for i, el in enumerate(a):
...     if i % 2 == 0:
...             a.append(i)
...     print(i, el)
... 
0 0
1 0
>>> a
[0, 0]
>>> for i, el in enumerate(a):
...     if i % 2 == 0:
...             a.append(i)
...     print(i, el)
... 
0 0
1 0
2 0
3 2
>>> a
[0, 0, 0, 2]

If new items are added using the insert method then it's possible that some values are iterated over more then once, while some others aren't iterated over during the for. For example:

>>> a = [1]
>>> for i,el in enumerate(a):
...     if i % 2 == 0:
...             a.insert(0, i)
...     print(i, el)
... 
(0, 1)
(1, 1)
>>> a
[0, 1]

Here the 1 is iterated over twice and the newly inserted 0 is never seen in the loop.

Also removing elements from a list can result in similar behaviour where some elements are skipped:

>>> for i, el in enumerate(a):
...     if i % 2 == 0:
...             a.remove(i)
...     print(i, el)
... 
(0, 0)
>>> a
[1]

As you can see the 1 is never seen during the loop.

This said mutating a sequence during iteration is usually something you don't want. In your case I'd recommend using a lock. Note that locks are context managers so I'd write the loop as:

with list_lock:
    for element in object_list:
        # do stuff

This simplifies the code since you don't have to remember to release the lock explicitly. The same should be done when inserting the element (otherwise it's useless):

with list_lock:
    object_list.append(new_element)

1 I don't use separate threads to simplify the example and make it easily reproducible. In CPython having the operations done in multiple threads shouldn't change the behaviour you see. This is not true in other implementations. Obviously with multiple threads is much harder to reproduce the behaviour.

like image 139
Bakuriu Avatar answered Oct 14 '22 05:10

Bakuriu