I want to test if it's ok to append to list from two threads, but I'm getting messy output:
import threading

class myThread(threading.Thread):
    def __init__(self, name, alist):
        threading.Thread.__init__(self)
        self.alist = alist

    def run(self):
        print "Starting " + self.name
        append_to_list(self.alist, 2)
        print "Exiting " + self.name
        print self.alist

def append_to_list(alist, counter):
    while counter:
        alist.append(alist[-1]+1)
        counter -= 1

alist = [1, 2]
# Create new threads
thread1 = myThread("Thread-1", alist)
thread2 = myThread("Thread-2", alist)
# Start new Threads
thread1.start()
thread2.start()
print "Exiting Main Thread"
print alist
So the output is:
Starting Thread-1
Exiting Thread-1
Starting Thread-2
Exiting Main Thread
Exiting Thread-2
[1[1, 2[, 1, 2, 23, , 34, 5, 6, ]4
, 5, , 3, 64, 5, ]6]
Why is it so messy, and why is alist not equal to [1, 2, 3, 4, 5, 6]?
Although a Python list is not thread-safe in general, a single append call is thread-safe.
Summary
Why is the output messy?
==> Because a thread may yield part way through executing a print statement.
Why is aList not equal to [1, 2, 3, 4, 5, 6]?
==> Because the content of aList may change between reading from it and appending to it.
Output
The output is messy because it is being produced by Python 2's print statement from within threads, and the print statement is not threadsafe. This means that a thread may yield while print is executing. In the code in the question there are multiple threads printing, so one thread may yield while printing, and the other thread may start printing and then yield, producing the interleaved output seen by the OP. IO operations such as writing to stdout are very slow in CPU terms, so it's quite likely that the operating system may pause a thread performing IO because the thread is waiting on the hardware to do something.
For example, this code:
import threading

def printer():
    for i in range(2):
        print ['foo', 'bar', 'baz']

def main():
    threads = [threading.Thread(target=printer) for x in xrange(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
produces this interleaved output:
>>> main()
['foo', 'bar'['foo', , 'bar', 'baz']
'baz']
['foo', ['foo', 'bar''bar', 'baz']
, 'baz']
The interleaving behaviour can be prevented by using a lock:
def printer():
    for i in range(2):
        with lock:
            print ['foo', 'bar', 'baz']

def main():
    global lock
    lock = threading.Lock()
    threads = [threading.Thread(target=printer) for x in xrange(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
>>> main()
['foo', 'bar', 'baz']
['foo', 'bar', 'baz']
['foo', 'bar', 'baz']
['foo', 'bar', 'baz']
The contents of the list
The final content of aList will be [1, 2, 3, 4, 5, 6] if the statement
    aList.append(aList[-1] + 1)
is executed atomically, that is, without the current thread yielding to another thread which is also reading from and appending to aList.
However, this is not how threads work. A thread may yield after reading the last element from aList or after incrementing the value, so it is quite possible to have a sequence of events like this:
1. Thread-1 reads the value 2 from aList
2. Thread-2 reads the value 2 from aList, then appends 3
3. Thread-2 reads the value 3 from aList, then appends 4
4. Thread-1 appends 3
5. Thread-1 reads the value 3 from aList, then appends 4
This leaves aList as [1, 2, 3, 4, 3, 4].
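One interleaving that produces this result can be replayed by hand, without real threads, by writing each thread's read and append as separate steps. This is only an illustrative sketch: the names `t1_value` and `t2_value` are made up for the example, and a real scheduler may well pick a different order.

```python
# Replay one possible interleaving by hand. Each thread's "read" and
# "append" are separate steps so that their order can be controlled.
aList = [1, 2]

t1_value = aList[-1]            # Thread-1 reads 2, then yields
t2_value = aList[-1]            # Thread-2 reads 2
aList.append(t2_value + 1)      # Thread-2 appends 3
t2_value = aList[-1]            # Thread-2 reads 3
aList.append(t2_value + 1)      # Thread-2 appends 4, then yields
aList.append(t1_value + 1)      # Thread-1 appends 3 (its earlier read is stale)
t1_value = aList[-1]            # Thread-1 reads 3
aList.append(t1_value + 1)      # Thread-1 appends 4

print(aList)                    # [1, 2, 3, 4, 3, 4]
```

The duplicate values come from Thread-1 appending a value computed from a read that was already out of date.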
As with the print statements, this can be prevented by making threads acquire a lock before executing aList.append(aList[-1] + 1).
(Note that the list.append method is threadsafe in pure Python code, so there is no risk that the value being appended could be corrupted.)
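A minimal sketch of that fix, written so that it runs on Python 3 as well as Python 2. The read and the append are wrapped in a single lock so that no other thread can run between them. The name `list_lock` is introduced here for illustration and is not from the question; the threads are also joined before the list is printed, so the main thread sees the finished result.

```python
import threading

list_lock = threading.Lock()    # hypothetical name; shared by all threads


def append_to_list(alist, counter):
    while counter:
        # Hold the lock across both the read and the append so that
        # no other thread can modify alist in between.
        with list_lock:
            alist.append(alist[-1] + 1)
        counter -= 1


alist = [1, 2]
threads = [threading.Thread(target=append_to_list, args=(alist, 2))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()                    # wait for both threads before printing

print(alist)                    # [1, 2, 3, 4, 5, 6]
```

Because every append now reads the true last element, the four appends always produce 3, 4, 5 and 6 in order, whichever thread performs them.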
EDIT: @kroltan got me thinking some more, and I think your example is in fact more threadsafe than I originally thought. The issue is not the multiple writer threads as such; it's specifically this line:
    alist.append(alist[-1]+1)
There's no guarantee that the append will happen directly after the alist[-1] completes; other operations may be interleaved.
With a detailed explanation here: http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm
Operations that replace other objects may invoke those other objects' __del__ method when their reference count reaches zero, and that can affect things. This is especially true for the mass updates to dictionaries and lists. When in doubt, use a mutex!
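One way to see that the read and the append are separate steps, assuming CPython: the expression compiles to several bytecode instructions, and the interpreter may switch threads between them. The sketch below uses the standard `dis` module (`get_instructions` needs Python 3.4 or later). The exact instruction names differ between versions, so no expected listing is shown.

```python
import dis

# Compile the expression without running it; alist need not exist.
code = compile("alist.append(alist[-1] + 1)", "<example>", "eval")

# Collect the name of every bytecode instruction in the expression.
opnames = [ins.opname for ins in dis.get_instructions(code)]

for name in opnames:
    print(name)

# More than one instruction means the read, the addition and the
# append call are separate steps rather than one atomic operation.
print(len(opnames) > 1)
```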
Original Answer:
This is undefined behavior, as you have multiple threads writing to the same bit of memory, hence the "messy" output you're observing.
I want to test if it's ok to append to list from two threads, but I'm getting messy output
I think you've successfully tested this, and the answer is No. Lots of more detailed explanations on SO: https://stackoverflow.com/a/5943027/62032