Is extending a Python list (e.g. l += [1]) guaranteed to be thread-safe?

If I have an integer i, it is not safe to do i += 1 on multiple threads:

>>> import threading
>>> i = 0
>>> def increment_i():
...     global i
...     for j in range(1000): i += 1
...
>>> threads = [threading.Thread(target=increment_i) for j in range(10)]
>>> for thread in threads: thread.start()
...
>>> for thread in threads: thread.join()
...
>>> i
4858  # Not 10000

However, if I have a list l, it does seem safe to do l += [1] on multiple threads:

>>> l = []
>>> def extend_l():
...     global l
...     for j in range(1000): l += [1]
...
>>> threads = [threading.Thread(target=extend_l) for j in range(10)]
>>> for thread in threads: thread.start()
...
>>> for thread in threads: thread.join()
...
>>> len(l)
10000

Is l += [1] guaranteed to be thread-safe? If so, does this apply to all Python implementations or just CPython?

Edit: It seems that l += [1] is thread-safe but l = l + [1] is not...

>>> l = []
>>> def extend_l():
...     global l
...     for j in range(1000): l = l + [1]
...
>>> threads = [threading.Thread(target=extend_l) for j in range(10)]
>>> for thread in threads: thread.start()
...
>>> for thread in threads: thread.join()
...
>>> len(l)
3305  # Not 10000
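A rough way to see why the two spellings behave differently is to replay by hand, on a single thread, the kind of interleaving a thread switch can produce between loading l and storing it back. This is a sketch, not a reproduction of the real race: the function and variable names are hypothetical, and it assumes the in-place extend itself is not interrupted partway (which is the CPython detail the answer below discusses).

```python
def lost_update_with_plus():
    """Replay a bad interleaving of two threads that each run `l = l + [1]`."""
    shared = []
    seen_by_a = shared          # thread A loads the current list object, then gets switched out
    shared = shared + [1]       # thread B runs its whole update: builds a new list and stores it
    shared = seen_by_a + [1]    # thread A resumes from its stale object and overwrites B's result
    return shared


def same_interleaving_with_iadd():
    """Replay the same interleaving for two threads that each run `l += [1]`."""
    shared = []
    seen_by_a = shared          # thread A loads the current list object, then gets switched out
    shared += [1]               # thread B extends that same object in place and stores it back
    seen_by_a += [1]            # thread A resumes and extends the same object in place
    shared = seen_by_a          # ...then stores back the object everyone already shares
    return shared


print(len(lost_update_with_plus()))        # 1: thread B's element was lost
print(len(same_interleaving_with_iadd()))  # 2: both elements survive
```

With +, each thread builds a new list from whatever object it loaded, so a late store can discard another thread's work. With +=, every thread mutates the one shared object, and the final store just rebinds l to that same object.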

asked Jul 08 '16 12:07

user200783

1 Answer

There isn't a happy ;-) answer to this. There's nothing guaranteed about any of it, which you can confirm simply by noting that the Python reference manual makes no guarantees about atomicity.

In CPython it's a matter of pragmatics. As a snipped part of effbot's article says,

In theory, this means an exact accounting requires an exact understanding of the PVM [Python Virtual Machine] bytecode implementation.

And that's the truth. A CPython expert knows L += [x] is atomic because they know all of the following:

- += compiles to an INPLACE_ADD bytecode.
- The implementation of INPLACE_ADD for list objects is written entirely in C (no Python code is on the execution path, so there are no intermediate bytecodes at which the GIL could be released).
- In listobject.c, the implementation of INPLACE_ADD is the function list_inplace_concat(), and nothing during its execution needs to run any user Python code either (if it did, the GIL might again be released).
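The first point in the list can be checked with the standard dis module. The sketch below (function names are hypothetical) disassembles a function doing L += [x] and picks out its add instruction. On CPython 3.10 and earlier this should be INPLACE_ADD; from 3.11 on, the binary and in-place arithmetic opcodes were merged into BINARY_OP, which dis displays with the argument +=. Bytecode is a CPython implementation detail, so other versions or implementations may show something different.

```python
import dis


def extend_in_place(L, x):
    L += [x]            # in-place add on the list object L refers to


def rebind_to_sum(L, x):
    L = L + [x]         # builds a new list, then rebinds the name L
    return L


# Opcode names used for + and += across CPython versions.
ADD_OPCODES = {"INPLACE_ADD", "BINARY_ADD", "BINARY_OP"}


def add_instructions(func):
    """Return (opname, argrepr) for each add-style instruction in func's bytecode."""
    return [
        (ins.opname, ins.argrepr)
        for ins in dis.get_instructions(func)
        if ins.opname in ADD_OPCODES
    ]


print(add_instructions(extend_in_place))  # e.g. [('BINARY_OP', '+=')] on 3.11+
print(add_instructions(rebind_to_sum))    # e.g. [('BINARY_OP', '+')] on 3.11+
```

Either way, the whole extend happens inside that one instruction; the loads before it and the store after it are separate instructions, which is harmless here because the store writes back the same list object.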

That may all sound incredibly difficult to keep straight, but for someone with effbot's knowledge of CPython's internals (at the time he wrote that article), it really isn't. In fact, given that depth of knowledge, it's all kind of obvious ;-)

So as a matter of pragmatics, CPython experts have always freely relied on the principle that "operations that 'look atomic' should really be atomic", and that principle also guided some language decisions. For example, an operation missing from effbot's list (added to the language after he wrote that article):

x = D.pop(y)  # or ...
x = D.pop(y, default)

One argument (at the time) in favor of adding dict.pop() was precisely that the obvious C implementation would be atomic, whereas the idiom in use at the time:

x = D[y]
del D[y]

was not atomic (the retrieval and the deletion are done via distinct bytecodes, so threads can switch between them).
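The standard dis module can illustrate this argument for dict.pop(). In the sketch below (function names are hypothetical), the two-statement version performs the lookup and the deletion as separate instructions, while D.pop(y) does both inside one call into dict's C code. This assumes a plain dict whose keys hash and compare in C (str keys, for example); a dict subclass or keys with Python-level __hash__/__eq__ would run Python code, and it shows CPython's bytecode, not a language guarantee.

```python
import dis


def get_then_delete(D, y):
    x = D[y]    # lookup: one instruction
    del D[y]    # deletion: a later, separate instruction (a thread switch can fall in between)
    return x


def pop_in_one_call(D, y):
    return D.pop(y)  # lookup and deletion both happen inside dict.pop's C implementation


def opnames(func):
    """Names of the bytecode instructions CPython generated for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


print("DELETE_SUBSCR" in opnames(get_then_delete))  # True
print("DELETE_SUBSCR" in opnames(pop_in_one_call))  # False

D = {"k": 1}
print(pop_in_one_call(D, "k"), D)  # 1 {}
print(D.pop("k", None))            # None: the default is returned for a missing key
```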

But the docs never said .pop() was atomic, and never will. This is a "consenting adults" kind of thing: if you're expert enough to exploit this knowingly, you don't need hand-holding. If you're not expert enough, then the last sentence of effbot's article applies:

When in doubt, use a mutex!

As a matter of pragmatic necessity, core developers will never break the atomicity of effbot's examples (or of D.pop() or D.setdefault()) in CPython. Other implementations are under no obligation at all to mimic these pragmatic choices, though. Indeed, since atomicity in these cases relies on CPython's specific form of bytecode combined with CPython's use of a global interpreter lock that can only be released between bytecodes, it could be a real pain for other implementations to mimic them.

And you never know: some future version of CPython may remove the GIL too! I doubt it, but it's theoretically possible. But if that happens, I bet a parallel version retaining the GIL will be maintained too, because a whole lot of code (especially extension modules written in C) relies on the GIL for thread safety too.

Worth repeating:

When in doubt, use a mutex!

answered Sep 21 '22 09:09

Tim Peters
