Python performance comparison for creating sets - set() vs. {} literal [duplicate]

Tags: performance, python, python-3.x, set

A discussion following this question left me wondering, so I decided to run a few tests and compare the creation time of set((x,y,z)) vs. {x,y,z} for creating sets in Python (I'm using Python 3.7).

I compared the two methods using time and timeit. Both were consistent* with the following results:

from timeit import timeit

test1 = """
my_set1 = set((1, 2, 3))
"""
print(timeit(test1))

Result: 0.30240735499999993

test2 = """
my_set2 = {1,2,3}
"""
print(timeit(test2))

Result: 0.10771795900000003

So the second method was almost 3 times faster than the first. This was quite a surprising difference to me. What is happening under the hood to optimize the performance of the set literal over the set() method in such a way? Which would be advisable for which cases?

* Note: I only show the results of the timeit tests since they are averaged over many samples, and thus perhaps more reliable, but the results when testing with time showed similar differences in both cases.
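A minimal sketch of how the comparison above could be repeated with less noise, using `timeit.repeat` and taking the fastest run. The helper name `best_time` and the reduced `number` are assumptions for illustration and were not part of the original tests.

```python
from timeit import repeat


def best_time(stmt, number=100_000, runs=5):
    """Return the fastest of `runs` timings; each timing executes `stmt` `number` times."""
    return min(repeat(stmt, number=number, repeat=runs))


call_time = best_time("my_set1 = set((1, 2, 3))")
literal_time = best_time("my_set2 = {1, 2, 3}")

print(f"set((1, 2, 3)): {call_time:.4f} s")
print(f"{{1, 2, 3}}:      {literal_time:.4f} s")
```

The absolute numbers depend on the machine and the Python version, so only the relative difference is meaningful.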

Edit: I'm aware of this similar question and though it answers certain aspects of my original question, it didn't cover all of it. Sets were not addressed in the question, and as empty sets do not have a literal syntax in Python, I was curious how (if at all) set creation using a literal would differ from using the set() method. Also, I wondered how the handling of the tuple parameter in set((x,y,z)) happens behind the scenes and what its possible impact on runtime is. The great answer by coldspeed helped clear things up.


asked Dec 30 '18 13:12

yuvgin

1 Answer

(This is in response to code that has now been edited out of the initial question) You forgot to call the functions in the second case. Making the appropriate modifications, the results are as expected:

from timeit import timeit

test1 = """
def foo1():
    my_set1 = set((1, 2, 3))
foo1()
"""
timeit(test1)
# 0.48808742000255734

test2 = """
def foo2():
    my_set2 = {1,2,3}
foo2()
"""
timeit(test2)
# 0.3064506609807722

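Now, the reason for the difference in timings is that set() is a function call: the name set has to be looked up at run time (ultimately in the builtins namespace) and the resulting object then called, whereas the {...} set construction is part of the syntax and compiles directly to a set-building instruction, which is much faster.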
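One way to see that `set` is an ordinary name resolved at run time, while the literal is not, is to run the same source in a namespace where `set` has been rebound. This is only an illustrative sketch; the variable names `plain` and `shadowed` are made up for the example.

```python
source = "call = set((1, 2, 3))\nliteral = {1, 2, 3}"

# Normal namespace: `set` resolves to the builtin.
plain = {}
exec(source, plain)

# Namespace where the name `set` is deliberately rebound (illustration only).
shadowed = {"set": frozenset}
exec(source, shadowed)

print(type(plain["call"]).__name__)        # set
print(type(shadowed["call"]).__name__)     # frozenset
print(type(shadowed["literal"]).__name__)  # set
```

The call follows whatever the name `set` refers to at that moment, which is why the interpreter cannot skip the lookup. The literal always builds a real set.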

The difference is apparent in the disassembled bytecode.

import dis

dis.dis("set((1, 2, 3))")
  1           0 LOAD_NAME                0 (set)
              2 LOAD_CONST               3 ((1, 2, 3))
              4 CALL_FUNCTION            1
              6 RETURN_VALUE

dis.dis("{1, 2, 3}")
  1           0 LOAD_CONST               0 (1)
              2 LOAD_CONST               1 (2)
              4 LOAD_CONST               2 (3)
              6 BUILD_SET                3
              8 RETURN_VALUE

In the first case, a function call is made by the CALL_FUNCTION instruction on the tuple (1, 2, 3) (which comes with its own overhead, although minor, since the tuple is loaded as a constant via LOAD_CONST), whereas in the second case there is just a single BUILD_SET instruction, which is more efficient.
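The exact opcodes differ between CPython versions, so the listings above may not match your interpreter line for line. A rough sketch of checking the same point programmatically (the helper `opnames` is a hypothetical name for this example):

```python
import dis


def opnames(source):
    """Return the opcode names CPython emits for a source snippet."""
    return [instr.opname for instr in dis.get_instructions(source)]


call_ops = opnames("set((1, 2, 3))")
literal_ops = opnames("{1, 2, 3}")

# The call form needs some CALL* instruction; the literal needs none.
call_has_call = any(name.startswith("CALL") for name in call_ops)
literal_has_call = any(name.startswith("CALL") for name in literal_ops)

print(call_ops)
print(literal_ops)
print(call_has_call, literal_has_call, "BUILD_SET" in literal_ops)
```

Matching on the `CALL` prefix rather than on `CALL_FUNCTION` is a deliberate choice here, because the call opcode has been renamed in later releases.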

Re: your question regarding the time taken for tuple construction, we see this is actually negligible:

timeit("""(1, 2, 3)""")
# 0.01858693000394851

timeit("""{1, 2, 3}""")
# 0.11971827200613916

Tuples are immutable, so the compiler optimises this operation by loading the tuple as a constant. This is called constant folding (you can see this clearly from the LOAD_CONST instruction above), so the time taken is negligible. This is not seen with sets, as they are mutable (thanks to @user2357112 for pointing this out).
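The folding can be observed by looking at the constants stored on the compiled code object. This is a small sketch that relies on CPython behaviour; the variable names are illustrative.

```python
tuple_code = compile("(1, 2, 3)", "<tuple>", "eval")
set_code = compile("{1, 2, 3}", "<set>", "eval")

# The whole tuple is stored once as a constant of the code object.
tuple_is_constant = (1, 2, 3) in tuple_code.co_consts

# A mutable set is never stored as a constant; it is rebuilt on every evaluation.
set_is_constant = any(isinstance(const, set) for const in set_code.co_consts)


def make_set():
    return {1, 2, 3}


first, second = make_set(), make_set()

print(tuple_is_constant)    # True
print(set_is_constant)      # False
print(first is not second)  # True: each evaluation yields a new set object
```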

For larger sequences, we see similar behaviour. The {...} syntax, in the form of a set comprehension, is faster than set(), which has to build the set from a generator.

timeit("""set(i for i in range(10000))""", number=1000)
# 0.9775058150407858

timeit("""{i for i in range(10000)}""", number=1000)
# 0.5508635920123197

For reference, you can also use iterable unpacking (Python 3.5 and later):

timeit("""{*range(10000)}""", number=1000)
# 0.7462548640323803

Interestingly, however, set() is faster when called directly on range:

timeit("""set(range(10000))""", number=1000)
# 0.3746800610097125

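This happens to be faster than the set comprehension, most likely because set() consumes the range entirely in C rather than executing bytecode for each element. You will see similar behaviour for other sequences (such as lists).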
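To rerun the four variants side by side and confirm they build the same set, something like the following could be used. It is a sketch, not the original benchmark: the labels and the smaller `number` are assumptions chosen to keep the run short.

```python
from timeit import timeit

candidates = {
    "generator": "set(i for i in range(10000))",
    "comprehension": "{i for i in range(10000)}",
    "unpacking": "{*range(10000)}",
    "set(range)": "set(range(10000))",
}

# Sanity check: every variant must produce the same set.
expected = set(range(10000))
results = {}
for label, stmt in candidates.items():
    results[label] = eval(stmt)
all_equal = all(result == expected for result in results.values())

# Time each variant; fewer repetitions than in the answer, to keep this quick.
timings = {}
for label, stmt in candidates.items():
    timings[label] = timeit(stmt, number=100)

for label, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{label:>13}: {seconds:.4f} s")
```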

My recommendation would be to use the {...} syntax when writing set literals, and a set comprehension as an alternative to passing a generator expression to set(); use set() itself to convert an existing sequence/iterable to a set.
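As a quick illustration of that recommendation (the data here is made up for the example):

```python
# Known elements: use a literal.
vowels = {"a", "e", "i", "o", "u"}

# Elements computed from another iterable: use a set comprehension.
words = ["apple", "sky", "tree"]  # hypothetical input
initials = {word[0] for word in words}

# Existing sequence or iterable: pass it straight to set().
unique = set([3, 1, 3, 2, 1])

# Empty set: there is no literal for it, since {} is an empty dict.
empty = set()

print(sorted(initials))   # ['a', 's', 't']
print(sorted(unique))     # [1, 2, 3]
print(type({}).__name__)  # dict
```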


answered Oct 17 '22 05:10

cs95
