Complexity of len() with regard to sets and lists

Tags: python, time-complexity, python-3.x, python-internals

The complexity of len() is O(1) for both sets and lists. How come it takes more time to process sets?

    ~$ python -m timeit "a=[1,2,3,4,5,6,7,8,9,10];len(a)"
    10000000 loops, best of 3: 0.168 usec per loop
    ~$ python -m timeit "a={1,2,3,4,5,6,7,8,9,10};len(a)"
    1000000 loops, best of 3: 0.375 usec per loop

Is it related to the particular benchmark, as in, it takes more time to build sets than lists and the benchmark takes that into account as well?

If the creation of a set object takes more time compared to creating a list, what would be the underlying reason?

710

asked Aug 27 '15 12:08

Omid

2 Answers

Firstly, you have not measured the speed of len() alone; you have measured the time to create a list/set plus the time to call len().

Use the --setup argument of timeit:

    $ python -m timeit --setup "a=[1,2,3,4,5,6,7,8,9,10]" "len(a)"
    10000000 loops, best of 3: 0.0369 usec per loop
    $ python -m timeit --setup "a={1,2,3,4,5,6,7,8,9,10}" "len(a)"
    10000000 loops, best of 3: 0.0372 usec per loop

The statements you pass to --setup are run before measuring the speed of len().
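The same separation can be made from inside Python with the standard timeit module, whose timeit() function accepts a setup string that runs once before the timed statement. A minimal sketch, assuming an illustrative helper name (time_len) and loop count that are not part of the original measurement:

```python
import timeit

def time_len(setup, number=1_000_000):
    """Return the average seconds per len(a) call; setup time is excluded."""
    # setup runs once, then "len(a)" is executed `number` times and timed
    total = timeit.timeit("len(a)", setup=setup, number=number)
    return total / number

list_time = time_len("a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]")
set_time = time_len("a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}")

print(f"list: {list_time * 1e6:.4f} usec per call")
print(f"set:  {set_time * 1e6:.4f} usec per call")
```

The absolute numbers will vary by machine and Python version; what matters is that the container is built outside the timed region.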

Secondly, you should note that len(a) is a pretty quick statement. The process of measuring its speed may be subject to "noise". Consider that the code executed (and measured) by timeit is equivalent to the following:

    for i in itertools.repeat(None, number):
        len(a)

Because both len(a) and itertools.repeat(...).__next__() are fast operations and their speeds may be similar, the speed of itertools.repeat(...).__next__() may influence the timings.

For this reason, it is better to measure len(a); len(a); ...; len(a) (repeated many times; the example below uses about 1000 repetitions) so that the body of the for loop takes considerably more time than the iterator:

    $ python -m timeit --setup "a=[1,2,3,4,5,6,7,8,9,10]" "$(for i in {0..1000}; do echo "len(a)"; done)"
    10000 loops, best of 3: 29.2 usec per loop
    $ python -m timeit --setup "a={1,2,3,4,5,6,7,8,9,10}" "$(for i in {0..1000}; do echo "len(a)"; done)"
    10000 loops, best of 3: 29.3 usec per loop

(The results still say that len() performs the same on lists and sets, but now you can be more confident that the result is correct.)
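The repetition trick does not depend on the shell. As a rough sketch, the repeated statement can also be built in Python and handed to timeit.timeit(); the helper name time_repeated_len, the constant REPEATS, and the loop counts are illustrative assumptions:

```python
import timeit

REPEATS = 1000  # illustrative: number of len(a) calls per loop iteration

def time_repeated_len(setup, repeats=REPEATS, number=1_000):
    """Average seconds per len(a) call, with loop overhead amortized."""
    # One statement made of `repeats` copies of len(a), one per line
    stmt = "\n".join(["len(a)"] * repeats)
    total = timeit.timeit(stmt, setup=setup, number=number)
    # Each loop iteration performed `repeats` calls
    return total / (number * repeats)

list_time = time_repeated_len("a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]")
set_time = time_repeated_len("a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}")

print(f"list: {list_time * 1e6:.4f} usec per call")
print(f"set:  {set_time * 1e6:.4f} usec per call")
```

Because the iterator is advanced only once per 1000 calls, its cost contributes very little to the per-call figure.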

Thirdly, it's true that "complexity" and "speed" are related, but I believe you are conflating the two. The fact that len() has O(1) complexity for lists and sets does not imply that it must run at the same speed on lists and sets.

It means that the number of steps len(a) performs is bounded by a constant, no matter how long the list a is. Likewise, no matter how large the set b is, len(b) performs a bounded number of steps. But the two constants need not be equal: the code that computes the size of a list and the code that computes the size of a set could differ, resulting in different speeds (timeit shows that this is not the case here, but it is a possibility in general).
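To make the O(1) claim concrete, one could compare len() against a deliberately O(n) way of counting at several sizes. In this sketch, count_by_iteration is a hypothetical helper introduced only for contrast, and the sizes and loop counts are arbitrary:

```python
import timeit

def count_by_iteration(container):
    """O(n) size computation: visits every element once."""
    return sum(1 for _ in container)

for size in (10, 1_000, 100_000):
    data = list(range(size))
    # len() reads a stored size, so its time should not grow with `size`
    t_len = timeit.timeit(lambda: len(data), number=1_000) / 1_000
    # Counting by iteration should grow roughly linearly with `size`
    t_iter = timeit.timeit(lambda: count_by_iteration(data), number=10) / 10
    print(f"n={size:>7}: len {t_len * 1e6:.3f} usec, "
          f"iteration {t_iter * 1e6:.3f} usec")
```

On a typical CPython build, the len() column should stay roughly flat while the iteration column grows with n, though exact figures depend on the machine.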

Lastly,

If the creation of a set object takes more time compared to creating a list, what would be the underlying reason?

A set, as you know, does not allow repeated elements. Sets in CPython are implemented as hash tables (to ensure average O(1) insertion and lookup): constructing and maintaining a hash table is much more complex than adding elements to a list.

Specifically, when constructing a set, you have to compute hashes, build the hash table, look it up to avoid inserting duplicate elements, and so on. By contrast, lists in CPython are implemented as a simple array of pointers that is malloc()ed and realloc()ed as required.
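The extra hashing work can be observed directly. The sketch below uses a hypothetical element class, CountingKey, that counts calls to its own __hash__; it is an illustration of the point above, not a description of CPython internals:

```python
class CountingKey:
    """Hypothetical element type that counts how often it is hashed."""
    hash_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value

items = [CountingKey(i) for i in range(10)]

# Building a list only copies references; no element is hashed
CountingKey.hash_calls = 0
as_list = list(items)
list_hash_calls = CountingKey.hash_calls

# Building a set must hash every element to place it in the table
CountingKey.hash_calls = 0
as_set = set(items)
set_hash_calls = CountingKey.hash_calls

print("hash calls while building list:", list_hash_calls)
print("hash calls while building set: ", set_hash_calls)
```

Each hash call here is a Python-level method call, so the effect is exaggerated compared with small integers, but the asymmetry between the two containers is the same.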

129

answered Oct 13 '22 12:10

Andrea Corbellini

The relevant lines are http://svn.python.org/view/python/trunk/Objects/setobject.c?view=markup#l640

    static Py_ssize_t
    set_len(PyObject *so)
    {
        return ((PySetObject *)so)->used;
    }

and http://svn.python.org/view/python/trunk/Objects/listobject.c?view=markup#l431

    static Py_ssize_t
    list_length(PyListObject *a)
    {
        return Py_SIZE(a);
    }

Both simply return a stored size field; neither walks the elements.

So what is the difference, you may ask? You are also measuring the creation of the objects, and creating a set is a little more time-consuming than creating a list.
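To see the creation cost on its own, one could time only the literals, with no len() call at all. A small sketch using the standard timeit module; the helper name time_creation and the loop count are illustrative assumptions:

```python
import timeit

LIST_STMT = "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]"
SET_STMT = "{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}"

def time_creation(stmt, number=1_000_000):
    """Average seconds to evaluate `stmt` once (object creation only)."""
    return timeit.timeit(stmt, number=number) / number

list_creation = time_creation(LIST_STMT)
set_creation = time_creation(SET_STMT)

print(f"list creation: {list_creation * 1e6:.4f} usec")
print(f"set creation:  {set_creation * 1e6:.4f} usec")
```

If the set figure comes out larger, that difference, not len(), accounts for the gap in the original benchmark.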

answered Oct 13 '22 14:10

kay
