Why is Python list slower when sorted?

In the following code, I create two lists with the same values: one unsorted (s_not), the other sorted (s_yes). The values are produced by randint(). I then run the same loop over each list and time it.

import random
import time

for x in range(1, 9):

    r = 10**x   # vary the upper bound passed to randint()
    m = r // 2  # midpoint, used as the branch threshold below

    print("For rand", r)

    # s_not is the unsorted list
    s_not = [random.randint(1,r) for i in range(10**7)]

    # s_yes is sorted
    s_yes = sorted(s_not)

    # time the loop over the sorted list
    start = time.time()
    for i in s_yes:
        if i > m:
            _ = 1
        else:
            _ = 1
    end = time.time()
    print("yes", end-start)

    # time the same loop over the unsorted list
    start = time.time()
    for i in s_not:
        if i > m:
            _ = 1
        else:
            _ = 1
    end = time.time()
    print("not", end-start)

    print()

With output:

For rand 10
yes 1.0437555313110352
not 1.1074268817901611

For rand 100
yes 1.0802974700927734
not 1.1524150371551514

For rand 1000
yes 2.5082249641418457
not 1.129960298538208

For rand 10000
yes 3.145440101623535
not 1.1366300582885742

For rand 100000
yes 3.313387393951416
not 1.1393756866455078

For rand 1000000
yes 3.3180911540985107
not 1.1336982250213623

For rand 10000000
yes 3.3231537342071533
not 1.13503098487854

For rand 100000000
yes 3.311596393585205
not 1.1345293521881104

So, as the bound passed to randint() increases, the loop over the sorted list gets slower. Why?

asked Nov 12 '21 by fdireito



3 Answers

Cache misses. When N int objects are allocated back-to-back, the memory reserved to hold them tends to be in a contiguous chunk. So crawling over the list in allocation order tends to access the memory holding the ints' values in sequential, contiguous, increasing order too.

Shuffle it, and the access pattern when crawling over the list is randomized too. Cache misses abound, provided there are enough different int objects that they don't all fit in cache.

At r==10 and r==100, CPython happens to treat such small ints as singletons (it caches the ints from -5 through 256), so, e.g., despite that you have 10 million elements in the list, at r==100 it contains only (at most) 100 distinct int objects. All the data for those fit in cache simultaneously.

Beyond that, though, you're likely to get more, and more, and more distinct int objects. Hardware caches become increasingly useless when the access pattern is random.

Illustrating:

>>> from random import randint, seed
>>> seed(987987987)
>>> for x in range(1, 9):
...     r = 10 ** x
...     js = [randint(1, r) for _ in range(10_000_000)]
...     unique = set(map(id, js))
...     print(f"{r:12,} {len(unique):12,}")
...     
          10           10
         100          100
       1,000    7,440,909
      10,000    9,744,400
     100,000    9,974,838
   1,000,000    9,997,739
  10,000,000    9,999,908
 100,000,000    9,999,998
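
A quick way to see that cache boundary directly (my own check, relying on the CPython implementation detail that ints from -5 through 256 are cached singletons, not a language guarantee):

from random import randint

# Draws inside the cached range reuse the same singleton objects;
# draws just outside it allocate a fresh object per element.
inside = [randint(1, 256) for _ in range(10**5)]
outside = [randint(257, 512) for _ in range(10**5)]

print(len(set(map(id, inside))))   # at most 256 distinct objects
print(len(set(map(id, outside))))  # 10**5: one object per element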
answered Oct 10 '22 by Tim Peters


The answer is likely locality of data. Integers outside CPython's small-int cache (-5 through 256) are allocated dynamically. When you create the list, the integer objects are allocated from (mostly) nearby memory. So when you loop through the list, things tend to be in cache, and the hardware prefetcher can put them there.

In the sorted case, the objects get shuffled around, resulting in more cache misses.
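
A minimal check of this (my sketch, not part of the answer): shuffle a copy of the unsorted list instead of sorting it. The objects and values are identical; only the traversal order relative to allocation order changes, and the slowdown should reappear:

import random
import time

r = 10**6
s_not = [random.randint(1, r) for _ in range(10**7)]
s_shuf = s_not[:]
random.shuffle(s_shuf)  # same objects, now visited in random memory order

for name, lst in (("alloc order", s_not), ("shuffled", s_shuf)):
    start = time.time()
    for i in lst:
        if i > r // 2:  # same branch as in the question
            _ = 1
        else:
            _ = 1
    print(name, time.time() - start)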

answered Oct 10 '22 by Homer512


As the others said, cache misses. Not the values/sortedness. Looping over the same sorted values, but with freshly, sequentially created objects, is fast again (actually even a bit faster than the not case). The --x below evaluates -(-x): it leaves each value unchanged but allocates a new int object, so the new list's objects sit in memory in the order they're visited:

s_new = [--x for x in s_yes]  # -(-x): same value, freshly allocated object

Just picking one size:

For rand 1000000
yes 3.6270992755889893
not 1.198620080947876
new 1.02010178565979

Looking at address differences from one element to the next (just the first 10**6 elements) shows that especially for s_new, the elements are nicely sequentially arranged in memory (99.2% of the time the next element came 32 bytes later), while for s_yes they're totally not (just 0.01% came 32 bytes later):

s_yes:
    741022 different address differences occurred. Top 5:
    Address difference 32 occurred 102 times.
    Address difference 0 occurred 90 times.
    Address difference 64 occurred 37 times.
    Address difference 96 occurred 17 times.
    Address difference 128 occurred 9 times.

s_not:
    1048 different address differences occurred. Top 5:
    Address difference 32 occurred 906649 times.
    Address difference 96 occurred 8931 times.
    Address difference 64 occurred 1845 times.
    Address difference -32 occurred 1816 times.
    Address difference -64 occurred 1812 times.

s_new:
    19 different address differences occurred. Top 5:
    Address difference 32 occurred 991911 times.
    Address difference 96 occurred 7825 times.
    Address difference -524192 occurred 117 times.
    Address difference 0 occurred 90 times.
    Address difference 64 occurred 37 times.

Code for that:

from collections import Counter

# For each list, look at the address (id) deltas between consecutive
# elements and count how often each delta occurs.
for s in 's_yes', 's_not', 's_new':
    print(s + ':')
    ids = list(map(id, eval(s)[:10**6]))  # eval looks the list up by name; first 10**6 elements, as above
    ctr = Counter(j - i for i, j in zip(ids, ids[1:]))
    print('   ', len(ctr), 'different address differences occurred. Top 5:')
    for delta, count in ctr.most_common(5):
        print(f'    Address difference {delta} occurred {count} times.')
    print()
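
A side note on the stride of 32 seen above (my addition; the exact rounding is an allocator detail, not something the answer states): on a typical 64-bit CPython, an int of this size occupies 28 bytes, which the object allocator rounds up to a 32-byte block, so back-to-back allocations land exactly 32 bytes apart:

import sys

# 24-byte object header plus one 4-byte internal "digit" (values < 2**30)
print(sys.getsizeof(10**6))  # 28 on a typical 64-bit CPython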
answered Oct 10 '22 by no comment