Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is processing a sorted array not faster than an unsorted array in Python?

In this post Why is processing a sorted array faster than random array, it says that branch predicton is the reason of the performance boost in sorted arrays.

But I just tried the example using Python; and I think there is no difference between sorted and random arrays (I tried both bytearray and array; and use line_profile to profile the computation).

Am I missing something?

Here is my code:

from array import array
import random
array_size = 1024
loop_cnt = 1000
# I also tried 'array', and it's almost the same
a = bytearray(array_size)
for i in xrange(array_size):
    a.append(random.randint(0, 255))
#sorted                                                                         
a = sorted(a)
@profile
def computation():
    sum = 0
    for i in xrange(loop_cnt):
        for j in xrange(size):
            if a[j] >= 128:
                sum += a[j]

computation()
print 'done'
like image 200
ming.kernel Avatar asked Oct 11 '12 15:10

ming.kernel


People also ask

Why is processing a sorted array faster than processing an unsorted array?

In C++, it is faster to process a sorted array than an unsorted array because of branch prediction. In computer architecture, a branch prediction determines whether a conditional branch (jump) in the instruction flow of a program is likely to be taken or not. Branch prediction doesn't play a significant role here.

What is the difference between sorted array and unsorted array?

In short, searching in an unsorted array takes O(n) time: you potentially have to look at every item to find out if what you're looking for is there. A sorted array lets you speed up the search. Instead of having to examine every item, you only have to examine at most log2(n) items.

Why is an ordered array better than an unordered array?

What is the tradeoff between using an unordered array versus an ordered array ? The major advantage of an ordered array is that the search times have time complexity of O(log n), compared to that of an unordered array, which is O (n).

What are the advantages of sorted array?

Sorted arrays are the most space-efficient data structure with the best locality of reference for sequentially stored data.


1 Answers

I may be wrong, but I see a fundamental difference between the linked question and your example: Python interprets bytecode, C++ compiles to native code.

In the C++ code that if translates directly to a cmp/jl sequence, that can be considered by the CPU branch predictor as a single "prediction spot", specific to that cycle.

In Python that comparison is actually several function calls, so there's (1) more overhead and (2) I suppose the code that performs that comparison is a function into the interpreter used for every other integer comparison - so it's a "prediction spot" not specific to the current block, which gives the branch predictor a much harder time to guess correctly.


Edit: also, as outlined in this paper, there are way more indirect branches inside an interpreter, so such an optimization in your Python code would probably be buried anyway by the branch mispredictions in the interpreter itself.

like image 195
Matteo Italia Avatar answered Oct 04 '22 09:10

Matteo Italia