Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python quicksort - List comprehension vs Recursion (partition routine)

I watched the talk Three Beautiful Quicksorts and was messing around with quicksort. My implementation in python was very similar to c (select pivot, partition around it and recursing over smaller and larger partitions). Which I thought wasn't pythonic.

So this is the implementation using list comprehension in python.

def qsort(list):
    if list == []: 
        return []
    pivot = list[0]
    l = qsort([x for x in list[1:] if x < pivot])
    u = qsort([x for x in list[1:] if x >= pivot])
    return l + [pivot] + u

Lets call the recursion metho qsortR. now I noticed that qsortR runs much slower than qsort for large(r) lists. Actually "maximum recursion depth exceeded in cmp" even for 1000 elems for recursion method. Which I reset in sys.setrecursionlimit.

Some numbers:

list-compr 1000 elems 0.491770029068
recursion 1000 elems 2.24620914459
list-compr 2000 elems 0.992327928543
recursion 2000 elems 7.72630095482

All the code is here.

I have a couple of questions:

  • Why is list comprehension so much faster?
  • Some enlightenment on the limit on recursion in python. I first set it to 100000 in what cases should I be careful?
    • (What exactly is meant by 'optimizing tail recursion', how is it done?)
  • Trying to sort 1000000 elements hogged memory of my laptop (with the recursion method). What should I do if I want to sort so many elements? What kind of optimizations are possible?
like image 435
Swair Avatar asked Aug 24 '12 10:08

Swair


3 Answers

  1. Why is list comprehension so much faster?

    Because list comprehension implies C loop which is much faster than slow general way of using Python's for block.

  2. Some enlightenment on the limit on recursion in python. I first set it to 100000 in what cases should I be careful?

    In case you run out of memory.

  3. Trying to sort 1000000 elements hogged memory of my laptop (with the recursion method). What should I do if I want to sort so many elements? What kind of optimizations are possible?

    Python's recursion gives such an overhead because every function call allocates a lot of stack memory space on each call.

    In general, iteration is the answer (will give better performance in statistically 99% of use cases).

    Talking about memory structures, if you have simple data structures, like chars, integers, floats: use built-in array.array which is much more memory efficient than a list.

like image 190
Rostyslav Dzinko Avatar answered Nov 11 '22 12:11

Rostyslav Dzinko


Have you tried writing a non-recursive implementation of partition? I suspect that the performance difference is purely the partition implementation. You are recursing for each element in your implementation.

Update

Here's a quick implementation. It is still not super fast or even efficient, but it is much better than your original recursive one.

>>> def partition(data):
...  pivot = data[0]
...  less, equal, greater = [], [], []
...  for elm in data:
...   if elm < pivot:
...    less.append(elm)
...   elif elm > pivot:
...    greater.append(elm)
...   else:
...    equal.append(elm)
...  return less, equal, greater
...
>>> def qsort2(data):
...  if data:
...   less, equal, greater = partition(data)
...   return qsort2(less) + equal + qsort2(greater)
...  return data
... 

I also think that there are a larger number of temporary lists generated in the "traditional" version.

like image 39
D.Shawley Avatar answered Nov 11 '22 11:11

D.Shawley


Try to compare the list comprehension to an in-place algorithm when the memory goes really big. The code below get a near execution time when sorting 100K integers numbers, but you will probably get stucked in the list comprehension solution when sorting 1M integers. I've made the tests using a 4Gb machine. The full code: http://snipt.org/Aaaje2

class QSort:
def __init__(self, lst):
    self.lst = lst

def sorted(self):
    self.qsort_swap(0, len(self.lst))
    return self.lst

def qsort_swap(self, begin, end):
    if (end - begin) > 1:
       pivot = self.lst[begin]
       l = begin + 1
       r = end
       while l < r:
           if self.lst[l] <= pivot:
               l += 1
           else:
               r -= 1
               self.lst[l], self.lst[r] = self.lst[r], self.lst[l]

       l -= 1
       self.lst[begin], self.lst[l] = self.lst[l], self.lst[begin]    
       # print begin, end, self.lst
       self.qsort_swap(begin, l)
       self.qsort_swap(r, end)     
like image 23
olivecoder Avatar answered Nov 11 '22 11:11

olivecoder