Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python QuickSort maximum recursion depth

Tags:

python

sorting

(Python 2.7.8 Windows)

I'm doing a comparison between different sorting algorithms (Quick, bubble and insertion), and mostly it's going as expected, Quick sort is considerably faster with long lists and bubble and insertion are faster with very short lists and alredy sorted ones.

What's raising a problem is Quick Sort and the before mentioned "already sorted" lists. I can sort lists of even 100000 items without problems with this, but with lists of integers from 0...n the limit seems to be considerably lower. 0...500 works but even 0...1000 gives:

RuntimeError: maximum recursion depth exceeded in cmp

Quick Sort:

def quickSort(myList):
    if myList == []:
        return []
    else:
        pivot = myList[0]
        lesser = quickSort([x for x in myList[1:] if x < pivot])
        greater = quickSort([x for x in myList[1:] if x >= pivot])
        myList = lesser + [pivot] + greater
        return myList

Is there something wrong with the code, or am I missing something?

like image 456
questions989 Avatar asked Nov 24 '14 23:11

questions989


People also ask

What is the maximum depth recursion limit in Python?

Python uses a maximum recursion depth of 1000 to ensure no stack overflow errors and infinite recursions are possible.

How do you overcome maximum recursion depth in Python?

The Python interpreter limits the recursion limit so that infinite recursions are avoided. The “sys” module in Python provides a function called setrecursionlimit() to modify the recursion limit in Python. It takes one parameter, the value of the new recursion limit.

What is the recursion depth of QuickSort?

Define the recursion depth of QuickSort to be the maximum number of successive recursive calls before it hits the base case — equivalently, the number of the last level of the corresponding recursion tree. Note that the recursion depth is a random variable, which depends on which pivots get chosen.

How do you solve maximum recursion depth exceeded?

The “maximum recursion depth exceeded in comparison” error is raised when you try to execute a function that exceeds Python's built in recursion limit. You can fix this error by rewriting your program to use an iterative approach or by increasing the recursion limit in Python.


2 Answers

There are two things going on.

First, Python intentionally limits recursion to a fixed depth. Unlike, say, Scheme, which will keep allocating frames for recursive calls until you run out of memory, Python (at least the most popular implementation, CPython) will only allocate sys.getrecursionlimit() frames (defaulting to 1000) before failing. There are reasons for that,* but really, that isn't relevant here; just the fact that it does this is what you need to know about.

Second, as you may already know, while QuickSort is O(N log N) with most lists, it has a worst case of O(N^2)—in particular (using the standard pivot rules) with already-sorted lists. And when this happens, your stack depth can end up being O(N). So, if you have 1000 elements, arranged in worst-case order, and you're already one frame into the stack, you're going to overflow.

You can work around this in a few ways:

  • Rewrite the code to be iterative, with an explicit stack, so you're only limited by heap memory instead of stack depth.
  • Make sure to always recurse into the shorter side first, rather than the left side. This means that even in the O(N^2) case, your stack depth is still O(log N). But only if you've already done the previous step.**
  • Use a random, median-of-three, or other pivot rule that makes common cases not like already-sorted worst-case. (Of course someone can still intentionally DoS your code; there's really no way to avoid that with quicksort.) The Wikipedia article has some discussion on this, and links to the classic Sedgewick and Knuth papers.
  • Use a Python implementation with an unlimited stack.***
  • sys.setrecursionlimit(max(sys.getrecursionlimit(), len(myList)+CONSTANT)). This way, you'll fail right off the bat for an obvious reason if you can't make enough space, and usually won't fail otherwise. (But you might—you could be starting the sort already 900 steps deep in the stack…) But this is a bad idea.****. Besides, you have to figure out the right CONSTANT, which is impossible in general.*****

* Historically, the CPython interpreter recursively calls itself for recursive Python function calls. And the C stack is fixed in size; if you overrun the end, you could segfault, stomp all over heap memory, or all kinds of other problems. This could be changed—in fact, Stackless Python started off as basically just CPython with this change. But the core devs have intentionally chosen not to do so, in part because they don't want to encourage people to write deeply recursive code.

** Or if your language does automatic tail call elimination, but Python doesn't do that. But, as gnibbler points out, you can write a hybrid solution—recurse on the small end, then manually unwrap the tail recursion on the large end—that won't require an explicit stack.

*** Stackless and PyPy can both be configured this way.

**** For one thing, eventually you're going to crash the C stack.

***** The constant isn't really constant; it depends on how deep you already are in the stack (computable non-portably by walking sys._getframe() up to the top) and how much slack you need for comparison functions, etc. (not computable at all, you just have to guess).

like image 84
abarnert Avatar answered Sep 18 '22 21:09

abarnert


You're choosing the first item of each sublist as the pivot. If the list is already in order, this means that your greater sublist is all the items but the first, rather than about half of them. Essentially, each recursive call manages to process only one item. Which means the depth of recursive calls you'll need to make will be about the same as the number of items in the full list. Which overflows Python's built-in limit once you hit about 1000 items. You will have a similar problem sorting lists that are already in reversed order.

To correct this use one of the workarounds suggested in the literature, such as choosing an item at random to be the pivot or the median of the first, middle, and last items.

like image 36
kindall Avatar answered Sep 21 '22 21:09

kindall