
Why does the JDK implementation of quicksort risk a stack overflow?

I saw a presentation the other day in which the speaker used the technique outlined in McIlroy's paper A Killer Adversary for Quicksort to generate an input to Arrays.sort for primitive types that triggers O(n²) behavior. The sequence caused the pivot choice to reduce the array size by only a constant at each step, which made the Java Arrays.sort call overflow the stack.

According to the source files from the JDK, the Arrays.sort1 quicksort implementation has no guard against stack overflow. It is always possible to keep quicksort from overflowing the stack by not firing off two recursive calls: instead, the sorting routine uses a while loop to reuse the current stack frame for the larger subarray and recurses only once, on the smaller subarray (sketched below). This causes minimal performance degradation and makes a stack overflow impossible for any reasonably sized input, since the stack depth never exceeds O(log n) frames on an input of size n. The authors could also have prevented this by using the introsort algorithm, which modifies quicksort to switch to a worst-case O(n log n) sorting algorithm when the quicksort recursion depth exceeds some limit.
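For concreteness, here is a minimal sketch of the pattern I mean. This is not the JDK's code; the method names and the simple Lomuto partition are just illustrative.

    // Recurse only into the smaller partition, loop on the larger one.
    // The recursion depth is then bounded by O(log n) even on adversarial
    // inputs; only the running time can still degrade to O(n^2).
    static void quicksort(int[] a, int lo, int hi) {
        while (lo < hi) {
            int p = partition(a, lo, hi);   // any pivot scheme works here
            if (p - lo < hi - p) {
                quicksort(a, lo, p - 1);    // smaller left half: recurse
                lo = p + 1;                 // larger right half: iterate
            } else {
                quicksort(a, p + 1, hi);    // smaller right half: recurse
                hi = p - 1;                 // larger left half: iterate
            }
        }
    }

    // Simple Lomuto partition, included only so the sketch is self-contained.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }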

Is there any reason why the authors of Arrays.sort didn't opt to do this? It seems like a serious problem that a built-in sorting algorithm can cause a stack overflow, as it makes it possible to launch a DoS attack against such a system by triggering repeated stack overflows.

asked May 10 '13 by templatetypedef

1 Answer

Why? Because solving the problem would be overkill.

The algorithm used will perform well in all but exceptionally unusual circumstances, and if those circumstances are more likely than usual to occur, the situation should be guarded against externally. That is why the API documentation defines the algorithm used behind the scenes: so you can defend against it.

The chance of being handed the specific order that breaks the algorithm is vanishingly small.

I expect that if you looked carefully enough, you would find datasets that break almost all of the standard JVM structures. What is the cost of protecting against them, and is that cost worth the effort and the inevitable degradation of the algorithm due to the defensive measures?

answered Sep 25 '22 by OldCurmudgeon