Quicksort sorts larger numbers faster?

I was messing around with Python trying to practice my sorting algorithms and found out something interesting.

I have three different pieces of data:

  • x = number of numbers to sort
  • y = range the numbers are in (all randomly generated ints)
  • z = total time taken to sort

When:
x = 100000 and
y = (0,100000) then
z = 0.94182094911 sec

When:
x = 100000 and
y = (0,100) then
z = 12.4218382537 sec

When:
x = 100000 and
y = (0,10) then
z = 110.267447809 sec

Any ideas?

Code:

import time
import random
import sys

#-----Function definitions

def quickSort(array): #random pivot location quicksort. uses extra memory.
    smaller = []
    greater = []
    if len(array) <= 1:
        return array
    pivotVal = array[random.randint(0, len(array)-1)]
    array.remove(pivotVal)
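    # note: remove() deletes only the first element equal to pivotVal,
    # so each call peels exactly one element off the array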
    for items in array:
        if items <= pivotVal:
            smaller.append(items)
        else:
            greater.append(items)
    return concat(quickSort(smaller), pivotVal, quickSort(greater))

def concat(before, pivot, after):
    new = []
    for items in before:
        new.append(items)
    new.append(pivot)
    for things in after:
        new.append(things)
    return new

#-----Variable definitions
nums = []   #renamed from 'list' so the built-in name isn't shadowed
count = 0   #renamed from 'iter' for the same reason
sys.setrecursionlimit(20000)
start = time.clock() #start the clock

#-----Generate the list of numbers to sort
while count < 100000:
    nums.append(random.randint(0,10))  #modify this to change sorting speed
    count = count + 1
generated = time.clock() #timer snapshot after generation
timetogenerate = generated - start

#-----Sort the list of numbers
nums = quickSort(nums)
sorteddone = time.clock() #timer snapshot after sorting
timetosort = sorteddone - generated

#-----Write the list of numbers
outfile = open("C:\\output.txt", 'w') #doubled backslash so it isn't parsed as an escape
for item in nums:
    outfile.write(str(item))
    outfile.write("\n")
outfile.close()
timetowrite = time.clock() - sorteddone

#-----Print info
print "time to generate: " + str(timetogenerate)
print "time to sort: " + str(timetosort)
print "time to write: " + str(timetowrite)
totaltime = timetogenerate + timetosort + timetowrite
print "total time: " + str(totaltime)

------------------- revised NEW code -------------------

def quickSort(array): #random pivot location quicksort. uses extra memory.
    smaller = []
    greater = []
    equal = []
    if len(array) <= 1:
        return array
    pivotVal = array[random.randint(0, len(array)-1)]
    array.remove(pivotVal)
    equal.append(pivotVal)
    for items in array:
        if items < pivotVal:
            smaller.append(items)
        elif items > pivotVal:
            greater.append(items)
        else:
            equal.append(items)
    return concat(quickSort(smaller), equal, quickSort(greater))

def concat(before, equal, after):
    new = []
    for items in before:
        new.append(items)
    for items in equal:
        new.append(items)
    for items in after:
        new.append(items)
    return new


2 Answers

I think this has to do with the choice of a pivot. Depending on how your partition step works, if you have a lot of duplicate values, your algorithm can degenerate to quadratic behavior when confronted with many duplicates. For example, suppose that you're trying to quicksort this stream:

 [0 0 0 0 0 0 0 0 0 0 0 0 0]

If you aren't careful with how you do the partitioning step, this can degenerate quickly. For example, suppose you pick your pivot as the first 0, leaving you with the array

 [0 0 0 0 0 0 0 0 0 0 0 0]

to partition. Your algorithm might say that the smaller values are the array

 [0 0 0 0 0 0 0 0 0 0 0 0]

And the larger values are the array

 []

This is the case that causes quicksort to degenerate to O(n^2), since each recursive call only shrinks the size of the input by one (namely, by pulling off the pivot element).

I noticed that in your code, your partitioning step does indeed do this:

for items in array:
    if items <= pivotVal:
        smaller.append(items)
    else:
        greater.append(items)

Given a stream that's a whole bunch of copies of the same element, this will put all of them into one array to recursively sort.
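You can check this directly with a few lines (my own snippet, mirroring the question's partition logic, not code from the question):

import random

array = [0] * 13
pivotVal = array[random.randint(0, len(array) - 1)]  # every candidate pivot is 0
array.remove(pivotVal)                               # removes a single copy
smaller = [x for x in array if x <= pivotVal]
greater = [x for x in array if x > pivotVal]
print len(smaller), len(greater)                     # prints: 12 0

Each level of recursion gets rid of only one element, so you pay a full partitioning pass for every single duplicate.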

Of course, this seems like a ridiculous case - how is this at all connected to shrinking the range of the values in the array? - but it actually does come up when you're sorting lots of elements that aren't distinct. In particular, after a few passes of the partitioning, you're likely to group together all the equal elements, which will bring you into this case.

For a discussion of how to prevent this from happening, there's a really great talk by Bob Sedgewick and Jon Bentley about how to modify the partition step to work quickly in the presence of duplicate elements. It's connected to Dijkstra's Dutch national flag problem, and their solutions are really clever.
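For illustration, Dijkstra's three-way partition can be sketched in-place like this (my own sketch, not code from the talk; dutch_flag_partition is a made-up name):

def dutch_flag_partition(a, pivotVal):
    # In-place three-way partition: afterwards a[:lt] < pivotVal,
    # a[lt:gt+1] == pivotVal, and a[gt+1:] > pivotVal.
    lt, i, gt = 0, 0, len(a) - 1
    while i <= gt:
        if a[i] < pivotVal:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivotVal:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    return lt, gt

# e.g. a = [3, 1, 3, 0, 3, 2]; dutch_flag_partition(a, 3) leaves a as [1, 0, 2, 3, 3, 3]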

One option that works is to partition the input into three groups - less, equal, and greater. Once you've broken the input up this way, you only need to sort the less and greater groups; the equal group is already sorted. The talk shows how to do this more or less in-place (roughly as in the sketch above), but since you're already using an out-of-place quicksort, the fix should be easy. Here's my attempt at it:

for items in array:
    if items < pivotVal:
        smaller.append(items)
    elif items == pivotVal:
        equal.append(items)
    else:
        greater.append(items)

I've never written a line of Python in my life, BTW, so this may be totally illegal syntax. But I hope the idea is clear! :-)
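To sanity-check the fix, a rough timing harness along these lines could work (time_sort is a made-up helper; it assumes the original two-way version and the revised three-way version are both in scope, here called quickSort and quickSortThreeWay):

import random
import sys
import time

def time_sort(sort_fn, n=20000, hi=10):
    # Build a list of n random ints in [0, hi] and time one sort of it.
    data = [random.randint(0, hi) for _ in range(n)]
    t0 = time.time()
    sort_fn(data)
    return time.time() - t0

sys.setrecursionlimit(25000)  # the two-way version recurses once per duplicate
print "two-way:  ", time_sort(quickSort)          # slows down as hi shrinks
print "three-way:", time_sort(quickSortThreeWay)  # stays fast regardless of hi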


Things we know:

  1. The time complexity of quicksort on an unordered array is O(n*logn).
  2. With a naive pivot choice (say, always the first or last element), an already-sorted array degrades it to O(n^2).
  3. The first two statements are not discrete: the closer an array is to being sorted, the closer quicksort's time complexity gets to O(n^2), and conversely, the more shuffled the array, the closer the complexity gets to O(n*logn).

Now, let's look at your experiment:

  • In all three cases you used the same number of elements. So, our n which you named x is always 100000.
  • In your first experiment, you used numbers between 0 and 100000, so ideally with a perfect random number generator you'd get mostly different numbers in a relatively unordered list, thus fitting the O(n*logn) complexity case.
  • In your third experiment, you used numbers between 0 and 10 in a list of 100000 elements. That means there were very many duplicates in your list, making it a lot closer to a sorted list than in the first experiment. So, in that case, the time complexity was much closer to O(n^2).

And for the same, large enough n you can say that n^2 > n*logn, which your experiment actually confirmed.
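To make the O(n^2) concrete: when every partition step peels off only the pivot, the running time satisfies

T(n) = T(n-1) + c*n = c*(n + (n-1) + ... + 1) = c*n*(n+1)/2 = O(n^2)

which is exactly the degenerate behavior described in the other answer.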
