Quicksort sorts larger numbers faster?

I was messing around with Python trying to practice my sorting algorithms and found out something interesting.

I have three different pieces of data:

  • x = number of numbers to sort
  • y = range the numbers are in (all randomly generated ints)
  • z = total time taken to sort

When:
x = 100000 and
y = (0,100000) then
z = 0.94182094911 sec

When:
x = 100000 and
y = (0,100) then
z = 12.4218382537 sec

When:
x = 100000 and
y = (0,10) then
z = 110.267447809 sec

Any ideas?

Code:

import time
import random
import sys

#-----Function definitions

def quickSort(array): #random pivot location quicksort. uses extra memory.
    smaller = []
    greater = []
    if len(array) <= 1:
        return array
    pivotVal = array[random.randint(0, len(array)-1)]
    array.remove(pivotVal)
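    # note: remove() deletes only the first element equal to pivotVal,
    # so each call peels exactly one element off the array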
    for items in array:
        if items <= pivotVal:
            smaller.append(items)
        else:
            greater.append(items)
    return concat(quickSort(smaller), pivotVal, quickSort(greater))

def concat(before, pivot, after):
    new = []
    for items in before:
        new.append(items)
    new.append(pivot)
    for things in after:
        new.append(things)
    return new

#-----Variable definitions
nums = []   #renamed from 'list' so the built-in name isn't shadowed
count = 0   #renamed from 'iter' for the same reason
sys.setrecursionlimit(20000)
start = time.clock() #start the clock

#-----Generate the list of numbers to sort
while count < 100000:
    nums.append(random.randint(0,10))  #modify this to change sorting speed
    count = count + 1
generated = time.clock() #timer snapshot after generation
timetogenerate = generated - start

#-----Sort the list of numbers
nums = quickSort(nums)
sorteddone = time.clock() #timer snapshot after sorting
timetosort = sorteddone - generated

#-----Write the list of numbers
outfile = open("C:\\output.txt", 'w') #doubled backslash so it isn't parsed as an escape
for item in nums:
    outfile.write(str(item))
    outfile.write("\n")
outfile.close()
timetowrite = time.clock() - sorteddone

#-----Print info
print "time to generate: " + str(timetogenerate)
print "time to sort: " + str(timetosort)
print "time to write: " + str(timetowrite)
totaltime = timetogenerate + timetosort + timetowrite
print "total time: " + str(totaltime)

------------------- revised NEW code -------------------

def quickSort(array): #random pivot location quicksort. uses extra memory.
    smaller = []
    greater = []
    equal = []
    if len(array) <= 1:
        return array
    pivotVal = array[random.randint(0, len(array)-1)]
    array.remove(pivotVal)
    equal.append(pivotVal)
    for items in array:
        if items < pivotVal:
            smaller.append(items)
        elif items > pivotVal:
            greater.append(items)
        else:
            equal.append(items)
    return concat(quickSort(smaller), equal, quickSort(greater))

def concat(before, equal, after):
    new = []
    for items in before:
        new.append(items)
    for items in equal:
        new.append(items)
    for items in after:
        new.append(items)
    return new


2 Answers

I think this has to do with the choice of a pivot. Depending on how your partition step works, if you have a lot of duplicate values, your algorithm can degenerate to quadratic behavior when confronted with many duplicates. For example, suppose that you're trying to quicksort this stream:

 [0 0 0 0 0 0 0 0 0 0 0 0 0]

If you aren't careful with how you do the partitioning step, this can degenerate quickly. For example, suppose you pick your pivot as the first 0, leaving you with the array

 [0 0 0 0 0 0 0 0 0 0 0 0]

to partition. Your algorithm might say that the smaller values are the array

 [0 0 0 0 0 0 0 0 0 0 0 0]

And the larger values are the array

 []

This is the case that causes quicksort to degenerate to O(n^2), since each recursive call only shrinks the size of the input by one (namely, by pulling off the pivot element).

I noticed that in your code, your partitioning step does indeed do this:

for items in array:
    if items <= pivotVal:
        smaller.append(items)
    else:
        greater.append(items)

Given a stream that's a whole bunch of copies of the same element, this will put all of them into one array to recursively sort.
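You can check this directly with a few lines (my own snippet, mirroring the question's partition logic, not code from the question):

import random

array = [0] * 13
pivotVal = array[random.randint(0, len(array) - 1)]  # every candidate pivot is 0
array.remove(pivotVal)                               # removes a single copy
smaller = [x for x in array if x <= pivotVal]
greater = [x for x in array if x > pivotVal]
print len(smaller), len(greater)                     # prints: 12 0

Each level of recursion gets rid of only one element, so you pay a full partitioning pass for every single duplicate.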

Of course, this seems like a ridiculous case - how is this at all connected to shrinking the range of the values in the array? - but it actually does come up when you're sorting lots of elements that aren't distinct. In particular, after a few passes of the partitioning, you're likely to group together all the equal elements, which will bring you into this case.

For a discussion of how to prevent this from happening, there's a really great talk by Bob Sedgewick and Jon Bentley about how to modify the partition step to work quickly in the presence of duplicate elements. It's connected to Dijkstra's Dutch national flag problem, and their solutions are really clever.
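For illustration, Dijkstra's three-way partition can be sketched in-place like this (my own sketch, not code from the talk; dutch_flag_partition is a made-up name):

def dutch_flag_partition(a, pivotVal):
    # In-place three-way partition: afterwards a[:lt] < pivotVal,
    # a[lt:gt+1] == pivotVal, and a[gt+1:] > pivotVal.
    lt, i, gt = 0, 0, len(a) - 1
    while i <= gt:
        if a[i] < pivotVal:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivotVal:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    return lt, gt

# e.g. a = [3, 1, 3, 0, 3, 2]; dutch_flag_partition(a, 3) leaves a as [1, 0, 2, 3, 3, 3]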

One option that works is to partition the input into three groups - less, equal, and greater. Once you've broken the input up this way, you only need to sort the less and greater groups; the equal group is already sorted. The talk shows how to do this more or less in-place (roughly as in the sketch above), but since you're already using an out-of-place quicksort, the fix should be easy. Here's my attempt at it:

for items in array:
    if items < pivotVal:
        smaller.append(items)
    elif items == pivotVal:
        equal.append(items)
    else:
        greater.append(items)

I've never written a line of Python in my life, BTW, so this may be totally illegal syntax. But I hope the idea is clear! :-)
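To sanity-check the fix, a rough timing harness along these lines could work (time_sort is a made-up helper; it assumes the original two-way version and the revised three-way version are both in scope, here called quickSort and quickSortThreeWay):

import random
import sys
import time

def time_sort(sort_fn, n=20000, hi=10):
    # Build a list of n random ints in [0, hi] and time one sort of it.
    data = [random.randint(0, hi) for _ in range(n)]
    t0 = time.time()
    sort_fn(data)
    return time.time() - t0

sys.setrecursionlimit(25000)  # the two-way version recurses once per duplicate
print "two-way:  ", time_sort(quickSort)          # slows down as hi shrinks
print "three-way:", time_sort(quickSortThreeWay)  # stays fast regardless of hi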


Things we know:

  1. The time complexity of quicksort on an unordered array is O(n*logn).
  2. With a naive pivot choice (say, always the first or last element), an already-sorted array degrades it to O(n^2).
  3. The first two statements are not discrete: the closer an array is to being sorted, the closer quicksort's time complexity gets to O(n^2), and conversely, the more shuffled the array, the closer the complexity gets to O(n*logn).

Now, let's look at your experiment:

  • In all three cases you used the same number of elements. So, our n which you named x is always 100000.
  • In your first experiment, you used numbers between 0 and 100000, so ideally with a perfect random number generator you'd get mostly different numbers in a relatively unordered list, thus fitting the O(n*logn) complexity case.
  • In your third experiment, you used numbers between 0 and 10 in a list of 100000 elements. That means there were very many duplicates in your list, making it a lot closer to a sorted list than in the first experiment. So, in that case, the time complexity was much closer to O(n^2).

And for the same, large enough n you can say that n^2 > n*logn, which your experiment actually confirmed.
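To make the O(n^2) concrete: when every partition step peels off only the pivot, the running time satisfies

T(n) = T(n-1) + c*n = c*(n + (n-1) + ... + 1) = c*n*(n+1)/2 = O(n^2)

which is exactly the degenerate behavior described in the other answer.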
