I was messing around with Python trying to practice my sorting algorithms and found out something interesting.
I have three data points (x is the number of elements, y is the range passed to random.randint, z is the reported time to sort):

When x = 100000 and y = (0, 100000): z = 0.94182094911 sec
When x = 100000 and y = (0, 100): z = 12.4218382537 sec
When x = 100000 and y = (0, 10): z = 110.267447809 sec
Any ideas?
Code:
import time
import random
import sys

#-----Function definitions
def quickSort(array): #random pivot location quicksort. uses extra memory.
    smaller = []
    greater = []
    if len(array) <= 1:
        return array
    pivotVal = array[random.randint(0, len(array)-1)]
    array.remove(pivotVal)
    for items in array:
        if items <= pivotVal:
            smaller.append(items)
        else:
            greater.append(items)
    return concat(quickSort(smaller), pivotVal, quickSort(greater))

def concat(before, pivot, after):
    new = []
    for items in before:
        new.append(items)
    new.append(pivot)
    for things in after:
        new.append(things)
    return new

#-----Variable definitions
numbers = [] #a name that doesn't shadow the built-in `list`
i = 0 #a name that doesn't shadow the built-in `iter`
sys.setrecursionlimit(20000)
start = time.clock() #start the clock

#-----Generate the list of numbers to sort
while i < 100000:
    numbers.append(random.randint(0, 10)) #modify this to change sorting speed
    i = i + 1
generated = time.clock() #timer snapshot after generating
timetogenerate = generated - start

#-----Sort the list of numbers
numbers = quickSort(numbers)
sortdone = time.clock() #timer snapshot after sorting
timetosort = sortdone - generated

#-----Write the list of numbers
outfile = open("C:\\output.txt", 'w') #a name that doesn't shadow the built-in `file`; backslash escaped
for items in numbers:
    outfile.write(str(items))
    outfile.write("\n")
outfile.close()
timetowrite = time.clock() - sortdone

#-----Print info
print "time to generate: " + str(timetogenerate)
print "time to sort: " + str(timetosort)
print "time to write: " + str(timetowrite)
totaltime = timetogenerate + timetosort + timetowrite #sum of the three measured durations
print "total time: " + str(totaltime)
-------------------revised NEW code----------------------------
def quickSort(array): #random pivot location quicksort with a three-way partition. uses extra memory.
    smaller = []
    greater = []
    equal = []
    if len(array) <= 1:
        return array
    pivotVal = array[random.randint(0, len(array)-1)]
    array.remove(pivotVal)
    equal.append(pivotVal)
    for items in array:
        if items < pivotVal:
            smaller.append(items)
        elif items > pivotVal:
            greater.append(items)
        else:
            equal.append(items)
    return concat(quickSort(smaller), equal, quickSort(greater))

def concat(before, equal, after):
    new = []
    for items in before:
        new.append(items)
    for items in equal:
        new.append(items)
    for items in after:
        new.append(items)
    return new
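A quick sanity check that the three-way version stays fast on the duplicate-heavy input - a minimal sketch, assuming the three-way quickSort above is the one in scope, and using time.time() for the timer:

import random
import time

data = [random.randint(0, 10) for _ in range(100000)] #the previously slow case

start = time.time()
result = quickSort(data) #three-way version defined above
print("sorted in %.3f sec" % (time.time() - start)) #a fraction of the original 110 sec

With only 11 distinct values, each block of equal elements is set aside after a single pass instead of being re-sorted, so the recursion bottoms out after a handful of levels.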
2 Answers
I think this has to do with the choice of pivot. Depending on how your partition step works, your algorithm can degenerate to quadratic behavior when confronted with many duplicate values. For example, suppose that you're trying to quicksort this stream:
[0 0 0 0 0 0 0 0 0 0 0 0 0]
If you aren't careful with how you do the partitioning step, this can degenerate quickly. For example, suppose you pick your pivot as the first 0, leaving you with the array
[0 0 0 0 0 0 0 0 0 0 0 0]
to partition. Your algorithm might say that the smaller values are the array
[0 0 0 0 0 0 0 0 0 0 0 0]
And the larger values are the array
[]
This is the case that causes quicksort to degenerate to O(n^2), since each recursive call is only shrinking the size of the input by one (namely, by pulling off the pivot element).
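To make the quadratic blow-up concrete, here's a minimal counting sketch (the quicksort_counting wrapper is my own, mirroring your partition): on an all-equal list, each call peels off only the pivot, giving the recurrence T(n) = T(n-1) + O(n), which sums to O(n^2):

import random
import sys

def quicksort_counting(array, calls):
    calls[0] += 1 #one increment per recursive call
    if len(array) <= 1:
        return array
    pivotVal = array[random.randint(0, len(array)-1)]
    array.remove(pivotVal)
    smaller = [x for x in array if x <= pivotVal] #every duplicate lands here
    greater = [x for x in array if x > pivotVal] #always empty for all-equal input
    return quicksort_counting(smaller, calls) + [pivotVal] + quicksort_counting(greater, calls)

sys.setrecursionlimit(20000)
calls = [0]
quicksort_counting([0] * 2000, calls)
print(calls[0]) #3999 calls for 2000 elements: the input shrinks by one per level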
I noticed that in your code, your partitioning step does indeed do this:
for items in array:
    if items <= pivotVal:
        smaller.append(items)
    else:
        greater.append(items)
Given a stream that's a whole bunch of copies of the same element, this will put all of them into one array to recursively sort.
Of course, this seems like a ridiculous case - how is this at all connected to reducing the number of values in the array? - but it actually does come up when you're sorting lots of elements that aren't distinct. In particular, after a few passes of the partitioning, you're likely to group together all equal elements, which will bring you into this case.
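Here's a toy illustration of that grouping (my own snippet; the seed is arbitrary, just to make the run repeatable):

import random

random.seed(1)
data = [random.randint(0, 10) for _ in range(20)]
pivotVal = data[0] #pretend the first element is the pivot
smaller = [x for x in data if x <= pivotVal]
greater = [x for x in data if x > pivotVal]
print(smaller) #every copy of each small value is already confined to this sub-array
print(greater) #with only 11 possible values, a few such passes leave single-valued sub-arrays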
For a discussion of how to prevent this from happening, there's a really great talk by Bob Sedgewick and Jon Bentley about how to modify the partition step to work quickly when in the presence of duplicate elements. It's connected to Dijkstra's Dutch national flag problem, and their solutions are really clever.
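For a flavor of the in-place approach, here's a minimal sketch of a Dutch-national-flag-style three-way partition (my own illustration, not code from the talk):

def threeWayPartition(a, lo, hi, pivotVal):
    #rearrange a[lo..hi] into three runs: < pivotVal, == pivotVal, > pivotVal
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivotVal:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivotVal:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    return lt, gt #the equal run is a[lt..gt]; recurse only on the two outer ranges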
One option that works is to partition the input into three groups - less, equal, and greater. Once you've broken the input up this way, you only need to sort the less and greater groups; the equal group is already sorted. The above link to the talk shows how to do this more or less in-place, but since you're already using an out-of-place quicksort, the fix should be easy. Here's my attempt at it:
for items in array:
    if items < pivotVal:
        smaller.append(items)
    elif items == pivotVal:
        equal.append(items)
    else:
        greater.append(items)
I've never written a line of Python in my life, BTW, so this may be totally illegal syntax. But I hope the idea is clear! :-)
Things we know:

1. The complexity of quicksort on a random list of distinct values is O(n*logn).
2. If the list is already sorted (and a fixed pivot such as the first element is used), it degrades to O(n^2), and conversely, as we shuffle it the complexity approaches O(n*logn).
3. A list full of duplicate values behaves like the sorted case: with a two-way partition, quicksort again degrades to O(n^2).

Now, let's look at your experiment:

- n (which you named x) is always 100000.
- With y = (0, 100000) the values are nearly all distinct, so you are close to the O(n*logn) complexity case. With y = (0, 10) there are only 11 possible values, so you are close to the O(n^2) case.
- And with the same, large enough n you can say that n^2 > n*logn, which you actually confirmed by your experiment.
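Back-of-the-envelope, ignoring constant factors (my own arithmetic, not part of the answer above):

import math

n = 100000
print(n * math.log(n, 2)) #~1.7e6 steps in the n*logn regime
print(n ** 2) #1e10 steps in the n^2 regime

The measured gap (0.94 sec vs 110 sec, roughly a factor of 117) sits between those two regimes, which is consistent with a partial degeneration: with 11 distinct values, the recursion only degenerates within each block of equal elements.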