Mergesort with Python

Tags: python, algorithm, sorting, python-3.x, mergesort

I couldn't find any working Python 3.3 mergesort code, so I wrote one myself. Is there any way to speed it up? It sorts 20,000 numbers in about 0.3-0.5 seconds.

```python
def msort(x):
    result = []
    if len(x) < 2:
        return x
    mid = int(len(x)/2)
    y = msort(x[:mid])
    z = msort(x[mid:])
    while (len(y) > 0) or (len(z) > 0):
        if len(y) > 0 and len(z) > 0:
            if y[0] > z[0]:
                result.append(z[0])
                z.pop(0)
            else:
                result.append(y[0])
                y.pop(0)
        elif len(z) > 0:
            # Copy the rest of z, then empty it. Popping from z while
            # iterating over it would skip elements.
            for i in z:
                result.append(i)
            z = []
        else:
            for i in y:
                result.append(i)
            y = []
    return result
```

asked Sep 12 '13 10:09

Hans

2 Answers

The first improvement would be to simplify the three cases in the main loop: rather than iterating while either sequence has elements, iterate while both sequences have elements. When the loop exits, one of them will be empty. We don't know which, but it doesn't matter: we append both to the end of the result.

```python
def msort2(x):
    if len(x) < 2:
        return x
    result = []          # moved!
    mid = int(len(x) / 2)
    y = msort2(x[:mid])
    z = msort2(x[mid:])
    while (len(y) > 0) and (len(z) > 0):
        if y[0] > z[0]:
            result.append(z[0])
            z.pop(0)
        else:
            result.append(y[0])
            y.pop(0)
    result += y
    result += z
    return result
```
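The merge that msort2 performs (repeatedly take the smaller head, then append the leftovers) is also available in the standard library as `heapq.merge`, which lazily merges already-sorted iterables. A minimal sketch that delegates the merge step to it; `msort_heapq` is a hypothetical name, and its speed relative to the versions in this answer has not been measured here:

```python
import heapq


def msort_heapq(x):
    """Mergesort whose merge step is delegated to heapq.merge."""
    if len(x) < 2:
        return list(x)
    mid = len(x) // 2
    y = msort_heapq(x[:mid])
    z = msort_heapq(x[mid:])
    # heapq.merge yields the elements of the two sorted lists in order;
    # list() collects that stream into a new list.
    return list(heapq.merge(y, z))


print(msort_heapq([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```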

The second optimization is to avoid popping elements altogether, since `list.pop(0)` has to shift every remaining element one slot to the left. Instead, keep two indices:

```python
def msort3(x):
    if len(x) < 2:
        return x
    result = []
    mid = int(len(x) / 2)
    y = msort3(x[:mid])
    z = msort3(x[mid:])
    i = 0
    j = 0
    while i < len(y) and j < len(z):
        if y[i] > z[j]:
            result.append(z[j])
            j += 1
        else:
            result.append(y[i])
            i += 1
    result += y[i:]
    result += z[j:]
    return result
```
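Indices are one way around the cost of removing from the front of a list. Another, if you prefer msort2's take-the-head structure, is `collections.deque`, whose `popleft()` removes the first element in constant time. The sketch below makes only that change; `msort_deque` is a hypothetical name, and whether it is faster than msort3 in practice is an assumption worth measuring:

```python
from collections import deque


def msort_deque(x):
    """msort2's head-popping merge, using deques for O(1) front removal."""
    if len(x) < 2:
        return list(x)
    mid = len(x) // 2
    y = deque(msort_deque(x[:mid]))
    z = deque(msort_deque(x[mid:]))
    result = []
    while y and z:
        # Take from y on ties, so equal elements keep their original order.
        if y[0] > z[0]:
            result.append(z.popleft())
        else:
            result.append(y.popleft())
    # At most one of y and z is non-empty here; append whatever is left.
    result.extend(y)
    result.extend(z)
    return result
```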

A final improvement is to use a non-recursive algorithm for short sequences. Here I use the built-in `sorted` function whenever the input has fewer than 20 elements:

```python
def msort4(x):
    if len(x) < 20:
        return sorted(x)
    result = []
    mid = int(len(x) / 2)
    y = msort4(x[:mid])
    z = msort4(x[mid:])
    i = 0
    j = 0
    while i < len(y) and j < len(z):
        if y[i] > z[j]:
            result.append(z[j])
            j += 1
        else:
            result.append(y[i])
            i += 1
    result += y[i:]
    result += z[j:]
    return result
```
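msort4 hands short inputs to the built-in `sorted`. The classic form of this hybrid instead switches to insertion sort below the cutoff, which keeps the whole algorithm in pure Python. Below is a sketch of that swapped-in variant, reusing msort3's merge loop and the cutoff of 20; `insertion_sort`, `CUTOFF` and `msort_hybrid` are hypothetical names. In CPython it will most likely be slower than msort4, because `sorted` runs in C:

```python
def insertion_sort(x):
    """Return a new list with the elements of x in ascending order."""
    result = list(x)
    for k in range(1, len(result)):
        item = result[k]
        j = k - 1
        # Shift larger elements one slot right to make room for item.
        while j >= 0 and result[j] > item:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = item
    return result


CUTOFF = 20  # same threshold as msort4; worth tuning by measurement


def msort_hybrid(x):
    if len(x) < CUTOFF:
        return insertion_sort(x)
    mid = len(x) // 2
    y = msort_hybrid(x[:mid])
    z = msort_hybrid(x[mid:])
    result = []
    i = j = 0
    while i < len(y) and j < len(z):
        if y[i] > z[j]:
            result.append(z[j])
            j += 1
        else:
            result.append(y[i])
            i += 1
    result += y[i:]
    result += z[j:]
    return result
```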

My measurements for sorting a random list of 100,000 integers: 2.46 seconds for the original version, 2.33 for msort2, 0.60 for msort3 and 0.40 for msort4. For reference, sorting the whole list with `sorted` takes 0.03 seconds.
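To reproduce measurements like these, a minimal timing harness might look like the following. `benchmark` is a hypothetical helper; it assumes each function being timed takes a list and returns a new sorted list, as all the versions above do. It also checks each result against `sorted`, since a fast but wrong sort is easy to miss. Absolute timings depend on the machine and Python version, so they will not match the figures above:

```python
import random
import time


def benchmark(sort_fn, n=100000, seed=0):
    """Time sort_fn on n random integers; return elapsed seconds."""
    rng = random.Random(seed)  # fixed seed: every function gets the same input
    data = [rng.randint(0, n) for _ in range(n)]
    expected = sorted(data)
    start = time.perf_counter()
    result = sort_fn(data)
    elapsed = time.perf_counter() - start
    assert result == expected, f"{sort_fn.__name__} returned a wrong result"
    return elapsed


if __name__ == "__main__":
    for fn in (sorted,):  # add msort, msort2, msort3, msort4 here
        print(f"{fn.__name__}: {benchmark(fn):.3f} s")
```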

answered Sep 21 '22 18:09

anumi

Code from an MIT course (with a generic comparator):

```python
import operator


def merge(left, right, compare):
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if compare(left[i], right[j]):
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    while i < len(left):
        result.append(left[i])
        i += 1
    while j < len(right):
        result.append(right[j])
        j += 1
    return result


def mergeSort(L, compare=operator.lt):
    if len(L) < 2:
        return L[:]
    else:
        middle = int(len(L) / 2)
        left = mergeSort(L[:middle], compare)
        right = mergeSort(L[middle:], compare)
        return merge(left, right, compare)
```
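One subtlety of the comparator design: `merge` takes from `left` only when `compare(left[i], right[j])` is true, so with the default `operator.lt` equal elements are taken from the right half first and the sort is not stable (passing `operator.le` fixes that). The sketch below is a variant, not the course's code, that takes a `key` function the way `sorted` does and prefers the left half on ties, so records with equal keys keep their input order. `merge_by_key` and `merge_sort_by_key` are hypothetical names:

```python
def merge_by_key(left, right, key):
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        # "<=" takes from left on ties, which keeps the sort stable.
        if key(left[i]) <= key(right[j]):
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def merge_sort_by_key(items, key=lambda item: item):
    if len(items) < 2:
        return list(items)
    middle = len(items) // 2
    left = merge_sort_by_key(items[:middle], key)
    right = merge_sort_by_key(items[middle:], key)
    return merge_by_key(left, right, key)


people = [("bob", 25), ("alice", 30), ("carol", 25), ("dave", 30)]
print(merge_sort_by_key(people, key=lambda p: p[1]))
# [('bob', 25), ('carol', 25), ('alice', 30), ('dave', 30)]
```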

answered Sep 19 '22 18:09

David Yachnis
