Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparison between two different merge sort implementations explained

I wrote my implementation like this:

def merge_sort(lst):
    if len(lst) < 2: return lst

    midx = len(lst) // 2

    return merge(
        merge_sort(lst[:midx]),
        merge_sort(lst[midx:])
    )

def merge(left, right):
    merged_lst = []
    while left and right:
        merged_lst.append((left.pop(0) if left[0] <= right[0] else right.pop(0)))

    merged_lst.extend(left if left else right)
    return merged_lst

Then I found this one online:

def mergeSort(alist):
    if len(alist)>1:
        mid = len(alist)//2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]

        mergeSort(lefthalf)
        mergeSort(righthalf)

        i=0
        j=0
        k=0
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] <= righthalf[j]:
                alist[k]=lefthalf[i]
                i=i+1
            else:
                alist[k]=righthalf[j]
                j=j+1
            k=k+1

        while i < len(lefthalf):
            alist[k]=lefthalf[i]
            i=i+1
            k=k+1

        while j < len(righthalf):
            alist[k]=righthalf[j]
            j=j+1
            k=k+1

I thought mine would be faster, but use up more memory. Then I profiled both for med_lst and big_lst:

med_lst = [random.random() for _ in range(100000)]
big_lst = [random.random() for _ in range(500000)]

Using code structured like this in Jupyter notebook:

printmd(f'RAM at start: {memory_profiler.memory_usage()[0]:0.1f}MiB', color='blue')
t1 = time.time()
printmd(f'Sorting: {len(big_lst)} elements', color="blue")
merge_sort(big_lst)
t2 = time.time()
printmd(f'RAM after sorting list: {memory_profiler.memory_usage()[0]:0.1f}MiB, took {t2 - t1:0.1f}s', color='blue')

The time complexities surprised me:

Medium List: merge_sort: 1.2s mergeSort: 0.6s

Big List: merge_sort: 29.0s mergeSort: 3.4s

Is this meant to be just because of creating a new list, and maybe collisions or resizing? Or is my implementation / thought process wrong (rookie error - 0 points for this answer)? I think my function is still 0(nlogn), but as the collection gets larger can you really say they're "comparable"

I didn't get into profiling this line by line because it's meant to be more fundamental

like image 216
user7247147 Avatar asked Dec 01 '25 08:12

user7247147


1 Answers

As pointed out in the comments by @CH, the pop() from the front came out to be the culprit.

By making a small change to reverse the lists before popping and then reversing back after, I managed to make my implementation actually faster than the other one:

def merge(left, right):
    merged_lst = []
    if left: left.reverse()
    if right: right.reverse()
    while left and right:
        merged_lst.append((left.pop() if left[-1] <= right[-1] else right.pop()))
    if left: left.reverse()
    if right: right.reverse()
    merged_lst.extend(left if left else right)
    return merged_lst

Medium List: merge_sort: 0.5s mergeSort: 0.7s

Big List: merge_sort: 2.6s mergeSort: 4.0s

like image 126
user7247147 Avatar answered Dec 05 '25 03:12

user7247147



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!