Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which general purpose sorting algorithm does Swift use? It does not perform well on sorted data

I have been picking and probing at Swift standard libraries sort() function for its Array type. To my surprise I have noticed it performs poorly on already-sorted data.

Sorting an array of Int which is shuffled seems to be 5x faster than sorting that very same array when it is already sorted. Sorting an array of shuffled objects is about 4x faster than sorting the very same one already in sorted order (sorting object array vs Int array use different algorithms I am sure so I sorted both to eliminate bias).

These are the results:

Shuffled Int array sort time: 1.3961209654808
Shuffled ColorObject array sort time: 3.14633798599243
NOnshuffled Int array sort time: 7.34714204072952
NOnshuffled ColorObject array sort time: 10.9310839772224

For reference below is my code:

class ElapsedTimer {

    let startTime: CFAbsoluteTime
    var endTime: CFAbsoluteTime?

    init() {
        startTime = CFAbsoluteTimeGetCurrent()
    }

    func stop() -> CFAbsoluteTime {
        endTime = CFAbsoluteTimeGetCurrent()
        return duration!
    }

    var duration: CFAbsoluteTime? {
        if let endTime = endTime {
            return endTime - startTime
        } else {
            return nil
        }
    }
}

public class CountedColor {
    public private(set) var count: Int
    public private(set) var color: UIColor

    public init(color: UIColor, colorCount: Int) {
        self.count = colorCount
        self.color = color
    }
}

var distributedIntArray = [Int]()

for value in 1..<1000000 {
    distributedIntArray.append(value)
}

var distributedCountedColorArray = distributedIntArray.map{ CountedColor(color: UIColor.white, colorCount: $0) }

distributedCountedColorArray.shuffle()
distributedIntArray.shuffle()

var timer = ElapsedTimer()
distributedIntArray.sort()
print("Shuffled Int array sort time: \(timer.stop())")
timer = ElapsedTimer()

distributedCountedColorArray.sort{ return $0.count < $1.count }
print("Shuffled Color array sort time: \(timer.stop())")
timer = ElapsedTimer()

distributedIntArray.sort()
print("NOnshuffled Int array sort time: \(timer.stop())")
timer = ElapsedTimer()

distributedCountedColorArray.sort{ return $0.count < $1.count }
print("Non shuffled Color array sort time: \(timer.stop())")

My array shuffle() method was pulled from this post. My ElapsedTimer simply wraps and uses CACurrentMediaTime() functions.

My question is why am I seeing this behavior? Especially when I am sorting the object array which should surely be using a general purpose sort. What kind of general purpose sorting algorithm is swift using? It surely can’t be one where the worst case and average case are the same like mergeSort.

like image 707
AyBayBay Avatar asked Dec 08 '16 03:12

AyBayBay


People also ask

Which sorting algorithm does swift use?

The Swift standard library sort algorithm uses a hybrid of sorting approaches with insertion sort being used for small (<20 element) unsorted partitions.

Which sort algorithm works best on mostly sorted data?

Insertion sort is the clear winner on this initial condition. Bubble sort is fast, but insertion sort has lower overhead. Shell sort is fast because it is based on insertion sort. Merge sort, heap sort, and quick sort do not adapt to nearly sorted data.

Which sorting algorithm does sorted () use?

Algorithm used by sorted() The Python sorted() uses the Timsort algorithm which is a hybrid sorting algorithm, derived from merge sort and insertion sort.

Which sorting algorithm is best if it is already sorted & Why?

Insertion Sort If the data is nearly sorted or when the list is small as it has a complexity of O(N2) and if the list is sorted a minimum number of elements will slide over to insert the element at its correct location. This algorithm is stable and it has fast running case when the list is nearly sorted.


2 Answers

Swift uses Introsort. Looking at the source code we see that the chosen pivot is the first element. The wikipedia page on Introsort says:

(...), one of the critical operations is choosing the pivot: the element around which the list is partitioned. The simplest pivot selection algorithm is to take the first or the last element of the list as the pivot, causing poor behavior for the case of sorted or nearly sorted input.

Thus it is entirely predictable, given the implementation choice, that Swift's sorting performance is worst for sorted inputs.

I have built a complete benchmark for people who want to easily reproduce the OP's claims : https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/extra/swift/sort

For reference, the GNU ISO C++ standard library uses a median-of-3 pivot (as per the stl_algo.h header).

like image 167
Daniel Lemire Avatar answered Sep 22 '22 14:09

Daniel Lemire


In the Swift 5 evolution, the IntroSort algorithm was replaced with a modified version TimSort (implemented first in 2002 by Tim Peters for Python) in the 'sort()' method: https://github.com/apple/swift/blob/master/stdlib/public/core/Sort.swift

like image 24
Soheil Novinfard Avatar answered Sep 19 '22 14:09

Soheil Novinfard