Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slow Swift Arrays and Strings performance

Here is two pretty similar Levenshtein Distance algorithms.

Swift implementation: https://gist.github.com/bgreenlee/52d93a1d8fa1b8c1f38b

And Objective-C implementation: https://gist.github.com/boratlibre/1593632

The swift one is dramatically slower then ObjC implementation I've send couple of hours to make it faster but... It seems like Swift arrays and Strings manipulation are not as fast as objC.

On 2000 random Strings calculations Swift implementation is about 100(!!!) times slower then ObjC.

Honestly speaking, I've got no idea what could be wrong, coz even this part of swift

func levenshtein(aStr: String, bStr: String) -> Int {
// create character arrays
let a = Array(aStr)
let b = Array(bStr)
...

is few times slower then whole algorithm in Objective C

Is anyone knows how to speedup swift calculations?

Thank you in advance!

Append

After all suggested improvements swift code looks like this. And it is 4 times slower then ObjC in release configuration.

import Foundation
class Array2D {
    var cols:Int, rows:Int
    var matrix:UnsafeMutablePointer<Int>


    init(cols:Int, rows:Int) {
        self.cols = cols
        self.rows = rows
        matrix = UnsafeMutablePointer<Int>(malloc(UInt(cols * rows) * UInt(sizeof(Int))))
        for i in 0...cols*rows {
            matrix[i] = 0
        }

    }

    subscript(col:Int, row:Int) -> Int {
        get {
            return matrix[cols * row + col] as Int
        }
        set {
            matrix[cols*row+col] = newValue
        }
    }

    func colCount() -> Int {
        return self.cols
    }

    func rowCount() -> Int {
        return self.rows
    }
}

extension String {
    func levenshteinDistanceFromStringSwift(comparingString: NSString) -> Int {
        let aStr = self
        let bStr = comparingString

//        let a = Array(aStr.unicodeScalars)
//        let b = Array(bStr.unicodeScalars)

        let a:NSString = aStr
        let b:NSString = bStr

        var dist = Array2D(cols: a.length + 1, rows: b.length + 1)



        for i in 1...a.length {
            dist[i, 0] = i
        }

        for j in 1...b.length {
            dist[0, j] = j
        }

        for i in 1...a.length {
            for j in 1...b.length {
                if a.characterAtIndex(i-1) == b.characterAtIndex(j-1) {
                    dist[i, j] = dist[i-1, j-1]  // noop
                } else {
                    dist[i, j] = min(
                        dist[i-1, j] + 1,  // deletion
                        dist[i, j-1] + 1,  // insertion
                        dist[i-1, j-1] + 1  // substitution
                    )
                }
            }
        }

        return dist[a.length, b.length]

    }
    func levenshteinDistanceFromStringObjC(comparingString: String) -> Int {
        let aStr = self
        let bStr = comparingString
        //It is really strange, but I should link Objective-C coz dramatic slow swift performance
        return aStr.compareWithWord(bStr, matchGain: 0, missingCost: 1)

    }

}

malloc?? NSString?? and at the end 4 times speed decrease? Is anybody needs swift anymore?

like image 269
Alexey Avatar asked Nov 18 '14 09:11

Alexey


1 Answers

There are multiple reasons why the Swift code is slower than the Objective-C code. I made a very simple test case by comparing two fixed strings 100 times.

  • Objective-C code: 0.026 seconds
  • Swift code: 3.14 seconds

The first reason is that a Swift Character represents an "extended grapheme cluster", which can contain several Unicode code points (e.g. "flags"). This makes the decomposition of a string into characters slow. On the other hand, Objective-C NSString stores the strings as a sequence of UTF-16 code points.

If you replace

let a = Array(aStr)
let b = Array(bStr)

by

let a = Array(aStr.utf16)
let b = Array(bStr.utf16)

so that the Swift code works on UTF-16 sequences as well then the time goes down to 1.88 seconds.

The allocation of the 2-dimensional array is also slow. It is faster to allocate a single one-dimensional array. I found a simple Array2D class here: http://blog.trolieb.com/trouble-multidimensional-arrays-swift/

class Array2D {
    var cols:Int, rows:Int
    var matrix: [Int]


    init(cols:Int, rows:Int) {
        self.cols = cols
        self.rows = rows
        matrix = Array(count:cols*rows, repeatedValue:0)
    }

    subscript(col:Int, row:Int) -> Int {
        get {
            return matrix[cols * row + col]
        }
        set {
            matrix[cols*row+col] = newValue
        }
    }

    func colCount() -> Int {
        return self.cols
    }

    func rowCount() -> Int {
        return self.rows
    }
}

Using that class in your code

func levenshtein(aStr: String, bStr: String) -> Int {
    let a = Array(aStr.utf16)
    let b = Array(bStr.utf16)

    var dist = Array2D(cols: a.count + 1, rows: b.count + 1)

    for i in 1...a.count {
        dist[i, 0] = i
    }

    for j in 1...b.count {
        dist[0, j] = j
    }

    for i in 1...a.count {
        for j in 1...b.count {
            if a[i-1] == b[j-1] {
                dist[i, j] = dist[i-1, j-1]  // noop
            } else {
                dist[i, j] = min(
                    dist[i-1, j] + 1,  // deletion
                    dist[i, j-1] + 1,  // insertion
                    dist[i-1, j-1] + 1  // substitution
                )
            }
        }
    }

    return dist[a.count, b.count]
}

the time in the test case goes down to 0.84 seconds.

The last bottleneck that I found in the Swift code is the min() function. The Swift library has a built-in min() function which is faster. So just removing the custom function from the Swift code reduces the time for the test case to 0.04 seconds, which is almost as good as the Objective-C version.

Addendum: Using Unicode scalars seems to be even slightly faster:

let a = Array(aStr.unicodeScalars)
let b = Array(bStr.unicodeScalars)

and has the advantage that it works correctly with surrogate pairs such as Emojis.

like image 141
Martin R Avatar answered Sep 22 '22 03:09

Martin R