Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is a fast way to convert a string of two characters to an array of booleans?

Tags:

ios

swift

I have a long string (sometimes over 1000 characters) that I want to convert to an array of boolean values. And it needs to do this many times, very quickly.

let input: String = "001"
let output: [Bool] = [false, false, true]

My naive attempt was this:

input.characters.map { $0 == "1" }

But this is a lot slower than I'd like. My profiling has shown me that the map is where the slowdown is, but I'm not sure how much simpler I can make that.

I feel like this would be wicked fast without Swift's/ObjC's overhead. In C, I think this is a simple for loop where a byte of memory is compared to a constant, but I'm not sure what the functions or syntax is that I should be looking at.

Is there a way to do this much faster?

UPDATE:

I also tried a

output = []
for char in input.characters {
    output.append(char == "1")
}

And it's about 15% faster. I'm hoping for a lot more than that.

like image 647
Alex Wayne Avatar asked Mar 18 '16 00:03

Alex Wayne


Video Answer


4 Answers

This is faster:

// Algorithm 'A'
let input = "0101010110010101010"
var output = Array<Bool>(count: input.characters.count, repeatedValue: false)
for (index, char) in input.characters.enumerate() where char == "1" {
    output[index] = true
}

Update: under input = "010101011010101001000100000011010101010101010101"

0.0741 / 0.0087, where this approach is faster that author's in 8.46 times. With bigger data correlation more positive.

Also, with using nulTerminatedUTF8 speed a little increased, but not always speed higher than algorithm A:

// Algorithm 'B'
let input = "10101010101011111110101000010100101001010101"
var output = Array<Bool>(count: input.nulTerminatedUTF8.count, repeatedValue: false)
for (index, code) in input.nulTerminatedUTF8.enumerate() where code == 49 {
    output[index] = true
}

In result graph appears, with input length 2196, where first and last 0..1, A – second, B – third point. A: 0.311sec, B: 0.304sec

Algorithm comparison graph

like image 102
dimpiax Avatar answered Oct 22 '22 11:10

dimpiax


import Foundation

let input:String = "010101011001010101001010101100101010100101010110010101010101011001010101001010101100101010100101010101011001010101001010101100101010100101010"
var start  = clock()
var output = Array<Bool>(count: input.nulTerminatedUTF8.count, repeatedValue: false)
var index = 0
for val in input.nulTerminatedUTF8 {
    if val != 49 {
        output[index] = true
    }
    index+=1
}
var diff = clock() - start;
var msec = diff * 1000 / UInt(CLOCKS_PER_SEC);
print("Time taken \(Double(msec)/1000.0) seconds \(msec%1000) milliseconds");

This should be really fast. Try it out. For 010101011010101001000100000011010101010101010101 it takes 0.039 secs.

like image 5
Pradeep K Avatar answered Oct 22 '22 09:10

Pradeep K


I would guess that this is as fast as possible:

let targ = Character("1")
let input: String = "001" // your real string goes here
let inputchars = Array(input.characters)
var output:[Bool] = Array.init(count: inputchars.count, repeatedValue: false)
inputchars.withUnsafeBufferPointer {
    inputbuf in
    output.withUnsafeMutableBufferPointer {
        outputbuf in
        var ptr1 = inputbuf.baseAddress
        var ptr2 = outputbuf.baseAddress
        for _ in 0..<inputbuf.count {
            ptr2.memory = ptr1.memory == targ
            ptr1 = ptr1.successor()
            ptr2 = ptr2.successor()
        }
    }
}
// output now contains the result

The reason is that, thanks to the use of buffer pointers, we are simply cycling through contiguous memory, just like the way you cycle through a C array by incrementing its pointer. Thus, once we get past the initial setup, this should be as fast as it would be in C.

EDIT In an actual test, the time difference between the OP's original method and this one is the difference between

13.3660290241241

and

0.219357967376709

which is a pretty dramatic speed-up. I hasten to add, however, that I have excluded the initial set-up from the timing test. This line:

let inputchars = Array(input.characters)

...is particularly expensive.

like image 1
matt Avatar answered Oct 22 '22 10:10

matt


This should be a little faster than the enumerate() where char == "1" version (0.557s for 500_000 alternating ones and zeros vs. 1.159s algorithm 'A' from diampiax)

let input = inputStr.utf8
let n = input.count
var output = [Bool](count: n, repeatedValue: false)
let one = UInt8(49) // 1
for (idx, char) in input.enumerate() {
    if char == one { output[idx] = true }
}

but it's also a lot less readable ;-p

edit: both versions are slower than the map variant, maybe you forgot to compile with optimizations?

like image 1
nemesit Avatar answered Oct 22 '22 09:10

nemesit