Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Grand Central Dispatch in Swift to parallelize and speed up “for" loops?

I am trying to wrap my head around how to use GCD to parallelize and speed up Monte Carlo simulations. Most/all simple examples are presented for Objective C and I really need a simple example for Swift since Swift is my first “real” programming language.

The minimal working version of a monte carlo simulation in Swift would be something like this:

import Foundation

import Cocoa
var winner = 0
var j = 0
var i = 0
var chance = 0
var points = 0
for j=1;j<1000001;++j{
    var ability = 500

    var player1points = 0

    for i=1;i<1000;++i{
        chance = Int(arc4random_uniform(1001))
        if chance<(ability-points) {++points}
        else{points = points - 1}
    }
    if points > 0{++winner}
}
    println(winner)

The code works directly pasted into a command line program project in xcode 6.1

The innermost loop cannot be parallelized because the new value of variable “points” is used in the next loop. But the outermost just run the innermost simulation 1000000 times and tally up the results and should be an ideal candidate for parallelization.

So my question is how to use GCD to parallelize the outermost for loop?

like image 925
user2523167 Avatar asked Nov 24 '14 13:11

user2523167


2 Answers

A "multi-threaded iteration" can be done with dispatch_apply():

let outerCount = 100    // # of concurrent block iterations
let innerCount = 10000  // # of iterations within each block

let the_queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_apply(UInt(outerCount), the_queue) { outerIdx -> Void in
    for innerIdx in 1 ... innerCount {
       // ...
    }
}

(You have to figure out the best relation between outer and inner counts.)

There are two things to notice:

  • arc4random() uses an internal mutex, which makes it extremely slow when called from several threads in parallel, see Performance of concurrent code using dispatch_group_async is MUCH slower than single-threaded version. From the answers given there, rand_r() (with separate seeds for each thread) seems to be faster alternative.

  • The result variable winner must not be modified from multiple threads simultaneously. You can use an array instead where each thread updates its own element, and the results are added afterwards. A thread-safe method has been described in https://stackoverflow.com/a/26790019/1187415.

Then it would roughly look like this:

let outerCount = 100     // # of concurrent block iterations
let innerCount = 10000   // # of iterations within each block

let the_queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

var winners = [Int](count: outerCount, repeatedValue: 0)
winners.withUnsafeMutableBufferPointer { winnersPtr -> Void in

    dispatch_apply(UInt(outerCount), the_queue) { outerIdx -> Void in
        var seed = arc4random() // seed for rand_r() in this "thread"

        for innerIdx in 1 ... innerCount {
            var points = 0
            var ability = 500

            for i in 1 ... 1000 {
                let chance = Int(rand_r(&seed) % 1001)
                if chance < (ability-points) { ++points }
                else {points = points - 1}
            }
            if points > 0 {
                winnersPtr[Int(outerIdx)] += 1
            }
        }
    }
}

// Add results:
let winner = reduce(winners, 0, +)
println(winner)
like image 77
Martin R Avatar answered Nov 08 '22 17:11

Martin R


Just to update this for contemporary syntax, we now use concurrentPerform (which replaces dispatch_apply).

So we can replace

for j in 0 ..< 1_000_000 {
    for i in 0 ..< 1000 {
        ...
    }
}

With

DispatchQueue.concurrentPerform(1_000_000) { j in
    for i in 0 ..< 1000 {
        ...
    }
}

Note, parallelizing introduces a little overhead, in both the basic GCD dispatch mechanism, as well as the synchronization of the results. If you had 32 iterations in your parallel loop this would be inconsequential, but you have a million iterations, and it will start to add up.

We generally solve this by “striding”: Rather than parallelizing 1 million iterations, you might only do 100 parallel iterations, doing 10,000 iterations each. E.g. something like:

let totalIterations = 1_000_000
let stride = 10_000
let (quotient, remainder) = totalIterations.quotientAndRemainder(dividingBy: stride)
let iterations = quotient + remainder == 0 ? 0 : 1

DispatchQueue.concurrentPerform(iterations: iterations) { iteration in
    for j in iteration * stride ..< min(totalIterations, (iteration + 1) * stride) {
        for i in 0 ..< 1000 {
            ...
        }
    }
}
like image 25
Rob Avatar answered Nov 08 '22 18:11

Rob