Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does Scala use all my cores here?

Tags:

scala

object PrefixScan {
  sealed abstract class Tree[A]
  case class Leaf[A](a: A) extends Tree[A]
  case class Node[A](l: Tree[A], r: Tree[A]) extends Tree[A]

  sealed abstract class TreeRes[A] { val res : A }
  case class LeafRes[A](override val res: A) extends TreeRes[A]
  case class NodeRes[A](l : TreeRes[A], override val res: A, r: TreeRes[A]) extends TreeRes[A]

  def reduceRes[A](t: Tree[A], f:(A,A)=>A): TreeRes[A] = t match {
    case Leaf(v) => LeafRes(v)
    case Node(l, r) => {
      val (tL, tR) = (reduceRes(l, f), reduceRes(r, f))
      NodeRes(tL, f(tL.res, tR.res), tR)
    }
  }
}

I'm concerned about the reduceRes function.

It works ... the result of the computation is great!

However I went and implemented another version, reduceResPar, that uses fork-join at the first few branches, to parallelize the computation. But it gave no speed up.

Then I went back and realized .. the above version, reduceRes, is already using all 12 cores on my machine!! How can it do that? I thought it would just be 1 core!

This code is from the Parallel Programming course on Coursera In the last lecture of week 2, we are learning about parallel prefix scan operations.

like image 611
katie '-' Avatar asked Aug 10 '17 10:08

katie '-'


1 Answers

How can it do that? I thought it would just be 1 core!

The fact that you see all your cores being used doesn't mean your code execution is parallel. We can see from the implementation it's sequential, but we don't know which CPU our single thread will get scheduled on by the OS on each cycle.

When you execute a method inside a thread, the OS decides how many CPU time slices it will get and when, according to a priority queue it manages.

To see that your algorithm may run on different cores we can ask the OS on which logical core it's currently executing our thread. I've prepared a small implementation for Windows, which has a native WinAPI method called GetCurrentProcessorNumber() which returns the processor number we're executing on. We'll use JNA for the example:

build.sbt:

"net.java.dev.jna" % "jna" % "4.4.0"

Java implementation:

import com.sun.jna.Library;
import com.sun.jna.Native;

public class ProcessorNumberNative {

    public interface CLibrary extends Library {
        CLibrary INSTANCE = (CLibrary)
                Native.loadLibrary("Kernel32.dll",
                        CLibrary.class);

        Integer GetCurrentProcessorNumber();
    }
}

Now let's add a println on each of the steps in your recursion:

def reduceRes[A](t: Tree[A], f: (A, A) => A): TreeRes[A] = t match {
  case Leaf(v) =>
    println(s"Logical Processor Number: ${ProcessorNumberNative.CLibrary.INSTANCE.GetCurrentProcessorNumber()}")
    LeafRes(v)

  case Node(l, r) => 
    println(s"Logical Processor Number: ${ProcessorNumberNative.CLibrary.INSTANCE.GetCurrentProcessorNumber()}")
    val (tL, tR) = (reduceRes(l, f), reduceRes(r, f))
    NodeRes(tL, f(tL.res, tR.res), tR)
}

Now let's create a tree and execute:

def main(args: Array[String]): Unit = {

  val tree = Node(Leaf(1),
                Node(Leaf(2),
                     Node(Node(Leaf(24), Leaf(30)),
                          Node(Leaf(3), Node(Leaf(10), Leaf(52))))))

  reduceRes(tree, (a: Int, b: Int) => a + b)
}

And give this two different runs (I'm running a computer with 4 logical cores):

First:

Logical Processor Number: 1
Logical Processor Number: 3
Logical Processor Number: 3
Logical Processor Number: 3
Logical Processor Number: 0
Logical Processor Number: 0
Logical Processor Number: 0
Logical Processor Number: 3
Logical Processor Number: 0
Logical Processor Number: 0
Logical Processor Number: 0
Logical Processor Number: 0
Logical Processor Number: 0

Second:

Logical Processor Number: 1
Logical Processor Number: 3
Logical Processor Number: 1
Logical Processor Number: 1
Logical Processor Number: 1
Logical Processor Number: 1
Logical Processor Number: 1
Logical Processor Number: 1
Logical Processor Number: 3
Logical Processor Number: 3
Logical Processor Number: 3
Logical Processor Number: 3
Logical Processor Number: 3

During each execution, you see that the executing thread got a slice of execution on 3 different cores, 0, 1 and 3, while we're still running in a single threaded environment. This goes to show that although the computation of your algorithm is definitely sequential, that doesn't mean you won't be seeing all your cores in play.

like image 176
Yuval Itzchakov Avatar answered Oct 24 '22 22:10

Yuval Itzchakov