 

Easy parallel evaluation of tuples in scala?

If two expressions e1 and e2 only deal with immutable data structures, then evaluating the tuple (e1, e2) in parallel should be dead simple: evaluate the two expressions on different processors and don't worry about interactions, because there shouldn't be any.

Scala has lots of immutable data structures, so I would expect there to be a super simple (to write) way of evaluating that tuple in parallel. Something like

par_tuple : (Unit -> T1) -> (Unit -> T2) -> (T1, T2)

which evaluates the two functions in parallel and returns when both have finished.

I haven't seen it yet, though. Does it exist? If not how would you write it?

John Salvatier asked Oct 16 '12 21:10




1 Answer

It depends on how costly the expressions being evaluated are. On current architectures, two expressions involving a few dozen, or even a few hundred, instructions cannot be evaluated in parallel efficiently, because starting and synchronizing the parallel tasks costs more than the work itself. So you should always make sure that the amount of work you're executing isn't dwarfed by the cost of parallelization.

With this disclaimer in mind, in Scala 2.10 you can use Futures to accomplish this (from Scala 2.11 on, the lowercase future { ... } is deprecated in favor of Future { ... }):

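import scala.concurrent._
import scala.concurrent.duration._
import ExecutionContext.Implicits.global
val f = future { e1 }
val g = future { e2 }
(Await.result(f, Duration.Inf), Await.result(g, Duration.Inf))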

Note that this style of computation is discouraged (and the above is deliberately overly verbose!) because it involves blocking, and blocking on a platform such as the JVM, which has no concept of efficient continuations, is often costly (the situations where blocking is appropriate are beyond the scope of this answer, and probably of this answerer). In most cases you should install a callback on the future, which is called once its value becomes available. You can do this instead:

val h = for {
  x <- f
  y <- g
} yield (x, y)

where h above is a new future which will contain a tuple of values once both become available.
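As a minimal sketch of the callback style described above, the following installs an onComplete handler on the combined future. It assumes a recent Scala version (hence Future { ... } rather than future { ... }) and the global execution context; slowSquare is a hypothetical stand-in for e1 and e2, not something from the question.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

// Hypothetical stand-in for an expensive, side-effect-free expression.
def slowSquare(n: Int): Int = { Thread.sleep(100); n * n }

// Create both futures BEFORE the for-comprehension so they start running concurrently.
val f = Future { slowSquare(3) }
val g = Future { slowSquare(4) }

val h: Future[(Int, Int)] = for {
  x <- f
  y <- g
} yield (x, y)

// Non-blocking: the handler is scheduled once both values are available.
h.onComplete {
  case Success((x, y)) => println(s"got ($x, $y)")
  case Failure(err)    => println(s"failed: $err")
}

// Demo only: the global context uses daemon threads, so a short script
// should wait for h before exiting.
Await.ready(h, 5.seconds)
```

One thing to watch for: if the futures are created inside the for-comprehension (x <- Future { e1 }; y <- Future { e2 }), the second one is only started after the first completes, so the two expressions run sequentially rather than in parallel.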

You could write your function par_tuple as:

def par_tuple[E1, E2](e1: =>E1, e2: =>E2): Future[(E1, E2)] = {
  val f = future { e1 }
  val g = future { e2 }
  val h: Future[(E1, E2)] = for {
    x <- f
    y <- g
  } yield (x, y)
  h
}

This method returns a Future of the tuple you want: an object which will eventually hold the tuple of the values of your expressions. You can compose this future further with other computations, or, if you're sure you want to block, you can have another variant:

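// Await.result requires a timeout; Duration.Inf (scala.concurrent.duration) waits indefinitely
def par_tuple_blocking[E1, E2](e1: =>E1, e2: =>E2): (E1, E2) = Await.result(par_tuple(e1, e2), Duration.Inf)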

which blocks until the tuple becomes available in the future.
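For illustration, here is a self-contained sketch of both variants together with a call site. It is written for a recent Scala version, so it uses Future { ... } and the standard Future.zip combinator in place of the for-comprehension. The names parTuple and parTupleBlocking, the atMost parameter, and its 10-second default are assumptions made for this example, not part of the answer above.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Same shape as par_tuple: start both futures first, then pair their results.
def parTuple[E1, E2](e1: => E1, e2: => E2): Future[(E1, E2)] = {
  val f = Future { e1 }
  val g = Future { e2 }
  f zip g
}

// Blocking variant with an explicit timeout (the default here is arbitrary).
def parTupleBlocking[E1, E2](e1: => E1, e2: => E2,
                             atMost: Duration = 10.seconds): (E1, E2) =
  Await.result(parTuple(e1, e2), atMost)

// Usage: both arguments are by-name, so each one is evaluated inside its own future.
val (total, longest) = parTupleBlocking(
  (1 to 1000).sum,
  List("a", "bcd", "ef").maxBy(_.length)
)
```

With a finite atMost, Await.result throws a TimeoutException if the tuple is not ready in time, which is usually preferable to waiting forever in production code.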

See more about futures, callbacks and blocking here.

axel22 answered Sep 30 '22 09:09
