Can we define a set of DSL operations in Scala that run in parallel with each other, just like pipeline processing in Linux?

Question

Forgive my poor English; I will try my best to express my question.

Suppose I want to process a large text: filter its content by a keyword, change the matches to lowercase, and then print them to standard output. As we all know, we can do this with a pipeline in a Linux Bash script:

cat article.txt | grep "I" | tr "I" "i" > /dev/stdout

where cat article.txt, grep "I", and tr "I" "i" > /dev/stdout run in parallel.

In Scala, we would probably do it like this:

//or read from a text file , e.g. article.txt 
val strList = List("I", "am", "a" , "student", ".", "I", "come", "from", "China", ".","I","love","peace")  
strList.filter( _ == "I").map(_.toLowerCase).foreach(println)

My question is: how can we make filter, map and foreach run in parallel?

Thanks.

tstenner · Accepted Answer

Parallel collections were added in Scala 2.9. To parallelize the chain, all you have to do is convert the collection by calling its par method.

Your code would look like this:

val strList = List("I", "am", "a" , "student", ".", "I", "come", "from", "China", ".","I","love","peace")  // or read from a text file , e.g. article.txt 
strList.par.filter( _ == "I").map(_.toLowerCase).foreach(println)

Stefan Endrullis · Answer

tstenner's solution is probably the most efficient one in your situation, since it can achieve a high degree of parallelism (in theory, every single item could be processed in parallel). However, your bash example uses only pipeline parallelism, and this kind of parallelism is unfortunately not directly supported by Scala's parallel collections.

To achieve pipeline parallelism, your operators (filter, map, foreach) have to be executed by different threads, e.g., by using Actors.
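To make that concrete, here is a minimal sketch of the idea. It does not use Actors; instead it assumes plain threads connected by bounded java.util.concurrent queues, which play the role of the FIFOs between the bash processes. The names PipelineSketch, runPipeline, stage and drain, as well as the queue capacity of 16, are made up for illustration, and None is used as an end-of-stream marker.

```scala
import java.util.concurrent.{ArrayBlockingQueue, BlockingQueue}

object PipelineSketch {
  // A message is either an item (Some) or the end-of-stream marker (None).
  type Msg = Option[String]

  // Starts `body` on its own named thread and returns that thread.
  private def stage(name: String)(body: => Unit): Thread = {
    val t = new Thread(new Runnable { def run(): Unit = body }, name)
    t.start()
    t
  }

  // Takes items from `in` until the end-of-stream marker arrives.
  private def drain(in: BlockingQueue[Msg])(handle: String => Unit): Unit = {
    var msg = in.take()
    while (msg.isDefined) {
      handle(msg.get)
      msg = in.take()
    }
  }

  // Runs filter, map and foreach as three concurrent stages.
  // The calling thread acts as the source (the `cat` of the bash example).
  def runPipeline(words: List[String], sink: String => Unit): Unit = {
    val toFilter = new ArrayBlockingQueue[Msg](16) // hypothetical capacity
    val toMap    = new ArrayBlockingQueue[Msg](16)
    val toSink   = new ArrayBlockingQueue[Msg](16)

    val filterStage = stage("filter") {
      drain(toFilter) { w => if (w == "I") toMap.put(Some(w)) }
      toMap.put(None) // pass the end-of-stream marker downstream
    }
    val mapStage = stage("map") {
      drain(toMap) { w => toSink.put(Some(w.toLowerCase)) }
      toSink.put(None)
    }
    val sinkStage = stage("foreach") {
      drain(toSink)(sink)
    }

    words.foreach(w => toFilter.put(Some(w)))
    toFilter.put(None)

    // Wait until every stage has seen the end of the stream.
    List(filterStage, mapStage, sinkStage).foreach(_.join())
  }

  def main(args: Array[String]): Unit = {
    val strList = List("I", "am", "a", "student", ".", "I", "come", "from",
      "China", ".", "I", "love", "peace")
    runPipeline(strList, println) // prints "i" three times, one per line
  }
}
```

Because each stage is a single thread and the queues are FIFO, the output order matches the input order, which is not guaranteed by foreach on a parallel collection. The bounded queues also give back-pressure: a fast stage blocks in put until the slower one downstream catches up.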

In general, I think a simple API for that would be a nice feature for Scala. But for your example I doubt that pipeline parallelism would speed up execution much. With filter and map operations this simple, I assume the communication overhead (for FIFOs / Actor mailboxes) would eat up the whole speedup from parallel execution.
