I want to combine multiple IO values that should run independently in parallel.
val io1: IO[Int] = ???
val io2: IO[Int] = ???
As I see it, I have two options:

1. Use fibers directly, with an explicit fork-join flow:
val parallelSum1: IO[Int] = for {
  fiber1 <- io1.start
  fiber2 <- io2.start
  i1 <- fiber1.join
  i2 <- fiber2.join
} yield i1 + i2
2. Use the Parallel instance for IO with parMapN (or one of its siblings like parTraverse, parSequence, parTupled, etc.):
val parallelSum2: IO[Int] = (io1, io2).parMapN(_ + _)
I am not sure about the pros and cons of each approach, or when I should choose one over the other. This becomes even trickier when abstracting over the effect type IO (tagless-final style):
def io1[F[_]]: F[Int] = ???
def io2[F[_]]: F[Int] = ???
def parallelSum1[F[_]: Concurrent]: F[Int] = for {
  fiber1 <- io1[F].start
  fiber2 <- io2[F].start
  i1 <- fiber1.join
  i2 <- fiber2.join
} yield i1 + i2
def parallelSum2[F[_], G[_]](implicit parallel: Parallel[F, G]): F[Int] =
  (io1[F], io2[F]).parMapN(_ + _)
The Parallel typeclass requires two type constructors, which makes it somewhat more cumbersome to use: it cannot be expressed as a context bound, and it drags along an additional, vague type parameter G[_].
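(Worth noting: in cats 2.0 and later, Parallel was reworked to take a single type constructor, with the applicative counterpart carried as a type member, so under that assumption a plain context bound suffices:

def parallelSum2[F[_]: Parallel]: F[Int] =
  (io1[F], io2[F]).parMapN(_ + _)
)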
Your guidance is appreciated :)
Amitay
The way I view it, in order to figure out "when do I use which?", we need to return to the old parallelism vs. concurrency discussion, which basically boils down to (quoting the accepted answer):
Concurrency is when two or more tasks can start, run, and complete in overlapping time periods. It doesn't necessarily mean they'll ever both be running at the same instant. For example, multitasking on a single-core machine.
Parallelism is when tasks literally run at the same time, e.g., on a multicore processor.
We often like to give IO-like operations as examples of concurrency, such as making a call over the wire or talking to disk.
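For instance (a sketch, assuming cats-effect 2.x, where Fiber#join yields F[A]; the two hypothetical "calls" simulate their latency with sleeps):

import cats.effect.{ContextShift, IO, Timer}
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._

object ConcurrencySketch {
  implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)
  implicit val timer: Timer[IO] = IO.timer(ExecutionContext.global)

  // Two hypothetical IO-bound calls, each taking ~100ms
  val fetchUser: IO[String] = IO.sleep(100.millis).map(_ => "user")
  val fetchOrders: IO[Int] = IO.sleep(100.millis).map(_ => 3)

  // The two fibers overlap in time, so total latency is ~100ms rather
  // than ~200ms, even on a single core: concurrency, not (necessarily)
  // parallelism
  val both: IO[(String, Int)] =
    for {
      fu <- fetchUser.start
      fo <- fetchOrders.start
      u <- fu.join
      o <- fo.join
    } yield (u, o)
}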
The question is: when you say you want to execute "in parallel", which one do you mean, the former or the latter?
If we're referring to the former, then using Concurrent[F] both conveys the intention by the signature and provides the proper execution semantics. If it's the latter, and we, for example, want to process a collection of elements in parallel, then going with Parallel[F, G] would be the better solution.
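For the collection case, that looks roughly like this (a sketch, again assuming cats-effect 2.x, where IO's Parallel instance is derived from an implicit ContextShift; process is a hypothetical stand-in for the per-element work):

import cats.effect.{ContextShift, IO}
import cats.implicits._
import scala.concurrent.ExecutionContext

object ParallelSketch {
  implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)

  // Hypothetical per-element work
  def process(i: Int): IO[Int] = IO(i * 2)

  // All four elements are processed in parallel; results keep their order
  val results: IO[List[Int]] = List(1, 2, 3, 4).parTraverse(process)
}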
It is often quite confusing when we think about the semantics of this with regard to IO, because it has instances of both Parallel and Concurrent, and we mostly use it to opaquely define side-effecting operations.
As a side note, the reason Parallel takes two unary type constructors is that M (in Parallel[M[_], F[_]]) is always a Monad instance, and we need a way to prove that the Monad also has an Applicative[F] counterpart for parallel execution, because when we think of a Monad we always talk about sequential execution semantics.
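A concrete, non-IO illustration of that pairing (a sketch assuming cats' stock instances, where Parallel for Either delegates to Validated whenever the error type has a Semigroup): Either is the fail-fast, sequential Monad, and Validated is its error-accumulating Applicative counterpart.

import cats.implicits._

object EitherValidatedSketch {
  type Checked[A] = Either[List[String], A]

  val bad1: Checked[Int] = Left(List("first error"))
  val bad2: Checked[Int] = Left(List("second error"))

  // Monadic (sequential) semantics: fail-fast, the second error is never seen
  val failFast: Checked[Int] =
    for { a <- bad1; b <- bad2 } yield a + b
  // failFast == Left(List("first error"))

  // "Parallel" (applicative) semantics: both errors are accumulated,
  // because parMapN routes through the Validated Applicative
  val accumulated: Checked[Int] = (bad1, bad2).parMapN(_ + _)
  // accumulated == Left(List("first error", "second error"))
}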