I want to combine multiple IO values that should run independently in parallel.
val io1: IO[Int] = ???
val io2: IO[Int] = ???
As I see it, I have two options:

1. Use fibers directly, with an explicit fork-join flow:
val parallelSum1: IO[Int] = for {
  fiber1 <- io1.start
  fiber2 <- io2.start
  i1 <- fiber1.join
  i2 <- fiber2.join
} yield i1 + i2
2. Use the Parallel instance for IO with parMapN (or one of its siblings like parTraverse, parSequence, parTupled, etc.):
val parallelSum2: IO[Int] = (io1, io2).parMapN(_ + _)
I am not sure about the pros and cons of each approach, or when I should choose one over the other. This becomes even trickier when abstracting over the effect type IO (tagless-final style):
def io1[F[_]]: F[Int] = ???
def io2[F[_]]: F[Int] = ???
def parallelSum1[F[_]: Concurrent]: F[Int] = for {
  fiber1 <- io1[F].start
  fiber2 <- io2[F].start
  i1 <- fiber1.join
  i2 <- fiber2.join
} yield i1 + i2
def parallelSum2[F[_], G[_]](implicit parallel: Parallel[F, G]): F[Int] =
  (io1[F], io2[F]).parMapN(_ + _)
The Parallel typeclass requires two type constructors, which makes it somewhat more cumbersome to use: it cannot be expressed as a context bound, and it drags along an additional, vague type parameter G[_].
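(Worth noting: in cats 2.0 and later, Parallel was reworked to take a single type constructor, with the applicative counterpart carried as a type member, so under that assumption a plain context bound suffices:

def parallelSum2[F[_]: Parallel]: F[Int] =
  (io1[F], io2[F]).parMapN(_ + _)
)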
Your guidance is appreciated :)
Amitay
The way I view it, in order to figure out "when do I use which?", we need to return to the old parallelism vs. concurrency discussion, which basically boils down to (quoting the accepted answer):
Concurrency is when two or more tasks can start, run, and complete in overlapping time periods. It doesn't necessarily mean they'll ever both be running at the same instant. For example, multitasking on a single-core machine.
Parallelism is when tasks literally run at the same time, e.g., on a multicore processor.
We often like to give IO-like operations as examples of concurrency, such as making a call over the wire or talking to disk.
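For instance (a sketch, assuming cats-effect 2.x, where Fiber#join yields F[A]; the two hypothetical "calls" simulate their latency with sleeps):

import cats.effect.{ContextShift, IO, Timer}
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._

object ConcurrencySketch {
  implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)
  implicit val timer: Timer[IO] = IO.timer(ExecutionContext.global)

  // Two hypothetical IO-bound calls, each taking ~100ms
  val fetchUser: IO[String] = IO.sleep(100.millis).map(_ => "user")
  val fetchOrders: IO[Int] = IO.sleep(100.millis).map(_ => 3)

  // The two fibers overlap in time, so total latency is ~100ms rather
  // than ~200ms, even on a single core: concurrency, not (necessarily)
  // parallelism
  val both: IO[(String, Int)] =
    for {
      fu <- fetchUser.start
      fo <- fetchOrders.start
      u <- fu.join
      o <- fo.join
    } yield (u, o)
}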
The question is: when you say you want to execute "in parallel", which one do you mean, the former or the latter?
If we're referring to the former, then using Concurrent[F] both conveys the intention by the signature and provides the proper execution semantics. If it's the latter, and we, for example, want to process a collection of elements in parallel, then going with Parallel[F, G] would be the better solution.
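For the collection case, that looks roughly like this (a sketch, again assuming cats-effect 2.x, where IO's Parallel instance is derived from an implicit ContextShift; process is a hypothetical stand-in for the per-element work):

import cats.effect.{ContextShift, IO}
import cats.implicits._
import scala.concurrent.ExecutionContext

object ParallelSketch {
  implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)

  // Hypothetical per-element work
  def process(i: Int): IO[Int] = IO(i * 2)

  // All four elements are processed in parallel; results keep their order
  val results: IO[List[Int]] = List(1, 2, 3, 4).parTraverse(process)
}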
It is often quite confusing when we think about the semantics of this with regard to IO, because it has instances of both Parallel and Concurrent, and we mostly use it to opaquely define side-effecting operations.
As a side note, the reason Parallel takes two unary type constructors is that M (in Parallel[M[_], F[_]]) is always a Monad instance, and we need a way to prove that the Monad also has an Applicative[F] counterpart for parallel execution, because when we think of a Monad we always talk about sequential execution semantics.
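A concrete, non-IO illustration of that pairing (a sketch assuming cats' stock instances, where Parallel for Either delegates to Validated whenever the error type has a Semigroup): Either is the fail-fast, sequential Monad, and Validated is its error-accumulating Applicative counterpart.

import cats.implicits._

object EitherValidatedSketch {
  type Checked[A] = Either[List[String], A]

  val bad1: Checked[Int] = Left(List("first error"))
  val bad2: Checked[Int] = Left(List("second error"))

  // Monadic (sequential) semantics: fail-fast, the second error is never seen
  val failFast: Checked[Int] =
    for { a <- bad1; b <- bad2 } yield a + b
  // failFast == Left(List("first error"))

  // "Parallel" (applicative) semantics: both errors are accumulated,
  // because parMapN routes through the Validated Applicative
  val accumulated: Checked[Int] = (bad1, bad2).parMapN(_ + _)
  // accumulated == Left(List("first error", "second error"))
}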