Instead of xs map f map g
it's more efficient to write xs map { x => g(f(x)) }
, and similarly for multiple filter
operations.
If I have two or more flatMap
s in a row, is there a way to combine them into one, that's maybe more efficient? e.g.
def f(x: String) = Set(x, x.reverse)
def g(x: String) = Set(x, x.toUpperCase)
Set("hi", "bye") flatMap f flatMap g
// Set(bye, eyb, IH, BYE, EYB, ih, hi, HI)
In Scala flatmap method is used on the collection and data structures of scale, as the name suggests it is the combination of two things methods i.e. map and Flatten method. If we use a flat map on any collection then it will apply both this method map and flatten method on the given collection.
map and flatMap are similar, in the sense they take a line from the input RDD and apply a function on it. The way they differ is that the function in map returns only one element, while function in flatMap can return a list of elements (0 or more) as an iterator. Also, the output of the flatMap is flattened.
In Scala, the function flatMap is a high order function on Option[A] that applies a function f , which returns an optional value itself – in other words, f has type A => Option[B] : If the optional value is present, it applies the function f to it.
In Scala, map is a higher order function that takes a function f . If an optional value is present, it applies the function f and returns a value wrapped with Some. else (i.e. if an optional value is absent), it returns None // definition of map in Option class.
In order to do such a transformation, we need to use some identity the operations have. For example, as you wrote, map
has the identity
map f ∘ map g ≡ map (f ∘ g)
(where ∘
stands for function composition - see Function1.compose(...); and ≡
stands for an equivalence of expressions). This is because classes with map
can be viewed as functors, so any reasonable implementation of map
must preserve this property.
On the other hand, classes that have flatMap
and have a way how to create some kind of one-element instance (like creating a singleton Set
) usually form a monad. So we may try to deduce some transformations from the monad rules. But the only identity we can deduce for repeated applications of flatMap
is
(set flatMap f) flatMap g ≡ x flatMap { y => f(y) flatMap g }
which is a kind of associativity relationship for flatMap
composition. This doesn't help much for optimizing the computation (actually it can make it worse). So, the conclusion is, there is no similar general "optimizing" identity for flatMap
.
The bottom line is: Each function given to Set.flatMap
creates a new Set
for each element it's applied to. We cannot avoid creating such intermediate sets unless we completely forget using composing flatMap
and solve the problem in some different way. Usually this is not worth, since composing flatMap
s (or using for(...) yield ..
) is much cleaner and more readable, and the little speed trade-off isn't usually a big issue.
In scalaz there is a way to compose functions like a -> m b
and b -> m c
into a -> m c
(like the function here, from String
to Set[String]
). They are called Kleisli functions, by the way. In haskell this is done simply with >=>
on those functions. In scala you'll have to be a bit more verbose (by the way, I've changed the example a bit: I couldn't make it work with Set
, so I've used List
):
scala> import scalaz._, std.list._
import scalaz._
import std.list._
scala> def f(x: String) = List(x, x.reverse)
f: (x: String)List[String]
scala> def g(x: String) = List(x, x.toUpperCase)
g: (x: String)List[java.lang.String]
scala> val composition = Kleisli(f) >=> Kleisli(g)
composition: scalaz.Kleisli[List,String,java.lang.String] = scalaz.KleisliFunctions$$anon$18@37911406
scala> List("hi", "bye") flatMap composition
res17: List[java.lang.String] = List(hi, HI, ih, IH, bye, BYE, eyb, EYB)
The approach you describe for filters basically skips the creation of intermediate collections.
With flatMap
at least the inner collections get created inside the functions, so I can't imagine any way to skip that creation without changing the function.
What you could try is using a view, although I'm not sure if this does anything usefull with flatMap.
Alternatively you could build a multiFlatMap
which builds the final collection directly from the function results, Without stuffing the intermediate collections returned from the functions in a new collection.
No idea if this is workable. I at least see some serious type challenges comming, since you would need to pass in a Seq of functions where each function returns a Collection of A where A is the input type for the next function. At least in the general case of arbitrary types and arbitrary number of functions this sound somewhat challenging.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With