In Java 8, what's the difference between Stream.map()
and Stream.flatMap()
methods?
The map() transformation takes in a function and applies it to each element in the RDD and the result of the function is a new value of each element in the resulting RDD. The flatMap() is used to produce multiple output elements for each input element.
Furthermore, map() and flatMap() can be distinguished in a way that map() generates a single value against an input while flatMap() generates zero or any number values against an input. In other words, map() is used to transform the data while the flatMap() is used to transform and flatten the stream.
You should use a map() if you just want to transform one Stream into another where each element gets converted to one single value. Use flatMap() if the function used by map operation returns multiple values and you want just one list containing all values.
The map() method wraps the underlying sequence in a Stream instance, whereas the flatMap() method allows avoiding nested Stream<Stream<R>> structure. Here, map() produces a Stream consisting of the results of applying the toUpperCase() method to the elements of the input Stream: List<String> myList = Stream.
Both map
and flatMap
can be applied to a Stream<T>
and they both return a Stream<R>
. The difference is that the map
operation produces one output value for each input value, whereas the flatMap
operation produces an arbitrary number (zero or more) values for each input value.
This is reflected in the arguments to each operation.
The map
operation takes a Function
, which is called for each value in the input stream and produces one result value, which is sent to the output stream.
The flatMap
operation takes a function that conceptually wants to consume one value and produce an arbitrary number of values. However, in Java, it's cumbersome for a method to return an arbitrary number of values, since methods can return only zero or one value. One could imagine an API where the mapper function for flatMap
takes a value and returns an array or a List
of values, which are then sent to the output. Given that this is the streams library, a particularly apt way to represent an arbitrary number of return values is for the mapper function itself to return a stream! The values from the stream returned by the mapper are drained from the stream and are passed to the output stream. The "clumps" of values returned by each call to the mapper function are not distinguished at all in the output stream, thus the output is said to have been "flattened."
Typical use is for the mapper function of flatMap
to return Stream.empty()
if it wants to send zero values, or something like Stream.of(a, b, c)
if it wants to return several values. But of course any stream can be returned.
Stream.flatMap
, as it can be guessed by its name, is the combination of a map
and a flat
operation. That means that you first apply a function to your elements, and then flatten it. Stream.map
only applies a function to the stream without flattening the stream.
To understand what flattening a stream consists in, consider a structure like [ [1,2,3],[4,5,6],[7,8,9] ]
which has "two levels". Flattening this means transforming it in a "one level" structure : [ 1,2,3,4,5,6,7,8,9 ]
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With