I want to apply a function via flatMap
to each group produced by DataSet.groupBy
. Trying to call flatMap
I get the compiler error:
error: value flatMap is not a member of org.apache.flink.api.scala.GroupedDataSet
My code:
var mapped = env.fromCollection(Array[(Int, Int)]())
var groups = mapped.groupBy("myGroupField")
groups.flatMap( myFunction: (Int, Array[Int]) => Array[(Int, Array[(Int, Int)])] ) // error: GroupedDataSet has no member flatMap
Indeed, in the documentation of flink-scala 0.9-SNAPSHOT no map
or similar is listed. Is there a similar method to work with? How to achieve the desired distributed mapping over each group individually on a node?
You can use reduceGroup(GroupReduceFunction f)
to process all elements a group. A GroupReduceFunction
gives you an Iterable
over all elements of a group and an Collector
to emit an arbitrary number of elements.
Flink's groupBy()
function does not group multiple elements into a single element, i.e., it does not convert a group of (Int, Int)
elements (that all share the same _1
tuple field) into one (Int, Array[Int])
. Instead, a DataSet[(Int, Int)]
is logically grouped such that all elements that have the same key can be processed together. When you apply a GroupReduceFunction
on a GroupedDataSet
, the function will be called once for each group. In each call all elements of a group are handed together to the function. The function can then process all elements of the group and also convert a group of (Int, Int)
elements into a single (Int, Array[Int])
element.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With