Is there any built-in transformation to sum the Ints of the following RDD?
org.apache.spark.rdd.RDD[(String, (Int, Int))]
The String is the key and the Ints are the value; what I need is the sum of all the Ints as RDD[(String, Int)]. I tried groupByKey with no success...
Also, the result must again be an RDD.
Thanks in advance
If the objective is to sum the elements of an (Int, Int) value, then a map transformation can achieve it:
val arr = Array(("A", (1, 1)), ("B", (2, 2)), ("C", (3, 3)))
val rdd = sc.parallelize(arr)
val result = rdd.map{ case (a, (b, c)) => (a, b + c) }
// result.collect = Array((A,2), (B,4), (C,6))
If instead the value type is an Array, Array.sum can be used:
val rdd = sc.parallelize(Array(("A", Array(1, 1)), 
                               ("B", Array(2, 2)), 
                               ("C", Array(3, 3)))
rdd.map { case (a, b) => (a, b.sum) }
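// .collect = Array((A,2), (B,4), (C,6))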
Edit:
The map transformation does not keep the original partitioner; as @Justin suggested, mapValues may be more appropriate here (for the (Int, Int) and Array cases respectively):
rdd.mapValues{ case (x, y) => x + y }
rdd.mapValues(_.sum) 
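As a quick sketch of the difference (assuming the (Int, Int) RDD from the first example and an explicit HashPartitioner chosen just for illustration), you can inspect the partitioner after each transformation:
import org.apache.spark.HashPartitioner

val partitioned = rdd.partitionBy(new HashPartitioner(4))

// map may change the keys, so Spark discards the partitioner
partitioned.map { case (a, (b, c)) => (a, b + c) }.partitioner    // None

// mapValues leaves the keys untouched, so the partitioner is kept
partitioned.mapValues { case (x, y) => x + y }.partitioner        // Some(...): the HashPartitioner is preserved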