Sum values of PairRDD

I have an RDD of type:

dataset: org.apache.spark.rdd.RDD[(String, Double)] = MapPartitionsRDD[26]

Its contents look like (Pedro, 0.0833), (Hello, 0.001828), ...

I'd like to sum all the values (0.0833 + 0.001828 + ...), but I can't find a proper solution.

bouritosse asked Jan 06 '23 11:01



1 Answer

Given your input data, any of the following will work:

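// Example data: an RDD[(String, Double)] built in spark-shell, where sc is the SparkContext
val datasets = sc.parallelize(List(("Pedro", 0.0833), ("Hello", 0.001828)))
// Option 1: keep only the second tuple element, then call sum() on the resulting RDD[Double]
datasets.map(_._2).sum()
// res3: Double = 0.085128
// Option 2: the same projection, summed with an explicit reduce
datasets.map(_._2).reduce(_ + _)
// res4: Double = 0.085128
// Option 3: values is the pair-RDD shorthand for map(_._2)
datasets.values.sum()
// res5: Double = 0.085128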
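
One difference between these variants is worth knowing: reduce throws an UnsupportedOperationException on an empty RDD, while sum() returns 0.0. The sketch below illustrates this as a standalone program rather than a spark-shell session. The object name, app name, and local[*] master are assumptions made for illustration, not part of the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone driver; in spark-shell, `sc` already exists.
object SumPairValues {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally, using all available cores.
    val conf = new SparkConf().setAppName("SumPairValues").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val dataset = sc.parallelize(Seq(("Pedro", 0.0833), ("Hello", 0.001828)))

      // fold with a zero value behaves like reduce on non-empty data.
      println(dataset.values.fold(0.0)(_ + _))

      // An empty pair RDD: sum() and fold(0.0) both return 0.0,
      // whereas reduce(_ + _) would throw UnsupportedOperationException.
      val empty = sc.emptyRDD[(String, Double)]
      println(empty.values.sum())
      println(empty.values.fold(0.0)(_ + _))
    } finally {
      sc.stop() // always release the local Spark context
    }
  }
}
```

If the RDD may be empty, sum() or fold(0.0)(_ + _) is therefore the safer choice over reduce.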
eliasah answered Jan 10 '23 20:01
