My question is based upon this question. I have a Spark pair RDD of (key, count) pairs: [(a,1), (b,2), (c,1), (d,3)].
How can I find both the key with the highest count and the count itself?
(sc
 .parallelize([("a", 1), ("b", 5), ("c", 1), ("d", 3)])
 .max(key=lambda x: x[1]))
returns ('b', 5), not just 5. The key argument of max is the function used for comparison (given explicitly here), but max still returns the whole element, in this case the complete tuple.
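For context, here is a minimal, self-contained sketch of the same approach (the SparkContext setup and app name are assumptions added only to make the example runnable on a local installation):

from pyspark import SparkContext

# Hypothetical local context, only so the snippet runs on its own.
sc = SparkContext("local", "max-by-value")

pairs = sc.parallelize([("a", 1), ("b", 5), ("c", 1), ("d", 3)])

# max() compares elements using the key function but returns the element
# itself, so you get the full (key, count) tuple back.
best = pairs.max(key=lambda x: x[1])
print(best)     # ('b', 5)
print(best[1])  # 5, if you only need the count

sc.stop()

If you only need the count, index into the returned tuple as shown above.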
val myRDD = sc.parallelize(Array(
  ("a", 1),
  ("b", 5),
  ("c", 1),
  ("d", 3))).sortBy(_._2, false).take(1)
This sorts by value in descending order and takes the topmost element.
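For comparison, a rough PySpark equivalent of this sortBy approach might look like the following sketch (it assumes an existing SparkContext named sc):

# Sort by the count in descending order, then take the first element.
top = (sc
       .parallelize([("a", 1), ("b", 5), ("c", 1), ("d", 3)])
       .sortBy(lambda x: x[1], ascending=False)
       .take(1))
print(top)  # [('b', 5)] -- take(1) returns a one-element list

Note that sorting the entire RDD only to take one element does more work than max(); it is mainly worthwhile when you also want the top N results.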