spark finding max value and the associated key

Question

My question is based upon this question. I have a Spark pair RDD of (key, count) pairs: [(a,1), (b,2), (c,1), (d,3)].

How can I find both the key with the highest count and the actual count?

Quentin Pradet · Accepted Answer

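# Assumes an existing SparkContext `sc`, e.g. in the pyspark shell.
(sc
    .parallelize([("a", 1), ("b", 5), ("c", 1), ("d", 3)])
    .max(key=lambda x: x[1]))  # compare by count; returns ('b', 5)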

This returns ('b', 5), not only 5. The `key` parameter of max is the function used for comparison (given explicitly here), but max still returns the whole element, here the complete tuple.
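To see the same behaviour without a Spark cluster, here is a minimal local sketch using Python's built-in max(), which takes a key function in the same way RDD.max(key=...) does. It is only an analogy for illustration: the list `pairs` is plain Python data, not an RDD.

```python
# Local (non-Spark) illustration: built-in max() compares elements by the
# key function but returns the whole element, just like RDD.max(key=...).
pairs = [("a", 1), ("b", 5), ("c", 1), ("d", 3)]

best = max(pairs, key=lambda kv: kv[1])  # compare on the count only
best_key, best_count = best              # unpack the returned tuple
print(best_key, best_count)              # b 5
```

Because the whole tuple comes back, unpacking it gives both the key and its count in one step.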

user1501308 · Answer

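// Sort by count in descending order, then keep the first element.
val myRDD = sc.parallelize(Array(
  ("a", 1),
  ("b", 5),
  ("c", 1),
  ("d", 3))).sortBy(_._2, false).take(1)
// take(1) returns a local Array[(String, Int)] holding ("b",5), not an RDD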

This sorts on the values in descending order and takes the topmost element.
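A full sort does more work than is needed to find a single maximum. The local Python sketch below (plain lists, no Spark) contrasts the sort-then-take approach with heapq.nlargest, which keeps only the n best candidates. In PySpark, RDD.top(num, key=...) serves a similar top-n purpose; the code here only mimics the idea locally and is not the Spark implementation.

```python
import heapq

pairs = [("a", 1), ("b", 5), ("c", 1), ("d", 3)]

# Analogue of sortBy(_._2, false).take(1): full descending sort, keep the first.
by_sort = sorted(pairs, key=lambda kv: kv[1], reverse=True)[:1]

# Top-1 without sorting everything: nlargest tracks only n candidates.
by_heap = heapq.nlargest(1, pairs, key=lambda kv: kv[1])

print(by_sort, by_heap)  # [('b', 5)] [('b', 5)]
```

Both return a one-element list, mirroring how take(1) returns a one-element array rather than the bare tuple.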


Tags: python, max, tuples, apache-spark, pyspark

user2543622

