
RDD.sortByKey using a function in python?

Let's say my key is not a simple data type but a class, and I need to sort the keys using a comparison function. In Scala, I can do this by defining a new Ordering. How can I achieve the same functionality in Python? For example, what would be the Python equivalent of the following code?

implicit val someClassOrdering = new Ordering[SomeClass] {
  override def compare(a: SomeClass, b: SomeClass) = a.compare(b)
}
MetallicPriest asked Nov 20 '25 14:11


1 Answer

You can pass the keyfunc argument to sortByKey. It is applied to each key, and the keys are then ordered by the values it returns:

from numpy.random import seed, randint
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

# Assumes an existing SparkContext named sc (for example, in the pyspark shell)
seed(1)
rdd = sc.parallelize(
    (Point(randint(10), randint(10)), randint(100)) for _ in range(5))

Now, let's say you want to sort the Points by their y coordinate:

rdd.sortByKey(keyfunc=lambda p: p.y).collect()

and the result is:

[(Point(x=5, y=0), 16),
 (Point(x=9, y=2), 20),
 (Point(x=5, y=2), 84),
 (Point(x=1, y=7), 6),
 (Point(x=5, y=8), 9)]
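
If you already have a comparison function rather than a key function (closer to the Scala Ordering in the question), the standard library's functools.cmp_to_key can turn it into a key function. The following is only a local, plain-Python sketch using sorted, not Spark; compare_points and the sample pairs are hypothetical names made up for illustration, and passing such a key function to sortByKey is an assumption that is not tested here.

```python
from collections import namedtuple
from functools import cmp_to_key

Point = namedtuple('Point', ['x', 'y'])

def compare_points(a, b):
    # Hypothetical comparator: order by y first, then by x.
    # Returns a negative number, zero, or a positive number.
    by_y = (a.y > b.y) - (a.y < b.y)
    return by_y or (a.x > b.x) - (a.x < b.x)

# Hypothetical (key, value) pairs, shaped like the RDD elements above
pairs = [
    (Point(5, 8), 9),
    (Point(9, 2), 20),
    (Point(5, 0), 16),
    (Point(5, 2), 84),
    (Point(1, 7), 6),
]

# cmp_to_key wraps the comparator so it can be used wherever a key function is expected
point_key = cmp_to_key(compare_points)

# Sort the pairs by their key (the Point), mirroring what keyfunc does for sortByKey
sorted_pairs = sorted(pairs, key=lambda kv: point_key(kv[0]))
```

Unlike the y-only key function, this comparator also breaks ties on x, so Point(x=5, y=2) sorts before Point(x=9, y=2).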
zero323 answered Nov 22 '25 03:11


