Spark: Sort an RDD by multiple values in a tuple / columns

I have an RDD of the following type:

    RDD[(String, Int, String)]

For example, with these rows:

    ("b", 1, "a")
    ("a", 1, "b")
    ("a", 0, "b")
    ("a", 0, "a")

The final result should be sorted by the first element, then the second, then the third:

    ("a", 0, "a")
    ("a", 0, "b")
    ("a", 1, "b")
    ("b", 1, "a")

How would I do something like this?

asked Mar 13 '23 17:03 by adrian


1 Answer

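Scala tuples come with an implicit Ordering that compares element by element, left to right, so you can sort by the whole tuple:

    rdd.sortBy(r => r)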
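
To see the tuple ordering that sortBy relies on, here is a minimal sketch on a plain Seq rather than an RDD, so it runs without Spark. It uses only the sample rows from the question; the object and value names are made up for illustration. RDD.sortBy resolves the same kind of implicit Ordering for its key type.

```scala
object TupleOrderingSketch {
  // Sample rows from the question, as (String, Int, String) tuples.
  val rows: Seq[(String, Int, String)] = Seq(
    ("b", 1, "a"),
    ("a", 1, "b"),
    ("a", 0, "b"),
    ("a", 0, "a")
  )

  // Sorting by the whole tuple uses the implicit tuple Ordering:
  // compare _1 first, then _2 on ties, then _3.
  val sorted: Seq[(String, Int, String)] = rows.sortBy(r => r)

  def main(args: Array[String]): Unit =
    sorted.foreach(println)
}
```

Running main prints the rows in the order asked for in the question.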

If you want different fields to take priority, sort by a tuple of the fields in the order you want. For example, third field first, then first, then second:

    rdd.sortBy(r => (r._3, r._1, r._2))

For descending order, pass false as the second argument (the ascending parameter):

    rdd.sortBy(r => r, false)
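
Putting the three variants together, a rough end-to-end sketch might look like the following. It assumes Spark core is on the classpath and runs in local mode; the object name, the Row alias, the sortVariants helper and the app name are all hypothetical, chosen only for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SortRddByTuple {
  // Alias for the row shape from the question (illustrative name).
  type Row = (String, Int, String)

  // Sorts the sample rows three ways and collects each result to the driver.
  def sortVariants(sc: SparkContext): (Seq[Row], Seq[Row], Seq[Row]) = {
    val rdd = sc.parallelize(Seq[Row](
      ("b", 1, "a"), ("a", 1, "b"), ("a", 0, "b"), ("a", 0, "a")
    ))

    // Ascending by (_1, _2, _3).
    val asc = rdd.sortBy(r => r).collect().toSeq
    // Ascending by (_3, _1, _2): the key tuple decides field priority.
    val byThird = rdd.sortBy(r => (r._3, r._1, r._2)).collect().toSeq
    // Descending by (_1, _2, _3): the whole key ordering is reversed.
    val desc = rdd.sortBy(r => r, ascending = false).collect().toSeq

    (asc, byThird, desc)
  }

  def main(args: Array[String]): Unit = {
    // Local-mode context, assumed here only so the sketch can run standalone.
    val conf = new SparkConf().setMaster("local[*]").setAppName("sort-rdd-by-tuple")
    val sc = new SparkContext(conf)
    try {
      val (asc, byThird, desc) = sortVariants(sc)
      println(asc.mkString(" "))
      println(byThird.mkString(" "))
      println(desc.mkString(" "))
    } finally {
      sc.stop()
    }
  }
}
```

Note that ascending = false flips the direction for every field at once. collect() is used here only because the sample is tiny; on a real dataset you would keep working with the sorted RDD instead.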

answered Mar 15 '23 05:03 by David Griffin