How to Sort a Dataframe in Pyspark [duplicate]

I have a dataframe:

# +---+--------+---------+
# | id|  rank  |  value  |
# +---+--------+---------+
# |  1|    A   |    10   |
# |  2|    B   |    46   |
# |  3|    D   |     8   |
# |  4|    C   |     8   |
# +---+--------+---------+

I want to sort it by value, then by rank. This seems like it should be simple, but I can't find how to do it in the documentation or on Stack Overflow for PySpark, only for R and Scala.

After sorting, .show() should print:

# +---+--------+---------+
# | id|  rank  |  value  |
# +---+--------+---------+
# |  4|    C   |     8   |
# |  3|    D   |     8   |
# |  1|    A   |    10   |
# |  2|    B   |    46   |
# +---+--------+---------+
Tibberzz


2 Answers

df.orderBy(["value", "rank"], ascending=[1, 1])

Reference: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy

gasparms


Say your DataFrame is stored in a variable called df. You'd call df.orderBy('value').show() to get it sorted by value.

Arnon Rotem-Gal-Oz
