I have a dataframe:
# +---+--------+---------+
# | id| rank | value |
# +---+--------+---------+
# | 1| A | 10 |
# | 2| B | 46 |
# | 3| D | 8 |
# | 4| C | 8 |
# +---+--------+---------+
I want to sort it by value, then by rank. This seems like it should be simple, but I'm not seeing how it's done in the documentation or on SO for PySpark, only for R and Scala.
This is how it should look after sorting; `.show()` should print:
# +---+--------+---------+
# | id| rank | value |
# +---+--------+---------+
# | 4| C | 8 |
# | 3| D | 8 |
# | 1| A | 10 |
# | 2| B | 46 |
# +---+--------+---------+
df.orderBy(["value", "rank"], ascending=[1, 1])
Reference: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy
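For a quick sanity check without a Spark cluster, the same two-key ascending ordering can be sketched with plain Python's `sorted` (the tuples below mirror the rows in the question; this only illustrates the key order that `orderBy(["value", "rank"], ascending=[1, 1])` applies, not Spark itself):

```python
# Rows mirroring the dataframe above: (id, rank, value).
rows = [(1, "A", 10), (2, "B", 46), (3, "D", 8), (4, "C", 8)]

# Sort by value first, then break ties by rank -- the same key
# order as df.orderBy(["value", "rank"], ascending=[1, 1]).
ordered = sorted(rows, key=lambda r: (r[2], r[1]))

for r in ordered:
    print(r)
# (4, 'C', 8)
# (3, 'D', 8)
# (1, 'A', 10)
# (2, 'B', 46)
```

Note the two rows with `value == 8` come out in rank order (`C` before `D`), matching the expected output.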
Say your dataframe is stored in a variable called `df`. You'd do `df.orderBy('value', 'rank').show()` to get it sorted by value, then by rank. Passing only one column, as in `df.orderBy('value')`, would leave the order of tied values unspecified.