How to Sort a Dataframe in Pyspark [duplicate]

Question

I have a dataframe:

# +---+--------+---------+
# | id|  rank  |  value  |
# +---+--------+---------+
# |  1|    A   |    10   |
# |  2|    B   |    46   |
# |  3|    D   |     8   |
# |  4|    C   |     8   |
# +---+--------+---------+

I want to sort it by value, then by rank. This seems like it should be simple, but I can't find how to do it in the documentation or on SO for PySpark, only for R and Scala.

This is how it should look after sorting; .show() should print:

# +---+--------+---------+
# | id|  rank  |  value  |
# +---+--------+---------+
# |  4|    C   |     8   |
# |  3|    D   |     8   |
# |  1|    A   |    10   |
# |  2|    B   |    46   |
# +---+--------+---------+

gasparms · Accepted Answer

df.orderBy(["value", "rank"], ascending=[1, 1])

Reference: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy

Arnon Rotem-Gal-Oz · Answer

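Say your dataframe is stored in a variable called df. You'd call df.orderBy('value').show() to get it sorted by value. Note that this orders by value only, so rows with equal values (ids 3 and 4 here) come back in no guaranteed order unless rank is added as a second sort column.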

Tags: dataframe, apache-spark, pyspark

Tibberzz
