I have a DataFrame that contains these values:
Dept_id | name | salary
1       | A    | 10
2       | B    | 100
1       | D    | 100
2       | C    | 105
1       | N    | 103
2       | F    | 102
1       | K    | 90
2       | E    | 110
I want the three highest-paid rows for each Dept_id, sorted by salary in descending order within each department, like this:
Dept_id | name | salary
1       | N    | 103
1       | D    | 100
1       | K    | 90
2       | E    | 110
2       | C    | 105
2       | F    | 102
Thanks in advance :)
The solution is similar to "Retrieve top n in each group of a DataFrame in pyspark", which covers the PySpark case.
In Scala, the same approach looks like this:
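import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rank}
// Rank rows within each department by salary (highest first), then keep ranks 1 to 3
df.withColumn("rank", rank().over(Window.partitionBy("Dept_id").orderBy(col("salary").desc)))
  .filter(col("rank") <= 3)
  .drop("rank")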
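One caveat: rank() gives tied salaries the same rank, so a tie at the cutoff can return more than three rows for a department, and the window ordering by itself does not guarantee the order of the final output. Here is a minimal sketch of a variant that uses row_number() to keep exactly n rows per group and sorts the result explicitly. The object name TopNPerDept, the helper topNPerGroup, the temporary column "rn", and the local SparkSession setup are assumptions made for illustration; they are not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object TopNPerDept {
  // Hypothetical helper: keep the n highest-salary rows per Dept_id.
  // row_number() numbers rows 1, 2, 3, ... within each partition, so at most
  // n rows survive per department even when salaries are tied.
  def topNPerGroup(df: DataFrame, n: Int): DataFrame = {
    val byDept = Window.partitionBy("Dept_id").orderBy(col("salary").desc)
    df.withColumn("rn", row_number().over(byDept))
      .filter(col("rn") <= n)
      .drop("rn")
      // Explicit sort so the output order matches the layout in the question.
      .orderBy(col("Dept_id"), col("salary").desc)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, only for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("top-n-per-dept")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df = Seq(
      (1, "A", 10), (2, "B", 100), (1, "D", 100), (2, "C", 105),
      (1, "N", 103), (2, "F", 102), (1, "K", 90), (2, "E", 110)
    ).toDF("Dept_id", "name", "salary")

    topNPerGroup(df, 3).show()
    spark.stop()
  }
}
```

On the sample data there are no tied salaries within a department, so rank() and row_number() select the same six rows. If tied rows should all be kept, rank() or dense_rank() is the better fit.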
I hope this answer helps.