I am doing a group by in Spark SQL. Some rows contain the same value with different IDs, and in that case I want to select the first row.
This is my code:
val highvalueresult = highvalue
  .select($"tagShortID", $"Timestamp", $"ListenerShortID", $"rootOrgID", $"subOrgID", $"RSSI_Weight_avg")
  .groupBy("tagShortID", "Timestamp")
  .agg(max($"RSSI_Weight_avg").alias("RSSI_Weight_avg"))
val t2 = averageDF.join(highvalueresult, Seq("tagShortID", "Timestamp", "RSSI_Weight_avg"))
And this is my result:
tag,timestamp,rssi,listner,rootorg,suborg
2,1496745906,0.7,3878,4,3
4,1496745907,0.6,362,4,3
4,1496745907,0.6,718,4,3
4,1496745907,0.6,1901,4,3
In the above result, for the timestamp 1496745907 there are three listeners with the same rssi value. In this case I want to select only the first row.
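To make the desired behaviour concrete, here is a minimal plain-Scala sketch (no Spark) of "keep one row per (tag, timestamp) group". It assumes "first" means the row with the smallest listner id, since the result rows carry no other ordering; the `Row` case class and sample data are illustrative, not from the real pipeline:

```scala
// Illustrative row shape matching the result columns above (rootorg/suborg omitted for brevity)
case class Row(tag: Int, timestamp: Long, rssi: Double, listner: Int)

val rows = Seq(
  Row(2, 1496745906L, 0.7, 3878),
  Row(4, 1496745907L, 0.6, 362),
  Row(4, 1496745907L, 0.6, 718),
  Row(4, 1496745907L, 0.6, 1901)
)

// Group by (tag, timestamp), then keep one row per group.
// "First" is taken to be the smallest listner id (an assumption).
val firstPerGroup = rows
  .groupBy(r => (r.tag, r.timestamp))
  .values
  .map(_.minBy(_.listner))
  .toList
  .sortBy(_.timestamp)
// firstPerGroup == List(Row(2,1496745906,0.7,3878), Row(4,1496745907,0.6,362))
```

This is exactly what a window function with `row_number` does inside Spark, without collecting the data to the driver.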
You can use the window function support in Spark SQL. Assuming your dataframe is:
+---+----------+----+-------+-------+------+
|tag| timestamp|rssi|listner|rootorg|suborg|
+---+----------+----+-------+-------+------+
| 2|1496745906| 0.7| 3878| 4| 3|
| 4|1496745907| 0.6| 362| 4| 3|
| 4|1496745907| 0.6| 718| 4| 3|
| 4|1496745907| 0.6| 1901| 4| 3|
+---+----------+----+-------+-------+------+
Define a window function (partition by your grouping columns; order by the column that decides which row counts as "first" — ordering by the partitioning column itself would make the rank order arbitrary):
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

val window = Window.partitionBy("timestamp", "rssi").orderBy("listner")
Apply the window function:
res1.withColumn("rank", row_number().over(window))
+---+----------+----+-------+-------+------+----+
|tag| timestamp|rssi|listner|rootorg|suborg|rank|
+---+----------+----+-------+-------+------+----+
| 4|1496745907| 0.6| 362| 4| 3| 1|
| 4|1496745907| 0.6| 718| 4| 3| 2|
| 4|1496745907| 0.6| 1901| 4| 3| 3|
| 2|1496745906| 0.7| 3878| 4| 3| 1|
+---+----------+----+-------+-------+------+----+
Select the first row from each window:
res5.where($"rank" === 1)
+---+----------+----+-------+-------+------+----+
|tag| timestamp|rssi|listner|rootorg|suborg|rank|
+---+----------+----+-------+-------+------+----+
| 4|1496745907| 0.6| 362| 4| 3| 1|
| 2|1496745906| 0.7| 3878| 4| 3| 1|
+---+----------+----+-------+-------+------+----+
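Putting the steps above together, a self-contained sketch of the whole pipeline might look like this. It assumes a local SparkSession and hard-codes the sample rows; the partitioning uses the grouping keys (tag, timestamp) and breaks ties by listner, which you should adjust to whatever "first" means for your data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

val spark = SparkSession.builder.master("local[*]").appName("first-row-per-group").getOrCreate()
import spark.implicits._

// Sample data matching the tables above
val df = Seq(
  (2, 1496745906L, 0.7, 3878, 4, 3),
  (4, 1496745907L, 0.6, 362, 4, 3),
  (4, 1496745907L, 0.6, 718, 4, 3),
  (4, 1496745907L, 0.6, 1901, 4, 3)
).toDF("tag", "timestamp", "rssi", "listner", "rootorg", "suborg")

// Partition by the grouping keys; order by listner so "first" is deterministic (an assumption)
val window = Window.partitionBy("tag", "timestamp").orderBy("listner")

val first = df
  .withColumn("rank", row_number().over(window))
  .where($"rank" === 1)
  .drop("rank")

first.show()
```

Unlike the groupBy + join approach in the question, this keeps all the other columns in one pass and returns exactly one row per group even when rssi values tie.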