Hi I am new to spark sql. I have a data frame like this.
---+----------+----+----+----+------------------------+
|tag id|timestamp|listner| orgid |org2id|RSSI
+---+----------+----+----+----+------------------------+
| 4|1496745912| 362| 4| 3| 0.60|
| 4|1496745924|1901| 4| 3| 0.60|
| 4|1496746030|1901| 4| 3| 0.60|
| 4|1496746110| 718| 4| 3| 0.30|
| 2|1496746128| 718| 4| 3| 0.60|
| 2|1496746188|1901| 4| 3| 0.10|
I want to select for each listner top 10 timestamp values in spark sql.
I tried the following query.It throws errors.
val avg = sqlContext.sql("select top 10 * from avg_table") // throws error.
val avg = sqlContext.sql("select rssi,timestamp,tagid from avg_table order by desc limit 10") // it prints only 10 records.
I want to select for each listner I need to take top 10 timestamp values. Any help will be appreciated.
Here we can used dense_rank also
select *
from (select *,
dense_rank() over (partition by listner order by timestamp) as rank
from avg_table
)
where rank <= 10;
Difference Between dense_rank() and row_number() is dense_rank() provide the same rank/number to matching column[on partitioned is done] values in multiple row where as row_number() provide the unique row number/rank to matching column values in multiple row
Thanks
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With