
Unable to select top 10 records per group in Spark SQL

Hi, I am new to Spark SQL. I have a DataFrame like this:

  +-----+----------+-------+-----+------+----+
  |tagid| timestamp|listner|orgid|org2id|RSSI|
  +-----+----------+-------+-----+------+----+
  |    4|1496745912|    362|    4|     3|0.60|
  |    4|1496745924|   1901|    4|     3|0.60|
  |    4|1496746030|   1901|    4|     3|0.60|
  |    4|1496746110|    718|    4|     3|0.30|
  |    2|1496746128|    718|    4|     3|0.60|
  |    2|1496746188|   1901|    4|     3|0.10|
  +-----+----------+-------+-----+------+----+

I want to select, for each listner, the top 10 timestamp values in Spark SQL.

I tried the following queries, but neither does what I want:

  val avg = sqlContext.sql("select top 10 * from avg_table") // throws an error: "top" is not valid Spark SQL syntax

  val avg = sqlContext.sql("select rssi, timestamp, tagid from avg_table order by timestamp desc limit 10") // returns only 10 rows overall, not 10 per listner

Any help will be appreciated.

asked Jul 05 '17 by Teju Priya


1 Answer

You can use dense_rank() for this:

select *
from (select *,
             dense_rank() over (partition by listner order by timestamp desc) as rank
      from avg_table
     ) t
where rank <= 10;
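To make the mechanics of the query concrete, here is a sketch of the same logic in plain Python (not Spark, and `top_n_per_group` is a hypothetical helper name): group the rows by listner, sort each group by timestamp descending, and keep rows whose dense rank is at most N.

```python
# Illustrative sketch in plain Python (not Spark) of the dense_rank query:
# keep the top N timestamps per listner, ties sharing a rank.
from collections import defaultdict

def top_n_per_group(rows, n=10):
    """rows: list of dicts with 'listner' and 'timestamp' keys."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["listner"]].append(row)

    result = []
    for group in groups.values():
        # order by timestamp desc, as in the window's ORDER BY
        group.sort(key=lambda r: r["timestamp"], reverse=True)
        # dense_rank: rows with equal timestamps share a rank, no gaps
        rank, prev = 0, None
        for row in group:
            if row["timestamp"] != prev:
                rank += 1
                prev = row["timestamp"]
            if rank <= n:
                result.append(row)
    return result

rows = [
    {"listner": 362,  "timestamp": 1496745912},
    {"listner": 1901, "timestamp": 1496745924},
    {"listner": 1901, "timestamp": 1496746030},
    {"listner": 718,  "timestamp": 1496746110},
    {"listner": 718,  "timestamp": 1496746128},
    {"listner": 1901, "timestamp": 1496746188},
]
# With n=2, listner 1901 has three timestamps, so only the two latest survive.
top2 = top_n_per_group(rows, n=2)
```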

The difference between dense_rank() and row_number(): within a partition, dense_rank() assigns the same rank to rows whose ordering column values tie (with no gaps in the rank sequence), whereas row_number() assigns a unique sequential number to every row, even when values tie.
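The distinction is easiest to see on a column with ties. A minimal sketch in plain Python (the function names are illustrative, not a Spark API):

```python
def dense_rank(values):
    # values assumed already sorted; equal values share a rank,
    # and the rank sequence has no gaps (1, 1, 2, 3, ...)
    ranks, rank, prev = [], 0, object()
    for v in values:
        if v != prev:
            rank += 1
            prev = v
        ranks.append(rank)
    return ranks

def row_number(values):
    # every row gets a unique sequential number, ties or not
    return list(range(1, len(values) + 1))

# RSSI values from the sample data, sorted descending:
rssi = [0.60, 0.60, 0.30, 0.10]
# dense_rank(rssi)  -> [1, 1, 2, 3]  (the two 0.60 rows share rank 1)
# row_number(rssi)  -> [1, 2, 3, 4]  (every row numbered uniquely)
```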

Thanks

answered Oct 21 '22 by bajran