PostgreSQL window function: row_number() over (partition col order by col2)

The following result set is derived from a SQL query with a few joins and a union. The query already groups rows on Date and Game. I need a column that gives the attempt number at a game, where each distinct value of the Date column counts as one attempt.

Username   Game     ID   Date

johndoe1   Game_1   100  7/22/14 1:52 AM
johndoe1   Game_1   100  7/22/14 1:52 AM
johndoe1   Game_1   100  7/22/14 1:52 AM
johndoe1   Game_1   100  7/22/14 1:52 AM
johndoe1   Game_1   121  7/22/14 1:56 AM
johndoe1   Game_1   121  7/22/14 1:56 AM
johndoe1   Game_1   121  7/22/14 1:56 AM
johndoe1   Game_1   121  7/22/14 1:56 AM
johndoe1   Game_1   121  7/22/14 1:56 AM
johndoe1   Game_1   130  7/22/14 1:59 AM
johndoe1   Game_1   130  7/22/14 1:59 AM
johndoe1   Game_1   130  7/22/14 1:59 AM
johndoe1   Game_1   130  7/22/14 1:59 AM
johndoe1   Game_1   130  7/22/14 1:59 AM
johndoe1   Game_1   200  7/22/14 2:54 AM
johndoe1   Game_1   200  7/22/14 2:54 AM
johndoe1   Game_1   200  7/22/14 2:54 AM
johndoe1   Game_1   200  7/22/14 2:54 AM
johndoe1   Game_1   210  7/22/14 3:54 AM
johndoe1   Game_1   210  7/22/14 3:54 AM
johndoe1   Game_1   210  7/22/14 3:54 AM
johndoe1   Game_1   210  7/22/14 3:54 AM

I have the following SQL query, which enumerates the rows within each partition. That is not quite what I want: I want to number the attempts at a game based on the date and game. In this case johndoe1 has attempted Game_1 five times, one attempt per distinct timestamp.

This query returns the result set below:

select *
, row_number() over (partition by ct."date" order by ct."date") as "Attempts"
from csv_temp as ct

Username   Game     ID   Date             Attempts  (Desired Attempts col.)

johndoe1   Game_1   100  7/22/14 1:52 AM  1          1
johndoe1   Game_1   100  7/22/14 1:52 AM  2          1
johndoe1   Game_1   100  7/22/14 1:52 AM  3          1
johndoe1   Game_1   100  7/22/14 1:52 AM  4          1
johndoe1   Game_1   121  7/22/14 1:56 AM  1          2
johndoe1   Game_1   121  7/22/14 1:56 AM  2          2
johndoe1   Game_1   121  7/22/14 1:56 AM  3          2
johndoe1   Game_1   121  7/22/14 1:56 AM  4          2
johndoe1   Game_1   121  7/22/14 1:56 AM  5          2
johndoe1   Game_1   130  7/22/14 1:59 AM  1          3   
johndoe1   Game_1   130  7/22/14 1:59 AM  2          3
johndoe1   Game_1   130  7/22/14 1:59 AM  3          3
johndoe1   Game_1   130  7/22/14 1:59 AM  4          3
johndoe1   Game_1   130  7/22/14 1:59 AM  5          3
johndoe1   Game_1   200  7/22/14 2:54 AM  1          4
johndoe1   Game_1   200  7/22/14 2:54 AM  2          4
johndoe1   Game_1   200  7/22/14 2:54 AM  3          4
johndoe1   Game_1   200  7/22/14 2:54 AM  4          4
johndoe1   Game_1   210  7/22/14 3:54 AM  1          5
johndoe1   Game_1   210  7/22/14 3:54 AM  2          5
johndoe1   Game_1   210  7/22/14 3:54 AM  3          5
johndoe1   Game_1   210  7/22/14 3:54 AM  4          5

Any pointers would be of great help.

user1951677 asked Aug 29 '14 06:08



1 Answer

Consider partition by to be similar to the fields that you would group by: when the partition values change, the window function restarts at 1.

EDIT: As indicated by a_horse_with_no_name, this case calls for dense_rank(). Unlike row_number(), both rank() and dense_rank() repeat the number they assign when rows tie on the order by value, whereas row_number() gives a different value to each row in a partition. The difference between rank() and dense_rank() is that the latter does not "skip" numbers after a tie.
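To make the difference concrete, here is a small self-contained sketch. The four timestamps are made-up sample values (not the asker's table), and the aliases rn, rnk and drnk are just illustrative names:

```sql
-- Hypothetical sample data: two rows share the first timestamp.
select ts,
       row_number() over (order by ts) as rn,    -- always unique per row
       rank()       over (order by ts) as rnk,   -- ties share a number, then a gap follows
       dense_rank() over (order by ts) as drnk   -- ties share a number, no gap
from (values
        (timestamp '2014-07-22 01:52'),
        (timestamp '2014-07-22 01:52'),
        (timestamp '2014-07-22 01:56'),
        (timestamp '2014-07-22 01:59')
     ) as t(ts)
order by ts, rn;

-- ts                  | rn | rnk | drnk
-- 2014-07-22 01:52:00 |  1 |   1 |    1
-- 2014-07-22 01:52:00 |  2 |   1 |    1
-- 2014-07-22 01:56:00 |  3 |   3 |    2
-- 2014-07-22 01:59:00 |  4 |   4 |    3
```

The drnk column is the one that behaves like the desired "Attempts" column: every row with the same timestamp gets the same number, and the numbers stay consecutive.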

For your query try:

dense_rank() over (partition by Username, Game order by ct."date") as "Attempts"
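Dropped into the original query, that expression might look like the sketch below. It assumes that csv_temp has columns named username and game (unquoted, so folded to lower case) and that "date" is stored as a timestamp type; if the real columns are quoted mixed-case identifiers, they would need double quotes here too.

```sql
select ct.*,
       dense_rank() over (
           partition by ct.username, ct.game   -- assumed column names
           order by ct."date"                  -- assumed to be a timestamp column
       ) as "Attempts"
from csv_temp as ct
order by ct.username, ct.game, ct."date";
```

If "date" is actually stored as text such as '7/22/14 1:52 AM', ordering on it directly would sort as strings rather than chronologically. In that case it would probably need converting first, for example with something like to_timestamp(ct."date", 'MM/DD/YY HH12:MI AM').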

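By the way, you would not normally partition by and order by the same field: within each partition every row has the same value for that field, so the ordering has nothing to sort on. If the need were simply to number rows in that field's order, order by alone would be sufficient. That isn't the need here.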
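For comparison, this is roughly what "just order by" would look like. It is only a sketch of that alternative, not a solution to the question, and the alias rn is an illustrative name:

```sql
-- Numbers every row of csv_temp once, in "date" order, with no partitioning.
-- Rows that share the same "date" are numbered in an arbitrary order among themselves.
select ct.*,
       row_number() over (order by ct."date") as rn
from csv_temp as ct;
```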

Paul Maxwell answered Oct 23 '22 10:10