I am trying to wrap my head around the OVER and PARTITION BY functionality. Here is an example that I just do not understand.
Here is the data I have:
SALESORDERID  ORDERDATE
43894         08/01/2001
43664         07/01/2001
43911         08/01/2001
43867         08/01/2001
43877         08/01/2001
44285         10/01/2001
44501         11/01/2001
43866         08/01/2001
43895         08/01/2001
43860         08/01/2001
When I run this query:
select Row_Number() over(partition by orderdate order by orderdate asc) as Rownumber, salesorderid, orderdate from test2 order by rownumber
Here are the results I get:
ROWNUMBER  SALESORDERID  ORDERDATE
1          43664         07/01/2001
1          43911         08/01/2001
1          44109         09/01/2001
1          44483         11/01/2001
1          44285         10/01/2001
2          43867         08/01/2001
2          44501         11/01/2001
3          43895         08/01/2001
4          43894         08/01/2001
5          43877         08/01/2001
Can someone explain this query to me? I am not new to SQL, but I have been struggling with windowing and can't wrap my head around it.
The SQL PARTITION BY expression is a subclause of the OVER clause, which is used in almost all invocations of window functions like AVG(), MAX(), and RANK(). As many readers probably know, window functions operate on window frames, which are sets of rows that can be different for each record in the query result.
A window function uses the OVER() clause, which can include the following subclauses:
PARTITION BY: divides the rows of the query result set into partitions.
ORDER BY: arranges the rows within each partition in ascending or descending order. The default order is ascending.
With PARTITION BY, ROW_NUMBER() is applied to each partition separately and assigns sequential numbers, starting at 1, to the rows of that partition. If the PARTITION BY clause is not specified, ROW_NUMBER() treats the entire result set as a single partition and numbers it from top to bottom.
That is, the OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window.
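To make the difference concrete, here is a minimal sketch that runs ROW_NUMBER() with and without PARTITION BY. It is not the asker's setup: it assumes an in-memory SQLite database driven from Python's sqlite3 module (SQLite 3.25 or later is needed for window functions), the rows are copied from the result listings in this thread, and the dates are rewritten in ISO form so they sort chronologically as text. The helper name run is made up for illustration.

```python
import sqlite3

# Assumption: sample rows copied from the result listings in this thread,
# with dates rewritten as ISO strings so text order matches date order.
ROWS = [
    (43664, "2001-07-01"),
    (43911, "2001-08-01"),
    (43867, "2001-08-01"),
    (43895, "2001-08-01"),
    (43894, "2001-08-01"),
    (43877, "2001-08-01"),
    (44109, "2001-09-01"),
    (44285, "2001-10-01"),
    (44483, "2001-11-01"),
    (44501, "2001-11-01"),
]


def run(sql):
    """Hypothetical helper: load ROWS into a fresh in-memory table and run sql."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test2 (salesorderid INTEGER, orderdate TEXT)")
    conn.executemany("INSERT INTO test2 VALUES (?, ?)", ROWS)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# No PARTITION BY: the whole result set is one partition, numbered 1..10.
whole = run("""
    SELECT ROW_NUMBER() OVER (ORDER BY salesorderid) AS rownumber,
           salesorderid, orderdate
    FROM test2
    ORDER BY rownumber
""")

# PARTITION BY orderdate: the numbering restarts at 1 for each distinct date.
per_date = run("""
    SELECT ROW_NUMBER() OVER (PARTITION BY orderdate ORDER BY salesorderid) AS rownumber,
           salesorderid, orderdate
    FROM test2
    ORDER BY orderdate, rownumber
""")

for row in per_date:
    print(row)
```

In the second query the five orders dated 2001-08-01 get the numbers 1 to 5 and the two dated 2001-11-01 get 1 and 2, while every other date has a single row numbered 1.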
Try ordering by order date; you'll see the results more easily:
select Row_Number() over(partition by orderdate order by orderdate asc) as Rownumber, salesorderid, orderdate from test2 order by orderdate;
This should give (I've added blank lines for clarity):
ROWNUMBER  SALESORDERID  ORDERDATE
1          43664         07/01/2001

1          43911         08/01/2001
2          43867         08/01/2001
3          43895         08/01/2001
4          43894         08/01/2001
5          43877         08/01/2001

1          44109         09/01/2001

1          44285         10/01/2001

1          44483         11/01/2001
2          44501         11/01/2001
You'll notice that the result is divided into 'partitions', each partition being the set of rows with identical orderdates. That is what 'partition by orderdate' means.
Within a partition, the rows are ordered by orderdate, as per the second clause of '(partition by orderdate order by orderdate asc)'. That isn't very useful, as all rows within a partition are going to have the same orderdate. Because of that, the ordering of the rows within a partition is arbitrary. Try ordering by salesorderid in the order by part of the over clause to get a more reproducible result.
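As a rough illustration of that suggestion, the sketch below numbers the five 08/01/2001 orders with ORDER BY salesorderid inside the OVER clause, once with the rows inserted in the order shown in the question and once reversed. It assumes Python's sqlite3 module with SQLite 3.25 or later and an ISO-formatted date; the helper name number_rows is made up for illustration.

```python
import sqlite3

# The five orders dated 08/01/2001 from the result listing in this thread.
AUGUST = [43911, 43867, 43895, 43894, 43877]


def number_rows(ids):
    """Hypothetical helper: insert ids in the given order, then number them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test2 (salesorderid INTEGER, orderdate TEXT)")
    conn.executemany(
        "INSERT INTO test2 VALUES (?, '2001-08-01')", [(i,) for i in ids]
    )
    rows = conn.execute("""
        SELECT ROW_NUMBER() OVER (PARTITION BY orderdate ORDER BY salesorderid) AS rownumber,
               salesorderid
        FROM test2
        ORDER BY rownumber
    """).fetchall()
    conn.close()
    return rows


forward = number_rows(AUGUST)
backward = number_rows(list(reversed(AUGUST)))

# Both runs assign the same number to the same order, because salesorderid
# is unique within the partition and so leaves no ties to break.
print(forward)
print(forward == backward)
```

Because salesorderid is unique within each partition here, the numbering no longer depends on how the rows happen to be stored or scanned.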
row_number() just returns the row's ordering within each partition.
The partition by orderdate means that you're only comparing records to other records with the same orderdate. For example, of the five records with orderdate = '08/01/2001', one will have row_number() = 1, one will have row_number() = 2, and so on.
The order by orderdate asc means that, within a partition, row-numbers are to be assigned in order of orderdate. In your example that has no effect, because you're already partitioning by orderdate, so all records within a partition will have the same orderdate. (It would be like writing SELECT ... FROM t WHERE c = 6 ORDER BY c: all selected records have the same value of c, so the ORDER BY c does nothing.) So, within a partition, the assignment of row_number() is arbitrary: each row will have a different number, but there are no guarantees about which row will have which number.
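Here is a small sketch of what is and is not guaranteed when the query from the question is run as written. It assumes Python's sqlite3 module with SQLite 3.25 or later, rows copied from the result listings in this thread, and ISO-formatted dates. It only inspects the set of numbers handed out per orderdate, since which row gets which number is exactly the part that is not guaranteed.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test2 (salesorderid INTEGER, orderdate TEXT)")
conn.executemany(
    "INSERT INTO test2 VALUES (?, ?)",
    [
        (43664, "2001-07-01"),
        (43911, "2001-08-01"),
        (43867, "2001-08-01"),
        (43895, "2001-08-01"),
        (43894, "2001-08-01"),
        (43877, "2001-08-01"),
        (44109, "2001-09-01"),
        (44285, "2001-10-01"),
        (44483, "2001-11-01"),
        (44501, "2001-11-01"),
    ],
)

# The query from the question: partitioning and ordering by the same column.
rows = conn.execute("""
    SELECT ROW_NUMBER() OVER (PARTITION BY orderdate ORDER BY orderdate ASC) AS rownumber,
           salesorderid, orderdate
    FROM test2
""").fetchall()
conn.close()

# Collect the row numbers handed out within each partition.
numbers_by_date = defaultdict(list)
for rownumber, _salesorderid, orderdate in rows:
    numbers_by_date[orderdate].append(rownumber)

# Guaranteed: each partition uses every number from 1 to its row count once.
# Not guaranteed: which salesorderid receives which number.
for orderdate, numbers in sorted(numbers_by_date.items()):
    print(orderdate, sorted(numbers))
```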