Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Trying to understand over() and partition by

Tags:

I am trying to get the over and partition by functionality wrapped around my head. Here is an example that I just do not understand.

Here is the data I have:

SALESORDERID       ORDERDATE  43894              08/01/2001  43664              07/01/2001  43911              08/01/2001  43867              08/01/2001  43877              08/01/2001  44285              10/01/2001  44501              11/01/2001  43866              08/01/2001  43895              08/01/2001  43860              08/01/2001 

When I run this query:

select Row_Number() over(partition by orderdate order by orderdate asc)      as Rownumber, salesorderid, orderdate from test2 order by rownumber 

Here are the results I get:

ROWNUMBER     SALESORDERID       ORDERDATE  1             43664              07/01/2001  1             43911              08/01/2001  1             44109              09/01/2001  1             44483              11/01/2001  1             44285              10/01/2001  2             43867              08/01/2001  2             44501              11/01/2001  3             43895              08/01/2001  4             43894              08/01/2001  5             43877              08/01/2001  

Can someone explain this query to me. I am not new to SQL but windowing I have been struggling with and can't get my head wrapped around this.

like image 609
Luke101 Avatar asked Feb 16 '12 16:02

Luke101


People also ask

What is over and partition by in SQL?

The SQL PARTITION BY expression is a subclause of the OVER clause, which is used in almost all invocations of window functions like AVG() , MAX() , and RANK() . As many readers probably know, window functions operate on window frames which are sets of rows that can be different for each record in the query result.

What does over partition by mean?

The Window function uses the OVER() clause, and it can include the following functions: Partition By: This divides the rows or query result set into small partitions. Order By: This arranges the rows in ascending or descending order for the partition window. The default order is ascending.

What does Row_number () over partition by do?

PARTITION BY The ROW_NUMBER() method is then applied to each partition, which assigns a separate rank number to each partition. If the partition by clause is not specified, the ROW NUMBER function will treat the entire result as a single partition and rank it from top to bottom.

What is over () in SQL?

That is, the OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window.


2 Answers

Try ordering by order date, you'll see the results more easily

select Row_Number() over(partition by orderdate order by orderdate asc)      as Rownumber, salesorderid, orderdate from test2 order by orderdate; 

should give (i've added blank lines for clarity)

ROWNUMBER     SALESORDERID       ORDERDATE 1             43664              07/01/2001  1             43911              08/01/2001 2             43867              08/01/2001 3             43895              08/01/2001 4             43894              08/01/2001 5             43877              08/01/2001  1             44109              09/01/2001  1             44285              10/01/2001  1             44483              11/01/2001 2             44501              11/01/2001 

You'll notice that the result is divided into 'partitions', each partition being the set of rows with identical orderdates. That is what 'partition by orderdate' means.

Within a partition, the rows are ordered by orderdate, as per the second clause of '(partition by orderdate order by orderdate asc)'. That isn't very useful, as all rows within a partition are going to have the same orderdate. Because of that, the ordering of the rows within a partition is random. Try ordering by salesorderid within the partition by clause to have a more reproducable result.

row_number() just returns the row's ordering within each partition

like image 100
Chi Avatar answered Sep 30 '22 13:09

Chi


The partition by orderdate means that you're only comparing records to other records with the same orderdate. For example, of the five records with orderdate = '08/01/2001', one will have row_number() = 1, one will have row_number() = 2, and so on.

The order by orderdate asc means that, within a partition, row-numbers are to be assigned in order of orderdate. In your example that has no effect, because you're already partitioning by orderdate, so all records within a partition will have the same orderdate. (It would be like writing SELECT ... FROM t WHERE c = 6 ORDER BY c: all selected records have the same value of c, so the ORDER BY c does nothing.) So, within a partition, the assignment of row_number() is arbitrary: each row will have a different number, but there are no guarantees about which row will have which number.

like image 34
ruakh Avatar answered Sep 30 '22 12:09

ruakh