Oracle 'Partition By' and 'Row_Number' keyword

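Q: What does ROWNUM do in Oracle?

You can use ROWNUM to limit the number of rows returned by a query, as in this example:

    SELECT * FROM employees WHERE ROWNUM < 10;

If an ORDER BY clause follows ROWNUM in the same query, the rows that passed the ROWNUM filter are then reordered by the ORDER BY clause. Because ROWNUM is assigned before the sort, which rows you get can vary depending on the way the rows are accessed; to get the first rows in a particular order, sort in a subquery and apply the ROWNUM filter outside it.

Q: What is the difference between ROW_NUMBER, ROWNUM and ROWID?

ROWID is assigned automatically to every row inserted into a table and identifies where that row is stored. ROWNUM is a dynamic value generated as rows are retrieved by a SELECT statement. ROW_NUMBER(), by contrast, is an analytic function that numbers rows within each partition according to the ORDER BY in its OVER clause.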

Tags: sql, oracle, row-number, analytic-functions, partition

I have a SQL query written by someone else and I'm trying to figure out what it does. Can someone please explain what the Partition By and Row_Number keywords do here, give a simple example of them in action, and explain why one would want to use them?

An example of partition by:

    (SELECT cdt.*,
            ROW_NUMBER ()
            OVER (PARTITION BY cdt.country_code, cdt.account, cdt.currency
                  ORDER BY cdt.country_code, cdt.account, cdt.currency)
               seq_no
       FROM CUSTOMER_DETAILS cdt);

I've seen some examples online, but they go into a bit too much depth.

Thanks in advance!

812

asked May 07 '12 05:05

HashimR

1 Answer

PARTITION BY segregates rows into sets, which lets you apply a function (ROW_NUMBER(), COUNT(), SUM(), etc.) to each related set independently.
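To make the "work on each set independently" idea concrete, here is a minimal sketch. It is written with Python's built-in sqlite3 module only so that it runs without an Oracle instance (this assumes a Python build whose SQLite supports window functions, i.e. SQLite 3.25 or later). The payments table and its rows are hypothetical and not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, used only for illustration.
conn.execute("create table payments(account text, amount integer)")
conn.executemany(
    "insert into payments values (?, ?)",
    [("A", 10), ("A", 20), ("B", 5)],
)

# Unlike GROUP BY, every input row is kept; the aggregates are
# computed separately for each account partition.
rows = conn.execute(
    """
    select account,
           amount,
           count(*)    over (partition by account) as cnt,
           sum(amount) over (partition by account) as total
    from payments
    order by account, amount
    """
).fetchall()

for row in rows:
    print(row)
```

Both rows of account A carry cnt = 2 and total = 30, while account B is counted and summed on its own.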

In your query, each related set consists of rows with the same cdt.country_code, cdt.account and cdt.currency. When you partition on those columns and apply ROW_NUMBER, the rows within each combination/set receive sequential numbers from ROW_NUMBER.

But that query is funny: if you partition by some unique data and put a row_number on it, it will just produce the same number for every row. It's like doing an ORDER BY on a partition that is guaranteed to contain a single row. For example, think of a GUID as a unique combination of cdt.country_code, cdt.account, cdt.currency.

newid() produces a GUID (in SQL Server), so what would you expect from this expression?

    select
       hi,ho,
       row_number() over(partition by newid() order by hi,ho)
    from tbl;

...Right, every row ends up in its own partition (so effectively nothing was partitioned), and all the row_numbers are set to 1.

Basically, you should partition on non-unique columns. The ORDER BY in the OVER clause only matters when the PARTITION BY combination is non-unique; otherwise all row_numbers will be 1.
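The same effect can be reproduced without newid(). The sketch below uses Python's sqlite3 module, and the id column is a hypothetical unique key added purely to stand in for the GUID; it is not in the original tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is a hypothetical unique key standing in for newid() in the post.
conn.execute("create table tbl(id integer primary key, hi text, ho text)")
conn.executemany(
    "insert into tbl(hi, ho) values (?, ?)",
    [("A", "X"), ("A", "Y"), ("B", "W")],
)

# Every partition holds exactly one row, so the counter never gets past 1.
unique_key = conn.execute(
    "select hi, ho, row_number() over (partition by id order by hi, ho) as nr "
    "from tbl order by id"
).fetchall()

# Partitioning on a column that repeats gives the counter something to count.
repeating = conn.execute(
    "select hi, ho, row_number() over (partition by hi order by ho) as nr "
    "from tbl order by id"
).fetchall()

print(unique_key)
print(repeating)
```

Only the second query numbers the two A rows 1 and 2; the first one assigns 1 everywhere.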

As an example, say this is your data:

    create table tbl(hi varchar, ho varchar);

    insert into tbl values
    ('A','X'),
    ('A','Y'),
    ('A','Z'),
    ('B','W'),
    ('B','W'),
    ('C','L'),
    ('C','L');

Then this is analogous to your query:

    select
       hi,ho,
       row_number() over(partition by hi,ho order by hi,ho)
    from tbl;

What will be the output of that?

    HI  HO  COLUMN_2
    A   X   1
    A   Y   1
    A   Z   1
    B   W   1
    B   W   2
    C   L   1
    C   L   2

You see the combinations of HI and HO? The first three rows each have a unique combination, hence they are all set to 1. The B rows share the same W, hence they get different ROW_NUMBERs; likewise with the HI C rows.

Now, why is the ORDER BY needed there? If the previous developer merely wanted to put a row_number on similar data (e.g. HI B, where all the data are B-W, B-W), he could just do this:

    select
       hi,ho,
       row_number() over(partition by hi,ho)
    from tbl;

But alas, Oracle (and SQL Server too) doesn't allow ROW_NUMBER over a partition with no ORDER BY, whereas in PostgreSQL the ORDER BY in the OVER clause is optional: http://www.sqlfiddle.com/#!1/27821/1

    select
       hi,ho,
       row_number() over(partition by hi,ho)
    from tbl;

The ORDER BY in your query looks a bit redundant, but that's not necessarily the previous developer's fault: some databases just don't allow ROW_NUMBER with no ORDER BY, and he might not have been able to find a good candidate column to sort on. If the PARTITION BY columns and the ORDER BY columns are the same, you could just remove the ORDER BY, but since some databases don't allow that, you can do this instead:

    SELECT cdt.*,
           ROW_NUMBER ()
           OVER (PARTITION BY cdt.country_code, cdt.account, cdt.currency
                 ORDER BY newid())
              seq_no
      FROM CUSTOMER_DETAILS cdt

Can't find a good column to sort similar data on? You might as well sort on something random; the rows in each partition have the same values anyway. You can use a GUID, for example (newid() in SQL Server; in Oracle, SYS_GUID() or DBMS_RANDOM.VALUE plays that role). That gives the same output as the previous developer's query. It's unfortunate that some databases don't allow ROW_NUMBER with no ORDER BY.
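The claim that all three spellings give the same numbering can be checked with a small sketch. It runs on SQLite through Python's sqlite3 module, which, like PostgreSQL, accepts ROW_NUMBER without an ORDER BY; random() is used here as SQLite's stand-in for newid(), and the helper name numbered is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl(hi text, ho text)")
conn.executemany(
    "insert into tbl values (?, ?)",
    [("A", "X"), ("A", "Y"), ("A", "Z"), ("B", "W"), ("B", "W"), ("C", "L"), ("C", "L")],
)


def numbered(order_by):
    """Number rows per (hi, ho) partition; order_by may be an empty string."""
    sql = (
        "select hi, ho, "
        "row_number() over (partition by hi, ho " + order_by + ") as nr "
        "from tbl order by hi, ho, nr"
    )
    return conn.execute(sql).fetchall()


same_columns = numbered("order by hi, ho")     # what the original query does
no_order_by = numbered("")                     # fine in SQLite and PostgreSQL
random_order = numbered("order by random()")   # SQLite stand-in for newid()

print(same_columns == no_order_by == random_order)
```

Because the rows inside each (hi, ho) partition are identical, the choice of ORDER BY cannot change which numbers appear next to which values.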

Though really, it eludes me; I cannot find a good reason to put a number on identical combinations (B-W, B-W in the example above). It gives the impression that the database holds redundant data. It somehow reminds me of this question: "How to get one unique record from the same list of records from table? No Unique constraint in the table".

It really looks arcane to see a PARTITION BY with the same combination of columns as the ORDER BY; you cannot easily infer the code's intent.
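One possible reason for numbering identical combinations, and this is only a guess at the original developer's intent, is de-duplication: number the copies, then keep copy number 1. A minimal sketch of that idea, again on SQLite via Python's sqlite3 module, with a shortened version of the hi/ho data from above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl(hi text, ho text)")
conn.executemany(
    "insert into tbl values (?, ?)",
    [("A", "X"), ("B", "W"), ("B", "W"), ("C", "L"), ("C", "L")],
)

# Number the copies of each (hi, ho) combination, then keep only the first copy.
deduplicated = conn.execute(
    """
    select hi, ho
    from (
        select hi, ho,
               row_number() over (partition by hi, ho order by hi, ho) as nr
        from tbl
    ) t
    where nr = 1
    order by hi, ho
    """
).fetchall()

print(deduplicated)
```

Whether CUSTOMER_DETAILS is actually used this way cannot be told from the query alone.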

Live test: http://www.sqlfiddle.com/#!3/27821/6

But as dbaseman has also noticed, it's useless to partition and order on the same columns.

You have a set of data like this:

    create table tbl(hi varchar, ho varchar);

    insert into tbl values
    ('A','X'),
    ('A','X'),
    ('A','X'),
    ('B','Y'),
    ('B','Y'),
    ('C','Z'),
    ('C','Z');

Then you PARTITION BY hi,ho and then you ORDER BY hi,ho. There's no sense in numbering identical data :-) http://www.sqlfiddle.com/#!3/29ab8/3

    select
       hi,ho,
       row_number() over(partition by hi,ho order by hi,ho) as nr
    from tbl;

Output:

    HI  HO  NR
    A   X   1
    A   X   2
    A   X   3
    B   Y   1
    B   Y   2
    C   Z   1
    C   Z   2

See? Why would you need to put row numbers on the same combination? What would you analyze in the triple A,X, the double B,Y, or the double C,Z? :-)

You just need to PARTITION on a non-unique column, then sort on the column(s) that make the rows within each partition unique. An example will make it clearer:

    create table tbl(hi varchar, ho varchar);

    insert into tbl values
    ('A','D'),
    ('A','E'),
    ('A','F'),
    ('B','F'),
    ('B','E'),
    ('C','E'),
    ('C','D');

    select
       hi,ho,
       row_number() over(partition by hi order by ho) as nr
    from tbl;

PARTITION BY hi operates on a non-unique column; then, within each partition, you order on the column that is unique there (ho): ORDER BY ho.

Output:

    HI  HO  NR
    A   D   1
    A   E   2
    A   F   3
    B   E   1
    B   F   2
    C   D   1
    C   E   2

That data set makes more sense.
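A numbering like this is typically useful for picking one row per group. As a sketch (on SQLite via Python's sqlite3 module, using the same hi/ho rows), filtering on nr = 1 keeps the row with the smallest ho for each hi:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl(hi text, ho text)")
conn.executemany(
    "insert into tbl values (?, ?)",
    [("A", "D"), ("A", "E"), ("A", "F"), ("B", "F"), ("B", "E"), ("C", "E"), ("C", "D")],
)

# Number rows within each hi group by ho, then keep the first of each group.
first_per_group = conn.execute(
    """
    select hi, ho
    from (
        select hi, ho,
               row_number() over (partition by hi order by ho) as nr
        from tbl
    ) t
    where nr = 1
    order by hi
    """
).fetchall()

print(first_per_group)
```

Switching to order by ho desc inside the OVER clause would pick the largest ho per group instead.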

Live test: http://www.sqlfiddle.com/#!3/d0b44/1

And this is similar to your query, with the same columns in both PARTITION BY and ORDER BY:

    select
       hi,ho,
       row_number() over(partition by hi,ho order by hi,ho) as nr
    from tbl;

And this is the output:

    HI  HO  NR
    A   D   1
    A   E   1
    A   F   1
    B   E   1
    B   F   1
    C   D   1
    C   E   1

See? It makes no sense.

Live test: http://www.sqlfiddle.com/#!3/d0b44/3

Finally this might be the right query:

    SELECT cdt.*,
           ROW_NUMBER ()
           OVER (PARTITION BY cdt.country_code, cdt.account -- removed: cdt.currency
                 ORDER BY
                      -- removed: cdt.country_code, cdt.account,
                      cdt.currency) -- keep
              seq_no
    FROM CUSTOMER_DETAILS cdt
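To see what that rewritten query would produce, here is a sketch on made-up data. The real CUSTOMER_DETAILS table is not shown in the question, so the three columns and five rows below are assumptions, and the query is run on SQLite through Python's sqlite3 module rather than on Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns and rows; the real table may look different.
conn.execute(
    "create table customer_details(country_code text, account text, currency text)"
)
conn.executemany(
    "insert into customer_details values (?, ?, ?)",
    [
        ("US", "1001", "USD"),
        ("US", "1001", "EUR"),
        ("US", "1002", "USD"),
        ("GB", "2001", "GBP"),
        ("GB", "2001", "EUR"),
    ],
)

# Partition per (country_code, account), number the currencies inside each one.
seq = conn.execute(
    """
    select cdt.*,
           row_number() over (partition by cdt.country_code, cdt.account
                              order by cdt.currency) as seq_no
    from customer_details cdt
    order by cdt.country_code, cdt.account, seq_no
    """
).fetchall()

for row in seq:
    print(row)
```

Each account's currencies are numbered 1, 2, ... in alphabetical order, and the numbering restarts for every (country_code, account) pair.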

112

answered Sep 20 '22 03:09

Michael Buen
