Analytic count over partition with and without ORDER BY clause

Tags: sql, oracle, window-functions

I don't understand why there are different results when using an ORDER BY clause in an analytic COUNT function.

Using a simple example:

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls) as cnt from req;

gives the following result:

N   CLS CNT
2   A   2
1   A   2

Whereas, when adding an ORDER BY in the analytic clause, the result is different!

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls order by n) as cnt from req;

The CNT column changed:

N   CLS CNT
1   A   1
2   A   2

Can someone explain please?

Thanks

asked Dec 28 '16 15:12

Carmellose

2 Answers

First, see the Oracle documentation on analytic functions, although it is somewhat obscure on this point.

The analytic clause consists of query_partition_clause, order_by_clause and windowing_clause. A really important point about windowing_clause, quoting the docs:

You cannot specify this clause unless you have specified the order_by_clause. Some window boundaries defined by the RANGE clause let you specify only one expression in the order_by_clause. Refer to "Restrictions on the ORDER BY Clause".

So you cannot use windowing_clause without order_by_clause. More than that, the two are tied together, as the docs go on to say:

If you omit the windowing_clause entirely, then the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

The default windowing clause produces something like a running total. COUNT returns 1 for the first row, since there is only one row between the top of the window and the current row, 2 for the second row, and so on.

So your first query has no ORDER BY and therefore no windowing at all, which means COUNT covers the whole partition, while the second query picks up the default window.
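As a quick check of the explanation above, here is a minimal sketch that runs the same counts through Python's built-in sqlite3 module instead of Oracle. That swap is an assumption made only so the example is easy to run: SQLite (3.25 or later) documents the same default frame when ORDER BY is present, but it has no DUAL table, so the two rows are built with VALUES. The helper name `counts` is made up for illustration.

```python
import sqlite3

# Same two-row data set as the question, built with VALUES instead of Oracle's DUAL.
REQ = "with req(n, cls) as (values (1, 'A'), (2, 'A')) "

def counts(over_clause):
    """Return [(n, cnt), ...] for count(*) over(<over_clause>), sorted by n."""
    sql = REQ + f"select n, count(*) over({over_clause}) as cnt from req order by n"
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

# ORDER BY with no windowing clause: the default frame applies.
implicit = counts("partition by cls order by n")
# The same frame written out explicitly.
explicit = counts(
    "partition by cls order by n "
    "range between unbounded preceding and current row"
)
# No ORDER BY: the count covers the whole partition.
no_order = counts("partition by cls")

print(implicit)  # [(1, 1), (2, 2)]
print(explicit)  # [(1, 1), (2, 2)]
print(no_order)  # [(1, 2), (2, 2)]
```

The first two calls return the same running count, which is what one would expect if the explicit frame is indeed the default one.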

You can reproduce the behavior of the first query, while keeping the ORDER BY, by specifying a fully unbounded window.

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls order by n RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as cnt from req;

Yep, both rows now show the full count:

N   CLS CNT
1   A   2
2   A   2

answered Dec 21 '22 22:12

Paul

The easiest way to think about this: leaving the ORDER BY out is equivalent to "ordering" in a way that makes all rows in the partition "equal" to each other. Indeed, you can get the same effect by explicitly adding an ORDER BY clause like this: ORDER BY 0 (or "order by" any constant expression), or even, more emphatically, ORDER BY NULL.

The reason you then get the COUNT(), SUM(), etc. for the entire partition is the default windowing clause: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. RANGE (as opposed to ROWS) means all rows "tied" with the current row are also included, even if they don't precede it. Since all rows are tied, the entire partition is included, no matter which row is "current."
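The effect of ties is easier to see with data that actually contains one. The sketch below is an illustration under assumptions, not part of the original answer: it uses Python's built-in sqlite3 module in place of Oracle (SQLite 3.25 or later treats peers in a RANGE frame the same way), the four-row data set with two rows where n = 2 is hypothetical, and `tie_counts` is a made-up helper name.

```python
import sqlite3

# Hypothetical data with a tie on n (two rows where n = 2), all in one partition.
TIES = "with req(n, cls) as (values (1, 'A'), (2, 'A'), (2, 'A'), (3, 'A')) "

def tie_counts(over_clause):
    """Return [(n, cnt), ...] for count(*) over(<over_clause>), sorted by n, cnt."""
    sql = TIES + (
        f"select n, count(*) over({over_clause}) as cnt "
        "from req order by n, cnt"
    )
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

frame = "between unbounded preceding and current row"

# RANGE: rows tied with the current row are included in its window.
print(tie_counts(f"partition by cls order by n range {frame}"))
# [(1, 1), (2, 3), (2, 3), (3, 4)]

# ROWS: the window stops exactly at the current row, ties or not.
print(tie_counts(f"partition by cls order by n rows {frame}"))
# [(1, 1), (2, 2), (2, 3), (3, 4)]

# Constant ordering: every row is tied with every other row.
print(tie_counts("partition by cls order by null"))
# [(1, 4), (2, 4), (2, 4), (3, 4)]

# No ORDER BY at all gives the same whole-partition count.
print(tie_counts("partition by cls"))
# [(1, 4), (2, 4), (2, 4), (3, 4)]
```

With RANGE both n = 2 rows report 3 because each counts its tied peer, while ROWS counts strictly row by row. Ordering by a constant is the extreme case in which the whole partition is one group of peers.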

answered Dec 21 '22 22:12

mathguy
