
Analytic count over partition with and without ORDER BY clause

I don't understand why there are different results when using an ORDER BY clause in an analytic COUNT function.

Using a simple example:

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls) as cnt from req;

gives the following result:

N   CLS CNT
2   A   2
1   A   2

Whereas, when adding an ORDER BY in the analytic clause, the result is different!

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls order by n) as cnt from req;

The CNT column changed:

N   CLS CNT
1   A   1
2   A   2

Can someone explain please?

Thanks

Carmellose asked Dec 28 '16 15:12



2 Answers

First, a link to docs. It's somewhat obscure, however.

The analytic clause consists of query_partition_clause, order_by_clause, and windowing_clause. One really important point about windowing_clause, as the documentation puts it:

You cannot specify this clause unless you have specified the order_by_clause. Some window boundaries defined by the RANGE clause let you specify only one expression in the order_by_clause. Refer to "Restrictions on the ORDER BY Clause".

But it is not only that you cannot use windowing_clause without order_by_clause; the two are tied together. Once you specify order_by_clause, a default window applies:

If you omit the windowing_clause entirely, then the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

The default windowing clause produces something like a running total. COUNT returns 1 for the first row, because there is only one row between the top of the window and the current row, 2 for the second row, and so on.

So in your first query there is no windowing at all, but there is the default windowing in the second one.
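
As a minimal sketch of that point, writing the default window out explicitly in the second query should leave its result unchanged. It reuses the data from the question; the column aliases cnt_default and cnt_explicit are names chosen here for illustration only.

```sql
with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*,
       -- ORDER BY with no windowing clause: the default window applies
       count(*) over(partition by cls order by n) as cnt_default,
       -- the same window, written out in full
       count(*) over(partition by cls order by n
                     range between unbounded preceding and current row) as cnt_explicit
  from req;
```

Both columns should show 1 for n = 1 and 2 for n = 2, matching the CNT values in the question.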

And you can simulate the behavior of the first query by specifying a fully unbounded window.

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls order by n RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as cnt from req;

Yep

N   CLS CNT
1   A   2
2   A   2

Paul answered Dec 21 '22 22:12


The easiest way to think about this: leaving the ORDER BY out is equivalent to "ordering" in a way that makes all rows in the partition "equal" to each other. Indeed, you can get the same effect by explicitly adding an ORDER BY clause such as ORDER BY 0 (or "order by" any constant expression), or even, more emphatically, ORDER BY NULL.
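
A small sketch of that equivalence, using the data from the question. The aliases cnt_no_order and cnt_order_null are illustrative names, not part of the original queries.

```sql
with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*,
       -- no ORDER BY: COUNT sees the whole partition
       count(*) over(partition by cls) as cnt_no_order,
       -- constant ORDER BY: every row ties with every other row
       count(*) over(partition by cls order by null) as cnt_order_null
  from req;
```

Both columns should come back as 2 on each row.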

Why you get the COUNT() or SUM() etc. for the entire partition has to do with the default windowing clause: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. "RANGE" (as opposed to "ROWS") means that all rows "tied" with the current row are also included, even if they don't precede it. Since all rows are tied, the entire partition is included, no matter which row is "current."
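
To make the RANGE versus ROWS difference visible, here is a sketch with a deliberate tie. The data is not from the question: it is a hypothetical variation with two rows where n = 1, built with UNION ALL so the duplicate survives. The aliases cnt_range and cnt_rows are likewise illustrative.

```sql
with req as
 (select 1 as n, 'A' as cls
    from dual
  union all
  select 1 as n, 'A' as cls
    from dual
  union all
  select 2 as n, 'A' as cls
    from dual)
select req.*,
       -- RANGE: rows tied with the current row on n are included
       count(*) over(partition by cls order by n
                     range between unbounded preceding and current row) as cnt_range,
       -- ROWS: only the physical rows up to the current one are included
       count(*) over(partition by cls order by n
                     rows between unbounded preceding and current row) as cnt_rows
  from req;
```

With RANGE, both n = 1 rows should get 2 and the n = 2 row should get 3. With ROWS, the counts run 1, 2, 3, and which of the two tied rows gets 1 is not deterministic.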

mathguy answered Dec 21 '22 22:12