SQL first order, then partition in over clause

I want to partition over a sorted table, so that the row number restarts whenever the partitioning value changes in sort order. Is there a way to do that?

I am using SQL Server 2016.

Input Table:

|---------|-----------------|-----------|------------|
|  prod   |   sortcolumn    |    type   |    value   |
|---------|-----------------|-----------|------------|
|    X    |        1        |     P     |     12     |
|    X    |        2        |     P     |     23     |
|    X    |        3        |     E     |     34     |
|    X    |        4        |     P     |     45     |
|    X    |        5        |     E     |     56     |
|    X    |        6        |     E     |     67     |
|    Y    |        1        |     P     |     78     |
|---------|-----------------|-----------|------------|

Desired Output

|---------|-----------------|-----------|------------|------------|
|  prod   |   sortcolumn    |    type   |    value   |    rowNr   |
|---------|-----------------|-----------|------------|------------|
|    X    |        1        |     P     |     12     |      1     |
|    X    |        2        |     P     |     23     |      2     |
|    X    |        3        |     E     |     34     |      1     |
|    X    |        4        |     P     |     45     |      1     |
|    X    |        5        |     E     |     56     |      1     |
|    X    |        6        |     E     |     67     |      2     |
|    Y    |        1        |     P     |     78     |      1     |
|---------|-----------------|-----------|------------|------------|

I am this far:

SELECT
  t.*,
  ROW_NUMBER() OVER (PARTITION BY t.prod, t.type ORDER BY t.sortcolumn) AS rowNr
FROM mytable t

But this does not restart the row number on the 4th row, because that row has the same prod and type as rows 1 and 2. How can I restart the numbering for every prod and also every time the type changes in sort order, even if the type changes back to a value it had earlier? Is this possible with ROW_NUMBER alone, or do I have to work with LEAD, LAG and CASE expressions (which would probably make it very slow, right?)

Thanks!

Asked by Letimogo, Jun 28 '18 08:06



2 Answers

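This is a gaps-and-islands problem. Subtracting the row number within (prod, type) from the row number within prod gives a value that stays constant across each consecutive run of the same type. You can use the following query: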

SELECT t.*, 
       ROW_NUMBER() OVER (PARTITION BY prod ORDER BY sortcolumn)
       -
       ROW_NUMBER() OVER (PARTITION BY prod, type ORDER BY sortcolumn) AS grp
FROM mytable t

to get:

prod    sortcolumn  type    value   grp
----------------------------------------
X       1           P       12      0
X       2           P       23      0
X       3           E       34      2
X       4           P       45      1
X       5           E       56      3
X       6           E       67      3
Y       1           P       78      0

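Now the grp column can be used for partitioning. Two islands of different types can end up with the same grp value, so partition by prod, type and grp together: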

;WITH IslandsCTE AS (
    SELECT t.*, 
           ROW_NUMBER() OVER (PARTITION BY prod ORDER BY sortcolumn)
           -
           ROW_NUMBER() OVER (PARTITION BY prod, type ORDER BY sortcolumn) AS grp
    FROM mytable t  
)
SELECT prod, sortcolumn, type, value,
       ROW_NUMBER() OVER (PARTITION BY prod, type, grp ORDER BY sortcolumn) AS rowNr
FROM IslandsCTE
ORDER BY prod, sortcolumn


Answered by Giorgos Betsos, Oct 13 '22 21:10


This is a classic 'islands' problem, in that you need to find the 'islands' of records related by prod and type, but without grouping together all records matching on prod and type.

Here's one way this is typically solved. Set up:

DECLARE @t TABLE (
    prod varchar(1),
    sortcolumn int,
    type varchar(1),
    value int
);

INSERT @t VALUES
('X', 1, 'P', 12),
('X', 2, 'P', 23),
('X', 3, 'E', 34),
('X', 4, 'P', 45),
('X', 5, 'E', 56),
('X', 6, 'E', 67),
('Y', 1, 'P', 78)
;

Get some row numbers in place:

;WITH numbered AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY prod, type ORDER BY sortcolumn) as rnX,
        ROW_NUMBER() OVER (PARTITION BY prod ORDER BY sortcolumn) as rn
    FROM
        @t
)

numbered now looks like this:

prod sortcolumn  type value       rnX                  rn
---- ----------- ---- ----------- -------------------- --------------------
X    1           P    12          1                    1
X    2           P    23          2                    2
X    3           E    34          1                    3
X    4           P    45          3                    4
X    5           E    56          2                    5
X    6           E    67          3                    6
Y    1           P    78          1                    1

Why is this useful? Well, look at the difference between rnX and rn:

prod sortcolumn  type value       rnX                  rn                   rn - rnX
---- ----------- ---- ----------- -------------------- -------------------- --------------------
X    1           P    12          1                    1                    0
X    2           P    23          2                    2                    0
X    3           E    34          1                    3                    2
X    4           P    45          3                    4                    1
X    5           E    56          2                    5                    3
X    6           E    67          3                    6                    3
Y    1           P    78          1                    1                    0

As you can see, all rows in the same 'group' share a rn - rnX value, and within a given prod and type this value changes from one group to the next.
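To see the groups themselves rather than the individual rows, one option is to aggregate on prod, type and the difference. This is only a sketch: it reuses the @t table variable from the setup above (so it has to run in the same batch), and the aliases grp, island_start, island_end and rows_in_island are names picked here for illustration.

```sql
;WITH numbered AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY prod, type ORDER BY sortcolumn) AS rnX,
        ROW_NUMBER() OVER (PARTITION BY prod ORDER BY sortcolumn) AS rn
    FROM
        @t
)
SELECT
    prod,
    type,
    rn - rnX AS grp,                    -- constant within one island
    MIN(sortcolumn) AS island_start,    -- first sortcolumn of the island
    MAX(sortcolumn) AS island_end,      -- last sortcolumn of the island
    COUNT(*) AS rows_in_island
FROM
    numbered
GROUP BY
    prod, type, rn - rnX
ORDER BY
    prod, island_start;
```

For the sample data this should return five rows, one per island: X/P covering sortcolumn 1 to 2, X/E at 3, X/P at 4, X/E covering 5 to 6, and Y/P at 1.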

So now if we partition by prod, type, and group number, then number within that:

SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY prod, type, rn - rnX ORDER BY sortcolumn) rowNr
FROM
    numbered
ORDER BY 
    prod, sortcolumn

we're done:

prod sortcolumn  type value       rnX                  rn                   rowNr
---- ----------- ---- ----------- -------------------- -------------------- --------------------
X    1           P    12          1                    1                    1
X    2           P    23          2                    2                    2
X    3           E    34          1                    3                    1
X    4           P    45          3                    4                    1
X    5           E    56          2                    5                    1
X    6           E    67          3                    6                    2
Y    1           P    78          1                    1                    1
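Since the question also asks about LAG, here is a sketch of that alternative for comparison. It is not part of the approach above: it flags each row whose type differs from the previous row in the same prod, then turns the flags into an island number with a running sum. It assumes the @t table variable from the setup (same batch), that type is never NULL, and that sortcolumn is unique within a prod. The names flagged, islands, isStart and islandNr are made up for this example.

```sql
;WITH flagged AS (
    SELECT
        *,
        -- 1 when the row starts a new island: either the first row of the
        -- prod (LAG returns NULL) or the type differs from the previous row
        CASE
            WHEN LAG(type) OVER (PARTITION BY prod ORDER BY sortcolumn) = type THEN 0
            ELSE 1
        END AS isStart
    FROM
        @t
),
islands AS (
    SELECT
        *,
        -- running count of island starts = island number within the prod
        SUM(isStart) OVER (PARTITION BY prod ORDER BY sortcolumn
                           ROWS UNBOUNDED PRECEDING) AS islandNr
    FROM
        flagged
)
SELECT
    prod, sortcolumn, type, value,
    ROW_NUMBER() OVER (PARTITION BY prod, islandNr ORDER BY sortcolumn) AS rowNr
FROM
    islands
ORDER BY
    prod, sortcolumn;
```

On the sample data this should produce the same rowNr values as the query above. Whether it is faster or slower depends on the data and the available indexes, so it is worth comparing the execution plans of both versions on the real table.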

Related reading: Things SQL needs: SERIES()

Answered by AakashM, Oct 13 '22 19:10