Window functions: PARTITION BY one column after ORDER BY another

Tags:

Disclaimer: The shown problem is much more general than I expected first. The example below is taken from a solution to another question. But now I was taking this sample for solving many problems more - mostly related to time series (have a look at the "Linked" section in the right bar).

So I am trying to explain the problem more generally first:

I am using PostgreSQL but I am sure this problem exists in other window function supporting DBMS' (MS SQL Server, Oracle, ...) as well.

Window functions can be used to group certain values together by a common attribute or value. For example you can group rows by a date. Then you are able to calculate the max value within every single date or an average value or counting rows or whatever.

This can be achieved by defining a PARTITION. Grouping by dates would work with PARTITION BY date_column. Now you want to do an operation which needs a special order within your groups (calculating row numbers or sum up a column). This can be done with PARTITON BY date_column ORDER BY an_attribute_column.

Now think about a finer resolution of time series. What if you do not have dates but timestamps. Then you cannot group by the time column anymore. But nevertheless it might be important to analyse the data in the order they were added (maybe the timestamp is the creating time of your data set). Then you realize that some consecutive rows have the same value and you want to group your data by this common value. But the clue is that the rows have different timestamps.

The problem here is that you cannot do a PARTITION BY value_column. Because PARTITION BY forces an ordering first. So your table would be ordered by the value_column before the grouping and is not ordered by the timestamp anymore. This yields in results you are not expecting.

More general speaking: The problem is to ensure a special ordering even if the ordered column is not part of the created partition.

Example:

db<>fiddle

I have the following table:

ts      val
100000  50
130100  30050
160100  60050 
190200  100
220200  30100 
250200  30100 
300000  300
500000  100
550000  1000  
600000  1000
650000  2000  
700000  2000
720000  2000
750000  300

I had the problem that I had to group all tied values of the column val. But I wanted to hold the order by ts. To achieve this I wanted to add a column with a unique ID per val group

Expected result:

ts      val     group
100000  50      1
130100  30050   2
160100  60050   3
190200  100     4
220200  30100   5     \ same group
250200  30100   5     /
300000  300     6
500000  100     7
550000  1000    8     \ same group
600000  1000    8     /
650000  2000    9     \
700000  2000    9     | same group
720000  2000    9     /
750000  300     10

First try was the use of the rank window function which would do this job normally:

SELECT 
    *,
    rank() OVER (PARTITION BY val ORDER BY ts)
FROM 
    test

But in this case this doesn't work because the PARTITION BY clause orders the table first by its partition columns (val in this case) and then by its ORDER BY columns. So the order is by val, ts instead of the expected order by ts. So the result was not the expected one of course.

ts       val     rank
100000   50      1
190200   100     1
500000   100     2
300000   300     1
750000   300     2
550000   1000    1
600000   1000    2
650000   2000    1
700000   2000    2
720000   2000    3
130100   30050   1
220200   30100   1
250200   30100   2
160100   60050   1

The question is: How to get the group ids with respect to the order by ts?

Edit: I added an own solution below but I feel very uncomfortable with it. It seems way too complicated. I was wondering if there's a better way to achieve this result.

946

asked Sep 13 '18 18:09

S-Man

1 Answers

I came up with this solution by myself (hoping someone else will get a better one):

demo:db<>fiddle

order by ts
give out the next val value with the lag window function (https://www.postgresql.org/docs/current/static/tutorial-window.html)
check if the next and the current values are the same. Then I can print out a 0 or a 1
sum up these values with an ordered SUM. This generates the groups I am looking for. They group the val column but ensure the ordering by the ts column.

The query:

SELECT 
    *, 
    SUM(is_diff) OVER (ORDER BY ts) 
FROM (
    SELECT 
        *,
        CASE WHEN val = lag(val) over (order by ts) THEN 0 ELSE 1 END as is_diff
    FROM test 
)s

The result:

ts       val     is_diff   sum
100000   50      1         1
130100   30050   1         2
160100   60050   1         3
190200   100     1         4
220200   30100   1         5    \ group
250200   30100   0         5    /
300000   300     1         6
500000   100     1         7
550000   1000    1         8    \ group
600000   1000    0         8    /
650000   2000    1         9    \
700000   2000    0         9    | group
720000   2000    0         9    /
750000   300     1         10

143

answered Sep 21 '22 12:09

S-Man

Related questions
                            
                                Get back newly inserted row in Postgres with sqlx
                            
                                SQL Server why is index not used with OR
                            
                                JOOQ & Firebird - Implementation Limit Exceeded
                            
                                How to wrap PL/SQL source code in Oracle?
                            
                                SQL INNER JOIN vs Where Exists Performance Consideration
                            
                                Oracle 12c documentation for changes / new features to SQL
                            
                                LINQ Query to check for a predicate in all columns in a table
                            
                                SQL syntax error using jdbc
                            
                                Directed graph in Oracle SQL using recursive query visiting each node only once
                            
                                How scn map to timestamp using sys.smon_scn_time in Oracle?
                            
                                How to insert nested json data in mysql table?
                            
                                How to represent GROUP BY with HAVING COUNT(*)>1 in relational algebra?
                            
                                How can I use Windows authentication in MVC but use the newer identity database tables for role storage?
                            
                                Postgres: why LEFT JOIN affects to query plan?
                            
                                "Coalesce" multiple columns
                            
                                Opening .scmp file in Visual Studio 2015 opens as XML and not as comparison UI
                            
                                Cannot use Tuple class with native queries
                            
                                Remove nulls from an array in SQL
                            
                                Multithreaded pyodbc connection
                            
                                Stored Procedure for dynamic data mapping

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Window functions: PARTITION BY one column after ORDER BY another

Tags:

sql

postgresql

window-functions

S-Man

People also ask

1 Answers

S-Man

Recent Activity

Donate For Us