Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Partitioning function for continuous sequences

There is a table of the following structure:

CREATE TABLE history
(
  pk serial NOT NULL,
  "from" integer NOT NULL,
  "to" integer NOT NULL,
  entity_key text NOT NULL,
  data text NOT NULL,
  CONSTRAINT history_pkey PRIMARY KEY (pk)
);

The pk is a primary key, from and to define a position in the sequence and the sequence itself for a given entity identified by entity_key. So the entity has one sequence of 2 rows in case if the first row has the from = 1; to = 2 and the second one has from = 2; to = 3. So the point here is that the to of the previous row matches the from of the next one.

The order to determine "next"/"previous" row is defined by pk which grows monotonously (since it's a SERIAL).

The sequence does not have to start with 1 and the to - from does not necessary 1 always. So it can be from = 1; to = 10. What matters is that the "next" row in the sequence matches the to exactly.

Sample dataset:

pk  |  from  |  to  |  entity_key  |  data
----+--------+------+--------------+-------
1   |   1    |   2  |      42      |  foo
2   |   2    |   3  |      42      |  bar
3   |   3    |   4  |      42      |  baz
4   |  10    |  11  |      42      |  another foo
5   |  11    |  12  |      42      |  another baz
6   |   1    |   2  |     111      |  one one one
7   |   2    |   3  |     111      |  one one one two
8   |   3    |   4  |     111      |  one one one three

And what I cannot realize is how to partition by "sequences" here so that I could apply window functions to the group that represents a single "sequence".

Let's say I want to use the row_number() function and would like to get the following result:

pk  |  row_number | entity_key
----+-------------+------------
1   |     1       |       42
2   |     2       |       42
3   |     3       |       42
4   |     1       |       42
5   |     2       |       42
6   |     1       |      111
7   |     2       |      111
8   |     3       |      111

For convenience I created an SQLFiddle with initial seed: http://sqlfiddle.com/#!15/e7c1c

PS: It's not the "give me the codez" question, I made my own research and I just out of ideas how to partition.

It's obvious that I need to LEFT JOIN with the next.from = curr.to, but then it's still not clear how to reset the partition on next.from IS NULL.

PS: It will be a 100 points bounty for the most elegant query that provides the requested result

PPS: the desired solution should be an SQL query not pgsql due to some other limitations that are out of scope of this question.

like image 900
zerkms Avatar asked Dec 19 '22 00:12

zerkms


2 Answers

I don’t know if it counts as “elegant,” but I think this will do what you want:

with Lagged as (
  select
    pk,
    case when lag("to",1) over (order by pk) is distinct from "from" then 1 else 0 end as starts,
    entity_key
  from history
), LaggedGroups as (
  select
    pk,
    sum(starts) over (order by pk) as groups,
    entity_key
  from Lagged
)
  select
    pk,
    row_number() over (
      partition by groups
      order by pk
    ) as "row_number",
    entity_key
from LaggedGroups
like image 69
Steve Kass Avatar answered Dec 28 '22 23:12

Steve Kass


Just for fun & completeness: a recursive solution to reconstruct the (doubly) linked lists of records. [ this will not be the fastest solution ]

NOTE: I commented out the ascending pk condition(s) since they are not needed for the connection logic.

WITH RECURSIVE zzz AS (
        SELECT h0.pk
        , h0."to" AS next
        , h0.entity_key AS ek
        , 1::integer AS rnk
        FROM history h0
        WHERE NOT EXISTS (
                SELECT * FROM history nx
                WHERE nx.entity_key = h0.entity_key
                AND nx."to" = h0."from"
                -- AND nx.pk > h0.pk
                )
        UNION ALL
        SELECT h1.pk
        , h1."to" AS next
        , h1.entity_key AS ek
        , 1+zzz.rnk AS rnk
        FROM zzz
        JOIN history h1
          ON h1.entity_key = zzz.ek
                AND h1."from" = zzz.next
                -- AND h1.pk > zzz.pk
        )
SELECT * FROM zzz
ORDER BY ek,pk
        ;
like image 28
wildplasser Avatar answered Dec 28 '22 22:12

wildplasser