Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use a ring data structure in window functions

I have data that is arranged in a ring structure (or circular buffer), that is it can be expressed as sequences that cycle: ...-1-2-3-4-5-1-2-3-.... See this picture to get an idea of a 5-part ring:

enter image description here

I'd like to create a window query that can combine the lag and lead items into a three point array, but I can't figure it out. For example at part 1 of a 5-part ring, the lag/lead sequence is 5-1-2, or at part 4 is 3-4-5.

Here is an example table of two rings with different numbers of parts (always more than three per ring):

create table rp (ring int, part int);
insert into rp(ring, part) values(1, generate_series(1, 5));
insert into rp(ring, part) values(2, generate_series(1, 7));

Here is a nearly successful query:

SELECT ring, part, array[
    lag(part, 1, NULL) over (partition by ring),
    part,
    lead(part, 1, 1) over (partition by ring)
    ] AS neighbours
FROM rp;

 ring | part | neighbours
------+------+------------
    1 |    1 | {NULL,1,2}
    1 |    2 | {1,2,3}
    1 |    3 | {2,3,4}
    1 |    4 | {3,4,5}
    1 |    5 | {4,5,1}
    2 |    1 | {NULL,1,2}
    2 |    2 | {1,2,3}
    2 |    3 | {2,3,4}
    2 |    4 | {3,4,5}
    2 |    5 | {4,5,6}
    2 |    6 | {5,6,7}
    2 |    7 | {6,7,1}
(12 rows)

The only thing I need to do is to replace the NULL with the ending point of each ring, which is the last value. Now, along with lag and lead window functions, there is a last_value function which would be ideal. However, these cannot be nested:

SELECT ring, part, array[
    lag(part, 1, last_value(part) over (partition by ring)) over (partition by ring),
    part,
    lead(part, 1, 1) over (partition by ring)
    ] AS neighbours
FROM rp;
ERROR:  window function calls cannot be nested
LINE 2:     lag(part, 1, last_value(part) over (partition by ring)) ...

Update. Thanks to @Justin's suggestion to use coalesce to avoid nesting window functions. Furthermore, it has been pointed out by numerous folks that first/last values need an explicit order by on the ring sequence, which happens to be part for this example. So randomising the input data a bit:

create table rp (ring int, part int);
insert into rp(ring, part) select 1, generate_series(1, 5) order by random();
insert into rp(ring, part) select 2, generate_series(1, 7) order by random();
like image 834
Mike T Avatar asked Jul 30 '14 06:07

Mike T


People also ask

How does window Partitionby work?

PARTITION BY – PARTITION BY resets its counter every time a given column changes values. ORDER BY – ORDER BY orders the rows (in the window only) the function evaluates. ROWS BETWEEN – ROWS BETWEEN enables you to further limit the rows in the window.

Why is windowing function used?

Window functions increase the efficiency and reduce the complexity of queries that analyze partitions (windows) of a data set by providing an alternative to more complex SQL concepts, e.g. derived queries. Common use cases include: Ranking results within a specific window (e.g. per-group ranking)

Can window functions be used in where?

You can't use window functions in WHERE , because the logical order of operations in an SQL query is completely different from the SQL syntax.


2 Answers

  • Use COALESCE like @Justin provided.
  • With first_value() / last_value() you need to add an ORDER BY clause to the window definition or the order is undefined. You just got lucky in the example, because the rows happen to be in order right after creating the dummy table.
    Once you add ORDER BY, the default window frame ends at the current row, and you need to special case the last_value() call - or revert the sort order in the window frame like demonstrated in my first example.

  • When reusing a window definition multiple times, an explicit WINDOW clause simplifies syntax a lot:

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,first_value(part) OVER (PARTITION BY ring ORDER BY part DESC))
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring ORDER BY part);

Better yet, reuse the same window definition, so Postgres can calculate all values in a single scan. For this to work we need to define a custom window frame:

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,last_value(part) OVER w)
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring
             ORDER BY part
             RANGE BETWEEN UNBOUNDED PRECEDING
                       AND UNBOUNDED FOLLOWING)
ORDER  BY 1,2;

You can even adapt the frame definition for each window function call:

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,last_value(part) OVER (w RANGE BETWEEN CURRENT ROW
                                                AND UNBOUNDED FOLLOWING))
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring ORDER BY part)
ORDER  BY 1,2;

Might be faster for rings with many parts. You'll have to test.

SQL Fiddle demonstrating all three with an improved test case. Consider query plans.

More about window frame definitions:

  • In the manual.
  • PostgreSQL window function: partition by comparison
  • PostgreSQL query with max and min date plus associated id per row
like image 150
Erwin Brandstetter Avatar answered Oct 09 '22 01:10

Erwin Brandstetter


Query:

SQLFIDDLEExample

SELECT ring, part, array[
    coalesce(lag(part, 1, NULL) over (partition by ring), 
             max(part) over (partition by ring)),
    part,
    lead(part, 1, 1) over (partition by ring)
    ] AS neighbours
FROM rp;

Result:

| RING | PART | NEIGHBOURS |
|------|------|------------|
|    1 |    1 |      5,1,2 |
|    1 |    2 |      1,2,3 |
|    1 |    3 |      2,3,4 |
|    1 |    4 |      3,4,5 |
|    1 |    5 |      4,5,1 |
|    2 |    1 |      7,1,2 |
|    2 |    2 |      1,2,3 |
|    2 |    3 |      2,3,4 |
|    2 |    4 |      3,4,5 |
|    2 |    5 |      4,5,6 |
|    2 |    6 |      5,6,7 |
|    2 |    7 |      6,7,1 |
like image 37
Justin Avatar answered Oct 09 '22 02:10

Justin