I have data that is arranged in a ring structure (or circular buffer), that is it can be expressed as sequences that cycle: ...-1-2-3-4-5-1-2-3-.... See this picture to get an idea of a 5-part ring:
I'd like to create a window query that can combine the lag and lead items into a three point array, but I can't figure it out. For example at part 1 of a 5-part ring, the lag/lead sequence is 5-1-2, or at part 4 is 3-4-5.
Here is an example table of two rings with different numbers of parts (always more than three per ring):
create table rp (ring int, part int);
insert into rp(ring, part) values(1, generate_series(1, 5));
insert into rp(ring, part) values(2, generate_series(1, 7));
Here is a nearly successful query:
SELECT ring, part, array[
lag(part, 1, NULL) over (partition by ring),
part,
lead(part, 1, 1) over (partition by ring)
] AS neighbours
FROM rp;
ring | part | neighbours
------+------+------------
1 | 1 | {NULL,1,2}
1 | 2 | {1,2,3}
1 | 3 | {2,3,4}
1 | 4 | {3,4,5}
1 | 5 | {4,5,1}
2 | 1 | {NULL,1,2}
2 | 2 | {1,2,3}
2 | 3 | {2,3,4}
2 | 4 | {3,4,5}
2 | 5 | {4,5,6}
2 | 6 | {5,6,7}
2 | 7 | {6,7,1}
(12 rows)
The only thing I need to do is to replace the NULL
with the ending point of each ring, which is the last value. Now, along with lag
and lead
window functions, there is a last_value
function which would be ideal. However, these cannot be nested:
SELECT ring, part, array[
lag(part, 1, last_value(part) over (partition by ring)) over (partition by ring),
part,
lead(part, 1, 1) over (partition by ring)
] AS neighbours
FROM rp;
ERROR: window function calls cannot be nested
LINE 2: lag(part, 1, last_value(part) over (partition by ring)) ...
Update. Thanks to @Justin's suggestion to use coalesce
to avoid nesting window functions. Furthermore, it has been pointed out by numerous folks that first/last values need an explicit order by
on the ring sequence, which happens to be part
for this example. So randomising the input data a bit:
create table rp (ring int, part int);
insert into rp(ring, part) select 1, generate_series(1, 5) order by random();
insert into rp(ring, part) select 2, generate_series(1, 7) order by random();
PARTITION BY – PARTITION BY resets its counter every time a given column changes values. ORDER BY – ORDER BY orders the rows (in the window only) the function evaluates. ROWS BETWEEN – ROWS BETWEEN enables you to further limit the rows in the window.
Window functions increase the efficiency and reduce the complexity of queries that analyze partitions (windows) of a data set by providing an alternative to more complex SQL concepts, e.g. derived queries. Common use cases include: Ranking results within a specific window (e.g. per-group ranking)
You can't use window functions in WHERE , because the logical order of operations in an SQL query is completely different from the SQL syntax.
COALESCE
like @Justin provided.With first_value()
/ last_value()
you need to add an ORDER BY
clause to the window definition or the order is undefined. You just got lucky in the example, because the rows happen to be in order right after creating the dummy table.
Once you add ORDER BY
, the default window frame ends at the current row, and you need to special case the last_value()
call - or revert the sort order in the window frame like demonstrated in my first example.
When reusing a window definition multiple times, an explicit WINDOW
clause simplifies syntax a lot:
SELECT ring, part, ARRAY[
coalesce(
lag(part) OVER w
,first_value(part) OVER (PARTITION BY ring ORDER BY part DESC))
,part
,coalesce(
lead(part) OVER w
,first_value(part) OVER w)
] AS neighbours
FROM rp
WINDOW w AS (PARTITION BY ring ORDER BY part);
Better yet, reuse the same window definition, so Postgres can calculate all values in a single scan. For this to work we need to define a custom window frame:
SELECT ring, part, ARRAY[
coalesce(
lag(part) OVER w
,last_value(part) OVER w)
,part
,coalesce(
lead(part) OVER w
,first_value(part) OVER w)
] AS neighbours
FROM rp
WINDOW w AS (PARTITION BY ring
ORDER BY part
RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
ORDER BY 1,2;
You can even adapt the frame definition for each window function call:
SELECT ring, part, ARRAY[
coalesce(
lag(part) OVER w
,last_value(part) OVER (w RANGE BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING))
,part
,coalesce(
lead(part) OVER w
,first_value(part) OVER w)
] AS neighbours
FROM rp
WINDOW w AS (PARTITION BY ring ORDER BY part)
ORDER BY 1,2;
Might be faster for rings with many parts. You'll have to test.
SQL Fiddle demonstrating all three with an improved test case. Consider query plans.
More about window frame definitions:
Query:
SQLFIDDLEExample
SELECT ring, part, array[
coalesce(lag(part, 1, NULL) over (partition by ring),
max(part) over (partition by ring)),
part,
lead(part, 1, 1) over (partition by ring)
] AS neighbours
FROM rp;
Result:
| RING | PART | NEIGHBOURS |
|------|------|------------|
| 1 | 1 | 5,1,2 |
| 1 | 2 | 1,2,3 |
| 1 | 3 | 2,3,4 |
| 1 | 4 | 3,4,5 |
| 1 | 5 | 4,5,1 |
| 2 | 1 | 7,1,2 |
| 2 | 2 | 1,2,3 |
| 2 | 3 | 2,3,4 |
| 2 | 4 | 3,4,5 |
| 2 | 5 | 4,5,6 |
| 2 | 6 | 5,6,7 |
| 2 | 7 | 6,7,1 |
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With