Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

get row number by frame from rolling window

Tags:

sql

duckdb

I would like to get the row number by frame (not partition) from a rolling window in SQL/duckdb.

With this data

customer_id,date
ca,2024-04-03
ca,2024-04-04
ca,2024-04-04
ca,2024-04-11
cb,2024-04-02
cb,2024-04-02
cb,2024-04-03
cb,2024-05-13

and this query

SELECT
    customer_id,
    date,
    row_number() OVER win AS row_by_partition
FROM 'example.csv'
WINDOW win AS (
    PARTITION BY customer_id
    ORDER BY date ASC
    RANGE BETWEEN CURRENT ROW
      AND INTERVAL 1 WEEK FOLLOWING)

I get a row number by partition

┌─────────────┬────────────┬──────────────────┐
│ customer_id │    date    │ row_by_partition │
│   varchar   │    date    │      int64       │
├─────────────┼────────────┼──────────────────┤
│ ca          │ 2024-04-03 │                1 │
│ ca          │ 2024-04-04 │                2 │
│ ca          │ 2024-04-04 │                3 │
│ ca          │ 2024-04-11 │                4 │
│ cb          │ 2024-04-02 │                1 │
│ cb          │ 2024-04-02 │                2 │
│ cb          │ 2024-04-03 │                3 │
│ cb          │ 2024-05-13 │                4 │
└─────────────┴────────────┴──────────────────┘

However, I'd like to get the row number by frame

┌─────────────┬────────────┬──────────────┐
│ customer_id │    date    │ row_by_frame │
│   varchar   │    date    │      int64   │
├─────────────┼────────────┼──────────────┤
│ ca          │ 2024-04-03 │            1 │
│ ca          │ 2024-04-04 │            1 │
│ ca          │ 2024-04-04 │            2 │
│ ca          │ 2024-04-11 │            1 │
│ cb          │ 2024-04-02 │            1 │
│ cb          │ 2024-04-02 │            2 │
│ cb          │ 2024-04-03 │            1 │
│ cb          │ 2024-05-13 │            1 │
└─────────────┴────────────┴──────────────┘
like image 251
dpprdan Avatar asked Jan 26 '26 21:01

dpprdan


1 Answers

This is new functionality originally proposed in Efficient Evaluation of Arbitrarily-Framed Holistic SQL Aggregates and Window Functions (see §2.4). DuckDB's latest builds (and the forthcoming v1.2) implement this proposal, but you will need to provide an ordering clause inside the row_number function to trigger frame awareness:

SELECT
    customer_id,
    date,
    row_number(ORDER BY date ASC) OVER win AS row_by_partition
FROM 'example.csv'
WINDOW win AS (
    PARTITION BY customer_id
    ORDER BY date ASC
    RANGE BETWEEN CURRENT ROW
      AND INTERVAL 1 WEEK FOLLOWING)

The duplication of the ORDER BY is required, and optimising away the second sort is future work.

like image 56
hawkfish Avatar answered Jan 28 '26 13:01

hawkfish