Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Clickhouse: Sliding / moving window

I'm looking for an efficient way on how to query the n past values as array in ClickHouse for each row ordered by one column (i.e. Time), where the values should be retrieved as array.

Window functions are still not supported in ClickHouse (see #1469), so I was hoping for a work-around using aggregation functions like groupArray()?

Table:

Time  | Value
12:11 | 1
12:12 | 2
12:13 | 3
12:14 | 4
12:15 | 5
12:16 | 6

Expected result with a window of size n=3:

Time  | Value
12:13 | [1,2,3]
12:14 | [2,3,4]
12:15 | [3,4,5]
12:16 | [4,5,6]

What are the ways/functions currently used in ClickHouse to efficiently query a sliding/moving window and how can I achieve my desired result?

EDIT:

My solution based on response of @vladimir:

select max(Time) as Time, groupArray(Value) as Values
from (
    select
        *,
        rowNumberInAllBlocks() as row_number,
        arrayJoin(range(row_number, row_number + 3)) as window_id
    from (
        /* BEGIN emulate origin dataset */
        select toDateTime(a) as Time, rowNumberInAllBlocks()+1 as Value
        from (
            select arrayJoin([
                '2020-01-01 12:11:00',
                '2020-01-01 12:12:00',
                '2020-01-01 12:13:00',
                '2020-01-01 12:14:00',
                '2020-01-01 12:15:00',
                '2020-01-01 12:16:00']) a
        )
        order by Time
        /* END emulate origin dataset */
    )
    order by Time
) s
group by window_id
having length(Values) = 3
order by Time

Note that 3 appears twice in the query and represents the window size n.

Output:

┌────────────────Time─┬─Values──┐
│ 2020-01-01 12:13:00 │ [1,2,3] │
│ 2020-01-01 12:14:00 │ [2,3,4] │
│ 2020-01-01 12:15:00 │ [3,4,5] │
│ 2020-01-01 12:16:00 │ [4,5,6] │
└─────────────────────┴─────────┘
like image 778
1RB Avatar asked Feb 02 '26 10:02

1RB


1 Answers

Starting from version 21.4 added the full support of window-functions. At this moment it was marked as an experimental feature.

SELECT
    Time,
    groupArray(any(Value)) OVER (ORDER BY Time ASC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS Values
FROM 
(
    /* Emulate the test dataset, */
    select toDateTime(a) as Time, rowNumberInAllBlocks()+1 as Value
    from (
        select arrayJoin([
            '2020-01-01 12:11:00',
            '2020-01-01 12:12:00',
            '2020-01-01 12:13:00',
            '2020-01-01 12:14:00',
            '2020-01-01 12:15:00',
            '2020-01-01 12:16:00']) a
    )
    order by Time
)
GROUP BY Time
SETTINGS allow_experimental_window_functions = 1

/*
┌────────────────Time─┬─Values──┐
│ 2020-01-01 12:11:00 │ [1]     │
│ 2020-01-01 12:12:00 │ [1,2]   │
│ 2020-01-01 12:13:00 │ [1,2,3] │
│ 2020-01-01 12:14:00 │ [2,3,4] │
│ 2020-01-01 12:15:00 │ [3,4,5] │
│ 2020-01-01 12:16:00 │ [4,5,6] │
└─────────────────────┴─────────┘
*/

See https://altinity.com/blog/clickhouse-window-functions-current-state-of-the-art.


ClickHouse has several datablock-scoped window functions, let's take neighbor:

SELECT Time, [neighbor(Value, -2), neighbor(Value, -1), neighbor(Value, 0)] Values
FROM (
  /* emulate origin data */
  SELECT toDateTime(data.1) as Time, data.2 as Value
  FROM (
    SELECT arrayJoin([('2020-01-01 12:11:00', 1),
    ('2020-01-01 12:12:00', 2),
    ('2020-01-01 12:13:00', 3),
    ('2020-01-01 12:14:00', 4),
    ('2020-01-01 12:15:00', 5),
    ('2020-01-01 12:16:00', 6)]) as data)
  )

/*
┌────────────────Time─┬─Values──┐
│ 2020-01-01 12:11:00 │ [0,0,1] │
│ 2020-01-01 12:12:00 │ [0,1,2] │
│ 2020-01-01 12:13:00 │ [1,2,3] │
│ 2020-01-01 12:14:00 │ [2,3,4] │
│ 2020-01-01 12:15:00 │ [3,4,5] │
│ 2020-01-01 12:16:00 │ [4,5,6] │
└─────────────────────┴─────────┘

*/

An alternate way based on the duplication of source rows by window_size times:

SELECT   
  arrayReduce('max', arrayMap(x -> x.1, raw_result)) Time,
  arrayMap(x -> x.2, raw_result) Values
FROM (  
  SELECT groupArray((Time, Value)) raw_result, max(row_number) max_row_number
  FROM (
    SELECT 
      3 AS window_size,
      *, 
      rowNumberInAllBlocks() row_number,
      arrayJoin(arrayMap(x -> x + row_number, range(window_size))) window_id
    FROM (
      /* emulate origin dataset */
      SELECT toDateTime(data.1) as Time, data.2 as Value
      FROM (
        SELECT arrayJoin([('2020-01-01 12:11:00', 1),
          ('2020-01-01 12:12:00', 2),
          ('2020-01-01 12:13:00', 3),
          ('2020-01-01 12:14:00', 4),
          ('2020-01-01 12:15:00', 5),
          ('2020-01-01 12:16:00', 6)]) as data)
      ORDER BY Value
      )
    )
  GROUP BY window_id
  HAVING max_row_number = window_id
  ORDER BY window_id
  )
/*
┌────────────────Time─┬─Values──┐
│ 2020-01-01 12:11:00 │ [1]     │
│ 2020-01-01 12:12:00 │ [1,2]   │
│ 2020-01-01 12:13:00 │ [1,2,3] │
│ 2020-01-01 12:14:00 │ [2,3,4] │
│ 2020-01-01 12:15:00 │ [3,4,5] │
│ 2020-01-01 12:16:00 │ [4,5,6] │
└─────────────────────┴─────────┘
*/

Extra example:

SELECT   
  arrayReduce('max', arrayMap(x -> x.1, raw_result)) id,
  arrayMap(x -> x.2, raw_result) values
FROM (  
  SELECT groupArray((id, value)) raw_result, max(row_number) max_row_number
  FROM (
    SELECT 
      48 AS window_size,
      *, 
      rowNumberInAllBlocks() row_number,
      arrayJoin(arrayMap(x -> x + row_number, range(window_size))) window_id
    FROM (
      /* the origin dataset */
      SELECT number AS id, number AS value
      FROM numbers(4096) 
      )
    )
  GROUP BY window_id
  HAVING max_row_number = window_id
  ORDER BY window_id
  )
/*
┌─id─┬─values────────────────┐
│  0 │ [0]                   │
│  1 │ [0,1]                 │
│  2 │ [0,1,2]               │
│  3 │ [0,1,2,3]             │
│  4 │ [0,1,2,3,4]           │
│  5 │ [0,1,2,3,4,5]         │
│  6 │ [0,1,2,3,4,5,6]       │
│  7 │ [0,1,2,3,4,5,6,7]     │
..
│ 56 │ [9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56]  │
│ 57 │ [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57] │
│ 58 │ [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58] │
│ 59 │ [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59] │
│ 60 │ [13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60] │
..
│ 4093 │ [4046,4047,4048,4049,4050,4051,4052,4053,4054,4055,4056,4057,4058,4059,4060,4061,4062,4063,4064,4065,4066,4067,4068,4069,4070,4071,4072,4073,4074,4075,4076,4077,4078,4079,4080,4081,4082,4083,4084,4085,4086,4087,4088,4089,4090,4091,4092,4093] │
│ 4094 │ [4047,4048,4049,4050,4051,4052,4053,4054,4055,4056,4057,4058,4059,4060,4061,4062,4063,4064,4065,4066,4067,4068,4069,4070,4071,4072,4073,4074,4075,4076,4077,4078,4079,4080,4081,4082,4083,4084,4085,4086,4087,4088,4089,4090,4091,4092,4093,4094] │
│ 4095 │ [4048,4049,4050,4051,4052,4053,4054,4055,4056,4057,4058,4059,4060,4061,4062,4063,4064,4065,4066,4067,4068,4069,4070,4071,4072,4073,4074,4075,4076,4077,4078,4079,4080,4081,4082,4083,4084,4085,4086,4087,4088,4089,4090,4091,4092,4093,4094,4095] │
└──────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
*/
like image 129
vladimir Avatar answered Feb 03 '26 23:02

vladimir