
How to group by time bucket in ClickHouse and fill missing data with nulls/0s

Tags: sql, clickhouse

Suppose I have a given time range. To keep the explanation simple, let's take the whole year 2018. I want to query data from ClickHouse as a sum aggregation for each quarter, so the result should be 4 rows.

The problem is that I have data for only two quarters, so when using GROUP BY quarter, only two rows are returned.

SELECT
    toStartOfQuarter(created_at) AS time,
    sum(metric) AS metric
FROM mytable
WHERE
    created_at >= toDate(1514761200) AND created_at >= toDateTime(1514761200)
    AND
    created_at <= toDate(1546210800) AND created_at <= toDateTime(1546210800)
GROUP BY time
ORDER BY time

timestamp   date
1514761200  2018-01-01
1546210800  2018-12-31

This returns:

time       metric
2018-01-01 345
2018-04-01 123

And I need:

time       metric
2018-01-01 345
2018-04-01 123
2018-07-01 0
2018-10-01 0

This is a simplified example. In the real use case the aggregation would be e.g. 5 minutes instead of quarters, and GROUP BY would have at least one more attribute, like GROUP BY attribute1, time, so the desired result is:

time        metric  attribute1
2018-01-01  345     1
2018-01-01  345     2
2018-04-01  123     1
2018-04-01  123     2
2018-07-01  0       1
2018-07-01  0       2
2018-10-01  0       1
2018-10-01  0       2

Is there a way to fill the whole given interval? InfluxDB has the fill argument for GROUP BY, and TimescaleDB has the time_bucket() function combined with generate_series(). I searched the ClickHouse documentation and GitHub issues, and it seems this is not implemented yet, so the question is perhaps whether there is any workaround.

simPod asked May 08 '18 16:05



1 Answer

Since ClickHouse 19.14 you can use the WITH FILL modifier of ORDER BY. It can fill the missing quarters like this:

WITH
    (
        SELECT toRelativeQuarterNum(toDate('1970-01-01'))
    ) AS init
SELECT
    -- build the date from the relative quarter number
    toDate('1970-01-01') + toIntervalQuarter(q - init) AS time,
    metric
FROM
(
    SELECT
        toRelativeQuarterNum(created_at) AS q,
        sum(rand()) AS metric
    FROM
    (
        -- generate some dates and metrics values with gaps
        SELECT toDate(arrayJoin(range(1514761200, 1546210800, ((60 * 60) * 24) * 180))) AS created_at
    )
    GROUP BY q
    ORDER BY q ASC WITH FILL FROM toRelativeQuarterNum(toDate(1514761200)) TO toRelativeQuarterNum(toDate(1546210800)) STEP 1
)

┌───────time─┬─────metric─┐
│ 2018-01-01 │ 2950782089 │
│ 2018-04-01 │ 2972073797 │
│ 2018-07-01 │          0 │
│ 2018-10-01 │  179581958 │
└────────────┴────────────┘
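
For the 5-minute case mentioned in the question, the same modifier can probably be applied directly to a DateTime bucket. Below is a minimal, untested sketch. It assumes the question's `mytable` with a DateTime column `created_at` and a numeric column `metric`, and it assumes that a numeric STEP on a DateTime sort key is interpreted as seconds:

```sql
SELECT
    -- round each timestamp down to the start of its 5-minute bucket
    toStartOfInterval(created_at, INTERVAL 5 MINUTE) AS time,
    sum(metric) AS metric
FROM mytable
WHERE created_at >= toDateTime(1514761200)
  AND created_at <  toDateTime(1546210800)
GROUP BY time
ORDER BY time ASC WITH FILL
    FROM toDateTime(1514761200)  -- first bucket to emit; this timestamp is already on a 5-minute boundary
    TO toDateTime(1546210800)    -- upper bound of the fill range
    STEP 300                     -- 5 minutes, assuming a numeric STEP means seconds for DateTime
```

Filled rows get the default value of the other columns, so `metric` comes back as 0 rather than NULL, as in the quarterly output above. As far as I know, WITH FILL only fills gaps along the sort key, so on its own it will not produce a row for every `attribute1` value in every bucket. One possible workaround for that is to build the full grid of buckets and attributes (for example from `numbers()` cross-joined with the distinct `attribute1` values) and LEFT JOIN the aggregated data onto it.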
alrocar answered Sep 18 '22 12:09
