Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I query per hour increment of a accumulation column in clickhouse?

Tags:

clickhouse

I want to save event time and the total amount of generated electric per 30 seconds. The total amount is not reseted to zero everytime. It's just the total from the meter first started to now, not total amount generated in the 30 seconds.

Is there any way of querying daily, weekly or monthly aggregations on the total amount of generated electric column (Maybe not just sum or avg)?

Or by design a AggregatingMergeTree table?

I don't need to keep every record, just need the daily, weekly and monthly aggregations.

For example :

create table meter_record (
   event_time Datetime,
   generated_total Int64
)
like image 838
xiemeilong Avatar asked Dec 13 '19 01:12

xiemeilong


Video Answer


1 Answers

UPDATE

Prefer to use SimpleAggregateFunction instead of AggregateFunction for simple functions like median, avg, min, max to speed up aggregates calculation.


Let's suggest you need to calculate median, average and dispersion aggregations for this table:

CREATE TABLE meter_record (
   event_time Datetime,
   generated_total Int64  
)
ENGINE = MergeTree
PARTITION BY (toYYYYMM(event_time))
ORDER BY (event_time);

Use AggregatingMergeTree to calculate required aggregates:

CREATE MATERIALIZED VIEW meter_aggregates_mv
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (day) 
AS
SELECT  
  toDate(toStartOfDay(event_time)) AS day, 
  /* aggregates to calculate the day's section left and right endpoints */
  minState(generated_total) min_generated_total,
  maxState(generated_total) max_generated_total,
  /* specific aggregates */
  medianState(generated_total) AS totalMedian,
  avgState(generated_total) AS totalAvg,
  varPopState(generated_total) AS totalDispersion
  /* ... */
FROM meter_record
GROUP BY day;

To get required daily / weekly / montly (and any day-base aggregation like quarterly or yearly) aggregates use these queries:

/* daily report */
SELECT 
  day,
  minMerge(min_generated_total) min_generated_total,
  maxMerge(max_generated_total) max_generated_total,
  medianMerge(totalMedian) AS totalMedian,
  avgMerge(totalAvg) AS totalAvg,
  varPopMerge(totalDispersion) AS totalDispersion
FROM meter_aggregates_mv
/*WHERE day >= '2019-02-05' and day < '2019-07-01'*/
GROUP BY day;

/* weekly report */
SELECT 
  toStartOfWeek(day, 1) monday,
  minMerge(min_generated_total) min_generated_total,
  maxMerge(max_generated_total) max_generated_total,
  medianMerge(totalMedian) AS totalMedian,
  avgMerge(totalAvg) AS totalAvg,
  varPopMerge(totalDispersion) AS totalDispersion
FROM meter_aggregates_mv
/*WHERE day >= '2019-02-05' and day < '2019-07-01'*/
GROUP BY monday;

/* monthly report */
SELECT 
  toStartOfMonth(day) month,
  minMerge(min_generated_total) min_generated_total,
  maxMerge(max_generated_total) max_generated_total,
  medianMerge(totalMedian) AS totalMedian,
  avgMerge(totalAvg) AS totalAvg,
  varPopMerge(totalDispersion) AS totalDispersion
FROM meter_aggregates_mv
/*WHERE day >= '2019-02-05' and day < '2019-07-01'*/
GROUP BY month;

/* get daily / weekly / monthly reports in one query (thanks @Denis Zhuravlev for advise) */
SELECT
  day,
  toStartOfWeek(day, 1) AS week,
  toStartOfMonth(day) AS month,
  minMerge(min_generated_total) min_generated_total,
  maxMerge(max_generated_total) max_generated_total,
  medianMerge(totalMedian) AS totalMedian,
  avgMerge(totalAvg) AS totalAvg,
  varPopMerge(totalDispersion) AS totalDispersion
FROM meter_aggregates_mv
/*WHERE (day >= '2019-05-01') AND (day < '2019-06-01')*/
GROUP BY month, week, day WITH ROLLUP
ORDER BY day, week, month;

Remarks:

  • you point that raw-data is not required to you just aggregates, so you can set engine for meter_record-table as Null, manually clean meter_record (see DROP PARTITION) or define the TTL to do it automatically

  • removing raw-data is bad practice because it makes impossible to calculate new aggregates on historical data or restore exist aggregates etc

  • the materialized view meter_aggregates_mv will contains only the data inserted in the table meter_record after creating the view. To change this behavior use POPULATE in view definition

like image 77
vladimir Avatar answered Sep 26 '22 20:09

vladimir