Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Clickhouse as time-series storage

I just wonder if ClickHouse can be used for storing time-series data in the case like this: schema with columns: "some_entity_id", "timestamp", "metric1", "metric2", "metric3", ..., "metricN". Where each new column containing metric name can be added to the table dynamically, while adding entry with this metric name.

Have not found any information about dynamical table extend in official documentation.

So can this case be implemented in Clickhouse?

UPD: After some benchmarks we found out that ClickHouse writes new data faster than our current time-series storage, but reads data much more slower.

like image 322
Filipp Shestakov Avatar asked Feb 22 '17 12:02

Filipp Shestakov


People also ask

Is ClickHouse a time series database?

As you can see, ClickHouse is an amazing database with some powerful functions for time series. There is so much more to explore, we are only scratching the surface here. Now, it is up to you to use those time series functions on your own databases and solve new challenges!

What is ClickHouse not good for?

The sparse index makes ClickHouse not so efficient for point queries retrieving single rows by their keys.

What is the best way to store time series data?

Storing time series data. Time series data is best stored in a time series database (TSDB) built specifically for handling metrics and events that are time-stamped. This is because time series data is often ingested in massive volumes that require a purpose-built database designed to handle that scale.

What is ClickHouse good for?

ClickHouse is a highly scalable open source database management system (DBMS) that uses a column-oriented structure. It's designed for online analytical processing (OLAP) and is highly performant. ClickHouse can return processed results in real time in a fraction of a second.


1 Answers

There are more than one ways to use CH as a time series database. My personal preference is to use one string array for metric names and one Float64 array for metric values.

This is a sample time series table:

CREATE TABLE ts1(
    entity String,
    ts UInt64, -- timestamp, milliseconds from January 1 1970
    m Array(String), -- names of the metrics
    v Array(Float32), -- values of the metrics
    d Date MATERIALIZED toDate(round(ts/1000)), -- auto generate date from ts column
    dt DateTime MATERIALIZED toDateTime(round(ts/1000)) -- auto generate date time from ts column
) ENGINE = MergeTree(d, entity, 8192)

Here we are loading two metrics (load, temperature) for an entity(cpu):

INSERT INTO ts1(entity, ts, m, v) 
VALUES ('cpu', 1509232010254, ['load','temp'], [0.85, 68])

And querying cpu load:

SELECT 
    entity, 
    dt, 
    ts, 
    v[indexOf(m, 'load')] AS load
FROM ts1 
WHERE entity = 'cpu'

┌─entity─┬──────────────────dt─┬────────────ts─┬─load─┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ 0.85 │
└────────┴─────────────────────┴───────────────┴──────┘

Get data as array of tuples:

SELECT 
    entity, 
    dt, 
    ts, 
    arrayMap((mm, vv) -> (mm, vv), m, v) AS metrics
FROM ts1 

┌─entity─┬──────────────────dt─┬────────────ts─┬─metrics─────────────────────┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ [('load',0.85),('temp',68)] │
└────────┴─────────────────────┴───────────────┴─────────────────────────────┘

Get data as rows of tuples:

SELECT 
    entity, 
    dt, 
    ts, 
    arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metric
FROM ts1 

┌─entity─┬──────────────────dt─┬────────────ts─┬─metric────────┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('temp',68)   │
└────────┴─────────────────────┴───────────────┴───────────────┘

Get rows with the metric you want:

SELECT 
    entity, 
    dt, 
    ts, 
    arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metrics
FROM ts1 
WHERE metrics.1 = 'load'

┌─entity─┬──────────────────dt─┬────────────ts─┬─metrics───────┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │
└────────┴─────────────────────┴───────────────┴───────────────┘

Get metric names and values as columns:

SELECT 
    entity, 
    dt, 
    ts, 
    arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metric, 
    metric.1 AS metric_name, 
    metric.2 AS metric_value
FROM ts1 

┌─entity─┬──────────────────dt─┬────────────ts─┬─metric────────┬─metric_name─┬─metric_value─┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │ load        │         0.85 │
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('temp',68)   │ temp        │           68 │
└────────┴─────────────────────┴───────────────┴───────────────┴─────────────┴──────────────┘

Since CH has lots of useful date and time functions, along with higher order functions and tuples, I think it's almost a natural time-series database.

like image 98
Ramazan Polat Avatar answered Nov 15 '22 11:11

Ramazan Polat