I just wonder if ClickHouse can be used for storing time-series data in the case like this: schema with columns: "some_entity_id", "timestamp", "metric1", "metric2", "metric3", ..., "metricN". Where each new column containing metric name can be added to the table dynamically, while adding entry with this metric name.
Have not found any information about dynamical table extend in official documentation.
So can this case be implemented in Clickhouse?
UPD: After some benchmarks we found out that ClickHouse writes new data faster than our current time-series storage, but reads data much more slower.
As you can see, ClickHouse is an amazing database with some powerful functions for time series. There is so much more to explore, we are only scratching the surface here. Now, it is up to you to use those time series functions on your own databases and solve new challenges!
The sparse index makes ClickHouse not so efficient for point queries retrieving single rows by their keys.
Storing time series data. Time series data is best stored in a time series database (TSDB) built specifically for handling metrics and events that are time-stamped. This is because time series data is often ingested in massive volumes that require a purpose-built database designed to handle that scale.
ClickHouse is a highly scalable open source database management system (DBMS) that uses a column-oriented structure. It's designed for online analytical processing (OLAP) and is highly performant. ClickHouse can return processed results in real time in a fraction of a second.
There are more than one ways to use CH as a time series database. My personal preference is to use one string array for metric names and one Float64 array for metric values.
This is a sample time series table:
CREATE TABLE ts1(
entity String,
ts UInt64, -- timestamp, milliseconds from January 1 1970
m Array(String), -- names of the metrics
v Array(Float32), -- values of the metrics
d Date MATERIALIZED toDate(round(ts/1000)), -- auto generate date from ts column
dt DateTime MATERIALIZED toDateTime(round(ts/1000)) -- auto generate date time from ts column
) ENGINE = MergeTree(d, entity, 8192)
Here we are loading two metrics (load, temperature) for an entity(cpu):
INSERT INTO ts1(entity, ts, m, v)
VALUES ('cpu', 1509232010254, ['load','temp'], [0.85, 68])
And querying cpu load:
SELECT
entity,
dt,
ts,
v[indexOf(m, 'load')] AS load
FROM ts1
WHERE entity = 'cpu'
┌─entity─┬──────────────────dt─┬────────────ts─┬─load─┐
│ cpu │ 2017-10-28 23:06:50 │ 1509232010254 │ 0.85 │
└────────┴─────────────────────┴───────────────┴──────┘
Get data as array of tuples:
SELECT
entity,
dt,
ts,
arrayMap((mm, vv) -> (mm, vv), m, v) AS metrics
FROM ts1
┌─entity─┬──────────────────dt─┬────────────ts─┬─metrics─────────────────────┐
│ cpu │ 2017-10-28 23:06:50 │ 1509232010254 │ [('load',0.85),('temp',68)] │
└────────┴─────────────────────┴───────────────┴─────────────────────────────┘
Get data as rows of tuples:
SELECT
entity,
dt,
ts,
arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metric
FROM ts1
┌─entity─┬──────────────────dt─┬────────────ts─┬─metric────────┐
│ cpu │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │
│ cpu │ 2017-10-28 23:06:50 │ 1509232010254 │ ('temp',68) │
└────────┴─────────────────────┴───────────────┴───────────────┘
Get rows with the metric you want:
SELECT
entity,
dt,
ts,
arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metrics
FROM ts1
WHERE metrics.1 = 'load'
┌─entity─┬──────────────────dt─┬────────────ts─┬─metrics───────┐
│ cpu │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │
└────────┴─────────────────────┴───────────────┴───────────────┘
Get metric names and values as columns:
SELECT
entity,
dt,
ts,
arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metric,
metric.1 AS metric_name,
metric.2 AS metric_value
FROM ts1
┌─entity─┬──────────────────dt─┬────────────ts─┬─metric────────┬─metric_name─┬─metric_value─┐
│ cpu │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │ load │ 0.85 │
│ cpu │ 2017-10-28 23:06:50 │ 1509232010254 │ ('temp',68) │ temp │ 68 │
└────────┴─────────────────────┴───────────────┴───────────────┴─────────────┴──────────────┘
Since CH has lots of useful date and time functions, along with higher order functions and tuples, I think it's almost a natural time-series database.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With