ClickHouse as time-series storage

Tags:

database

time-series

clickhouse

I wonder whether ClickHouse can be used to store time-series data in a case like this: a schema with the columns "some_entity_id", "timestamp", "metric1", "metric2", "metric3", ..., "metricN", where a new column named after a metric is added to the table dynamically when the first entry with that metric name is inserted.

I have not found any information about extending a table dynamically in the official documentation.

So can this case be implemented in ClickHouse?

UPD: After some benchmarks we found that ClickHouse writes new data faster than our current time-series storage, but reads data much more slowly.


asked Feb 22 '17 12:02

Filipp Shestakov

1 Answer

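There is more than one way to use ClickHouse (CH) as a time-series database. My personal preference is to use one String array for the metric names and one Float32 array for the metric values. With this layout a new metric is just a new name in the array, so the table never needs a new column.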
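If you do want the wide layout from the question, ClickHouse can add a column to an existing table with ALTER TABLE ... ADD COLUMN. A minimal sketch, assuming a hypothetical table `ts_wide` with hypothetical columns `metric1` to `metric3`:

```sql
-- Hypothetical wide table in the layout from the question:
-- one Float32 column per metric.
CREATE TABLE ts_wide
(
    entity String,
    ts UInt64,        -- milliseconds since the Unix epoch
    metric1 Float32,
    metric2 Float32
)
ENGINE = MergeTree
ORDER BY (entity, ts);

-- Must run before the first insert that mentions metric3.
-- IF NOT EXISTS makes the statement safe to repeat.
ALTER TABLE ts_wide ADD COLUMN IF NOT EXISTS metric3 Float32;

-- metric2 is omitted, so this row gets the column default (0).
INSERT INTO ts_wide (entity, ts, metric1, metric3)
VALUES ('cpu', 1509232010254, 0.85, 68);
```

The catch is that whatever writes the data has to issue the ALTER before the first insert of every new metric, and a metric missing from a row is stored as the column default (0 for Float32), which cannot be told apart from a real 0 unless the column is declared Nullable.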

This is a sample time series table:

CREATE TABLE ts1(
    entity String,
    ts UInt64, -- timestamp, milliseconds since the Unix epoch (1970-01-01 00:00:00 UTC)
    m Array(String), -- names of the metrics
    v Array(Float32), -- values of the metrics, in the same order as m
    d Date MATERIALIZED toDate(toDateTime(intDiv(ts, 1000))), -- date derived from ts
    dt DateTime MATERIALIZED toDateTime(intDiv(ts, 1000)) -- date and time derived from ts, truncated to whole seconds
) ENGINE = MergeTree
PARTITION BY toYYYYMM(d) -- one partition per month
ORDER BY (entity, ts)
SETTINGS index_granularity = 8192
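The date columns turn a millisecond timestamp into whole seconds. A quick standalone check of that conversion on the sample timestamp, using `intDiv` (integer division, which truncates rather than rounds) and pinning the time zone to UTC so the result does not depend on the server setting:

```sql
SELECT
    intDiv(1509232010254, 1000)                              AS unix_seconds,
    toDateTime(intDiv(1509232010254, 1000), 'UTC')           AS dt,
    toDate(toDateTime(intDiv(1509232010254, 1000), 'UTC'))   AS d,
    -- Keeps the milliseconds as a DateTime64(3) value.
    fromUnixTimestamp64Milli(toInt64(1509232010254), 'UTC')  AS dt_ms;
```

This should return 1509232010, 2017-10-28 23:06:50, 2017-10-28 and 2017-10-28 23:06:50.254. Without an explicit time zone argument, `toDateTime` uses the server time zone, which is what the MATERIALIZED columns in the table do. `fromUnixTimestamp64Milli` is only available in newer ClickHouse releases.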

Here we load two metrics (load and temp) for one entity (cpu):

INSERT INTO ts1(entity, ts, m, v) 
VALUES ('cpu', 1509232010254, ['load','temp'], [0.85, 68])
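Because names and values live in two parallel arrays, nothing in the schema stops a row from carrying, say, two names and one value; the arrayMap-based queries in this answer would then fail with an error on that row. A table constraint can reject such rows at insert time. A sketch with a hypothetical constraint name; the example inserts are left commented out so the sample outputs that follow still show a single row:

```sql
-- Hypothetical constraint name: each metric name needs exactly one value.
ALTER TABLE ts1 ADD CONSTRAINT m_v_same_length CHECK length(m) = length(v);

-- Accepted: a metric name never seen before needs no schema change.
-- INSERT INTO ts1 (entity, ts, m, v)
-- VALUES ('cpu', 1509232070254, ['load', 'temp', 'fan_rpm'], [0.91, 70, 1200]);

-- Rejected by the constraint: two names but only one value.
-- INSERT INTO ts1 (entity, ts, m, v)
-- VALUES ('cpu', 1509232130254, ['load', 'temp'], [0.80]);
```

Constraints are checked on INSERT only; adding one does not validate rows already stored in the table.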

Querying the CPU load: indexOf finds the position of 'load' in m, and v is read at that same position:

SELECT 
    entity, 
    dt, 
    ts, 
    v[indexOf(m, 'load')] AS load
FROM ts1 
WHERE entity = 'cpu'

┌─entity─┬──────────────────dt─┬────────────ts─┬─load─┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ 0.85 │
└────────┴─────────────────────┴───────────────┴──────┘
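One caveat with `v[indexOf(m, 'load')]`: if a row has no 'load' entry, `indexOf` returns 0, and ClickHouse returns the element type's default (0 for Float32) for that out-of-range position instead of raising an error, so a missing metric reads like a real 0. A sketch that adds `has()` to skip such rows; for the sample data it returns the same single row:

```sql
-- has(m, 'load') keeps only rows that actually carry the metric,
-- so a missing metric is never reported as load = 0.
SELECT
    entity,
    dt,
    ts,
    v[indexOf(m, 'load')] AS load
FROM ts1
WHERE entity = 'cpu' AND has(m, 'load');
```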

Get the data as an array of (name, value) tuples:

SELECT 
    entity, 
    dt, 
    ts, 
    arrayMap((mm, vv) -> (mm, vv), m, v) AS metrics
FROM ts1 

┌─entity─┬──────────────────dt─┬────────────ts─┬─metrics─────────────────────┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ [('load',0.85),('temp',68)] │
└────────┴─────────────────────┴───────────────┴─────────────────────────────┘
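Newer ClickHouse releases also have `arrayZip`, which builds the same array of (name, value) tuples without a lambda. Like `arrayMap`, it requires both arrays to have the same length:

```sql
-- arrayZip pairs the i-th name in m with the i-th value in v.
SELECT
    entity,
    dt,
    ts,
    arrayZip(m, v) AS metrics
FROM ts1;
```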

Get the data as rows of tuples; arrayJoin turns each array element into a separate row:

SELECT 
    entity, 
    dt, 
    ts, 
    arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metric
FROM ts1 

┌─entity─┬──────────────────dt─┬────────────ts─┬─metric────────┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('temp',68)   │
└────────┴─────────────────────┴───────────────┴───────────────┘
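If plain columns are more convenient than tuples, the ARRAY JOIN clause is an alternative: it unfolds m and v in step, so each output row carries one name and its matching value. A sketch (the column aliases are illustrative); for the sample data it yields two rows, one for load and one for temp:

```sql
-- Both arrays are unfolded together, so they must have equal lengths.
SELECT
    entity,
    dt,
    ts,
    metric_name,
    metric_value
FROM ts1
ARRAY JOIN
    m AS metric_name,
    v AS metric_value;
```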

Get only the rows for the metric you want:

SELECT 
    entity, 
    dt, 
    ts, 
    arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metric
FROM ts1 
WHERE metric.1 = 'load'

┌─entity─┬──────────────────dt─┬────────────ts─┬─metric────────┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │
└────────┴─────────────────────┴───────────────┴───────────────┘

Get metric names and values as separate columns:

SELECT 
    entity, 
    dt, 
    ts, 
    arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metric, 
    metric.1 AS metric_name, 
    metric.2 AS metric_value
FROM ts1 

┌─entity─┬──────────────────dt─┬────────────ts─┬─metric────────┬─metric_name─┬─metric_value─┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │ load        │         0.85 │
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('temp',68)   │ temp        │           68 │
└────────┴─────────────────────┴───────────────┴───────────────┴─────────────┴──────────────┘
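Going the other way, one expression per metric turns the arrays back into the wide layout from the question, and a view can hide that from readers of the data. A sketch with a hypothetical view name; the view has to be recreated whenever a new metric should appear as a column, but the underlying table never changes:

```sql
-- Hypothetical view exposing one Nullable(Float32) column per known metric.
-- NULL (rather than 0) marks rows that do not carry that metric.
CREATE VIEW ts1_wide AS
SELECT
    entity,
    dt,
    ts,
    if(has(m, 'load'), v[indexOf(m, 'load')], NULL) AS load,
    if(has(m, 'temp'), v[indexOf(m, 'temp')], NULL) AS temp
FROM ts1;
```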

Since ClickHouse has many useful date and time functions, along with higher-order functions and tuples, I think it is very nearly a natural time-series database.
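As an illustration of those date and time functions, a typical time-series query buckets samples by time and aggregates them. A sketch computing per-minute average and peak CPU load from the table above; `toStartOfMinute` could be swapped for `toStartOfHour` or `toStartOfInterval(dt, INTERVAL 5 MINUTE)` for other bucket sizes:

```sql
-- toStartOfMinute rounds dt down to the start of its minute.
-- has() skips rows without a 'load' entry so they do not count as 0.
SELECT
    entity,
    toStartOfMinute(dt) AS bucket,
    avg(v[indexOf(m, 'load')]) AS avg_load,
    max(v[indexOf(m, 'load')]) AS max_load,
    count() AS samples
FROM ts1
WHERE entity = 'cpu' AND has(m, 'load')
GROUP BY entity, bucket
ORDER BY bucket;
```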

answered Nov 15 '22 11:11

Ramazan Polat
