Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

InfluxDB performance

For my case, I need to capture 15 performance metrics for devices and save it to InfluxDB. Each device has a unique device id.

Metrics are written into InfluxDB in the following way. Here I only show one as an example

new Serie.Builder("perfmetric1")
    .columns("time", "value", "id", "type")
    .values(getTime(), getPerf1(), getId(), getType())
    .build()

Writing data is fast and easy. But I saw bad performance when I run query. I'm trying to get all 15 metric values for the last one hour.

select value from perfmetric1, perfmetric2, ..., permetric15
where id='testdeviceid' and time > now() - 1h

For an hour, each metric has 120 data points, in total it's 1800 data points. The query takes about 5 seconds on a c4.4xlarge EC2 instance when it's idle.

I believe InfluxDB can do better. Is this a problem of my schema design, or is it something else? Would splitting the query into 15 parallel calls go faster?

like image 721
Cary Li Avatar asked Apr 24 '15 22:04

Cary Li


People also ask

How do I monitor InfluxDB performance?

To monitor the internal performance of InfluxDB, enable the InfluxDB input plugin in the Telegraf configuration files used to run Telegraf on InfluxDB instances. The InfluxDB input plugin pulls InfluxDB internal metrics from the local InfluxDB /debug/vars endpoint.

How much data can InfluxDB handle?

Is there any limit to the amount of data that can be stored in InfluxDB? There are no systemic limits to the amount of data an individual instance can hold. Also the storage engine is very efficient (3 bytes per float or integer value). Data is naturally sharded by time so old data is not touched on most queries.

Which is better Prometheus or InfluxDB?

If monitoring is what you're most interested in, Prometheus is your safest bet because of its many integrations and scalable model. If you're more likely to be using a time series database for IoT, sensors, or analytics, then you'll probably want to choose InfluxDB.

How good is InfluxDB?

InfluxDB is the perfect database for stats or other time-based data that would overwhelm SQL or NoSQL databases. This makes reporting, graphs, and other parts of your application simple to implement. The multitude of functions to allow data to be analyzed and reduced makes this a very powerful database engine.


2 Answers

As @valentin answer says, you need to build an index for the id column for InfluxDB to perform these queries efficiently.

In 0.8 stable you can do this "indexing" using continuous fanout queries. For example, the following continuous query will expand your perfmetric1 series into multiple series of the form perfmetric1.id:

select * from perfmetric1 into perfmetric1.[id];

Later you would do:

select value from perfmetric1.testdeviceid, perfmetric2.testdeviceid, ..., permetric15.testdeviceid where time > now() - 1h

This query will take much less time to complete since InfluxDB won't have to perform a full scan of the timeseries to get the points for each testdeviceid.

like image 192
dukebody Avatar answered Oct 23 '22 10:10

dukebody


Build an index on id column. Seems that he engine uses full scan on table to retrieve data. By splitting your query in 15 threads, the engine will use 15 full scans and the performance will be much worse.

like image 23
valentin Avatar answered Oct 23 '22 09:10

valentin