For my case, I need to capture 15 performance metrics for devices and save it to InfluxDB. Each device has a unique device id.
Metrics are written into InfluxDB in the following way. Here I only show one as an example
new Serie.Builder("perfmetric1")
.columns("time", "value", "id", "type")
.values(getTime(), getPerf1(), getId(), getType())
.build()
Writing data is fast and easy. But I saw bad performance when I run query. I'm trying to get all 15 metric values for the last one hour.
select value from perfmetric1, perfmetric2, ..., permetric15
where id='testdeviceid' and time > now() - 1h
For an hour, each metric has 120 data points, in total it's 1800 data points. The query takes about 5 seconds on a c4.4xlarge EC2 instance when it's idle.
I believe InfluxDB can do better. Is this a problem of my schema design, or is it something else? Would splitting the query into 15 parallel calls go faster?
To monitor the internal performance of InfluxDB, enable the InfluxDB input plugin in the Telegraf configuration files used to run Telegraf on InfluxDB instances. The InfluxDB input plugin pulls InfluxDB internal metrics from the local InfluxDB /debug/vars endpoint.
Is there any limit to the amount of data that can be stored in InfluxDB? There are no systemic limits to the amount of data an individual instance can hold. Also the storage engine is very efficient (3 bytes per float or integer value). Data is naturally sharded by time so old data is not touched on most queries.
If monitoring is what you're most interested in, Prometheus is your safest bet because of its many integrations and scalable model. If you're more likely to be using a time series database for IoT, sensors, or analytics, then you'll probably want to choose InfluxDB.
InfluxDB is the perfect database for stats or other time-based data that would overwhelm SQL or NoSQL databases. This makes reporting, graphs, and other parts of your application simple to implement. The multitude of functions to allow data to be analyzed and reduced makes this a very powerful database engine.
As @valentin answer says, you need to build an index for the id column for InfluxDB to perform these queries efficiently.
In 0.8 stable you can do this "indexing" using continuous fanout queries. For example, the following continuous query will expand your perfmetric1
series into multiple series of the form perfmetric1.id
:
select * from perfmetric1 into perfmetric1.[id];
Later you would do:
select value from perfmetric1.testdeviceid, perfmetric2.testdeviceid, ..., permetric15.testdeviceid where time > now() - 1h
This query will take much less time to complete since InfluxDB won't have to perform a full scan of the timeseries to get the points for each testdeviceid
.
Build an index on id column. Seems that he engine uses full scan on table to retrieve data. By splitting your query in 15 threads, the engine will use 15 full scans and the performance will be much worse.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With