InfluxDB performance

Tags:

time-series

influxdb

In my case, I need to capture 15 performance metrics per device and save them to InfluxDB. Each device has a unique device ID.

Metrics are written to InfluxDB in the following way. I show only one as an example:

new Serie.Builder("perfmetric1")
    .columns("time", "value", "id", "type")
    .values(getTime(), getPerf1(), getId(), getType())
    .build()

Writing data is fast and easy, but I see poor performance when I run a query. I'm trying to get all 15 metric values for the last hour.

select value from perfmetric1, perfmetric2, ..., perfmetric15
where id='testdeviceid' and time > now() - 1h

Each metric has 120 data points per hour, so the query returns 1800 data points in total. It takes about 5 seconds on an otherwise idle c4.4xlarge EC2 instance.

I believe InfluxDB can do better. Is this a problem with my schema design, or is it something else? Would splitting the query into 15 parallel calls be faster?

asked Apr 24 '15 22:04

2 Answers

As @valentin's answer says, you need to build an index on the id column for InfluxDB to perform these queries efficiently.

In 0.8 stable you can do this "indexing" using continuous fanout queries. For example, the following continuous query will expand your perfmetric1 series into multiple series of the form perfmetric1.id:

select * from perfmetric1 into perfmetric1.[id];
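To make the effect of the fanout concrete, here is a toy in-memory model in Java. It is only an illustration of the naming scheme, not how InfluxDB implements continuous queries; the `FanoutModel` and `Point` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: groups the points of one series into "<series>.<id>" buckets,
// mirroring the series names the fanout query above would produce.
final class FanoutModel {
    // One point of the original series (hypothetical type for this sketch).
    static final class Point {
        final long time;
        final double value;
        final String id;

        Point(long time, double value, String id) {
            this.time = time;
            this.value = value;
            this.id = id;
        }
    }

    // Returns a map from fanned-out series name to the points routed to it.
    static Map<String, List<Point>> fanOut(String seriesName, List<Point> points) {
        Map<String, List<Point>> out = new LinkedHashMap<>();
        for (Point p : points) {
            out.computeIfAbsent(seriesName + "." + p.id, k -> new ArrayList<>()).add(p);
        }
        return out;
    }
}
```

After the fanout, reading one device's points means opening a single small series instead of filtering every point of perfmetric1 by id.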

Later you would do:

select value from perfmetric1.testdeviceid, perfmetric2.testdeviceid, ..., perfmetric15.testdeviceid where time > now() - 1h

This query will take much less time to complete, since InfluxDB won't have to scan each full series to find the points for testdeviceid.
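Since the question writes data from Java, the per-device query string can be assembled with a small helper. This is a minimal sketch that only builds the query text; `FanoutQuery` and its parameters are hypothetical names, and it assumes the device id is safe to use unquoted in a series name.

```java
import java.util.ArrayList;
import java.util.List;

// Builds the query over the fanned-out series <prefix><n>.<deviceId>, n = 1..metricCount.
final class FanoutQuery {
    static String build(String prefix, int metricCount, String deviceId, String window) {
        List<String> series = new ArrayList<>();
        for (int n = 1; n <= metricCount; n++) {
            series.add(prefix + n + "." + deviceId);
        }
        // Assumption: deviceId contains no characters that need quoting.
        return "select value from " + String.join(", ", series)
                + " where time > now() - " + window;
    }

    public static void main(String[] args) {
        // Prints the full 15-series query for the device in the question.
        System.out.println(build("perfmetric", 15, "testdeviceid", "1h"));
    }
}
```

The resulting string can be passed to whatever query call your client already uses.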

answered Oct 23 '22 10:10

dukebody

Build an index on the id column. It seems that the engine uses a full table scan to retrieve the data. If you split your query across 15 threads, the engine will run 15 full scans and performance will be much worse.

answered Oct 23 '22 09:10

valentin

Cary Li