
InfluxDB: single or multiple measurements

I'm a beginner with InfluxDB, and after reading the schema design documentation one question remains.

How do you decide whether to use one measurement with multiple fields or multiple measurements with a single field each?

I have multiple IoT devices that send data (temperature, humidity, pressure) every minute. All of these values share the exact same timestamp.

So I was wondering whether I should create one measurement like this:

    timestamp,iotid,temperature,humidity,pressure
    -------------------------------------------------
    1501230195,iot1,70,         45,      850

Or three measurements (one for each value), with the same tags but only one field each?

    timestamp,iotid,temperature
    ----------------------------
    1501230195,iot1,70

    timestamp,iotid,humidity
    -------------------------
    1501230195,iot1,45

    timestamp,iotid,pressure
    -------------------------
    1501230195,iot1,850

Query-wise, I may need to retrieve only one value, but also all three at the same time.
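For reference, the two layouts can be sketched as InfluxDB line protocol (`measurement,tags fields timestamp`). This is a minimal Python sketch rather than a client library call: the helper `to_line_protocol` and the measurement name `environment` are hypothetical, special characters are not escaped, and the timestamp is assumed to be written with second precision.

```python
def to_line_protocol(measurement, tags, fields, timestamp):
    """Format one point as a line protocol string (hypothetical helper, no escaping)."""
    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={value}" for key, value in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp}"


tags = {"iotid": "iot1"}
reading = {"temperature": 70, "humidity": 45, "pressure": 850}
ts = 1501230195  # seconds since epoch; write precision "s" is assumed

# Layout 1: one measurement ("environment" is an assumed name) with three fields.
single = [to_line_protocol("environment", tags, reading, ts)]

# Layout 2: three measurements, each holding a single field.
multiple = [
    to_line_protocol(name, tags, {name: value}, ts)
    for name, value in reading.items()
]

for line in single + multiple:
    print(line)
```

The first layout writes one point per reading, while the second writes three points that repeat the same tag and timestamp.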

asked Jul 28 '17 at 08:07 by grunk



1 Answer

Bit of an old question but this is probably relevant to anyone working on TSDBs.

When I first started, my approach was to put every data point into its own single-field measurement, on the assumption that I'd combine the data I needed in a SQL statement at a later date. However, as anyone who has used a TSDB like InfluxDB knows, there are some serious limitations on what one can do when retrieving data, because of the design choices made in implementing a TSDB.

As I've moved forward in my project, here are the rules of thumb I have developed:

A measurement should contain all the dimensions required for it to make sense but no more.

Example: imagine a gas flow meter which gives 3 signals:

  • volumetric flow
  • temperature
  • total flow

In this scenario, volumetric flow and temperature should be two fields of a single measurement, and total flow should be its own measurement.

(If the reader doesn't like this example, think of a home electric meter that outputs amps and volts, and kW and power factor.)

Why would it be bad to store volumetric and temp in different series?

  1. Timing: if you store those two signals in different series, they will have different index values (timestamps). Unless you take care to write them with explicitly specified, identical timestamps, you run the risk of them being sampled slightly out of step. This can very well end up being a Bad Thing (tm) because you might be introducing a systematic measurement bias in your data. Even if it's not a bad thing, it's going to be super annoying if you ever want to reuse this data later on (e.g. to dump it in a CSV file).

  2. Utility: if you want to derive a corrected volumetric flow rate, you have to compute constant * temp * volume. Doing this with two separate measurements becomes a nightmare because, for instance, InfluxDB's query language (InfluxQL) does not even support math across measurements. But even if it did, you'd have to make sure missing values in one of the fields aren't handled incorrectly and that grouping and aggregation are done right.
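Both points can be illustrated with a small, self-contained Python sketch. It does not talk to InfluxDB at all; the sample values, the one-second offset and `CONSTANT` are made up for illustration, and `nearest_join` is a hypothetical helper standing in for the extra alignment logic you would otherwise have to write yourself.

```python
CONSTANT = 0.5  # hypothetical calibration constant

# One measurement, two fields: every row pairs flow and temperature
# under a single timestamp. Tuples are (timestamp_s, flow, temp).
combined = [(0, 10.0, 20.0), (60, 12.0, 21.0)]
derived = [(t, CONSTANT * temp * flow) for t, flow, temp in combined]

# Two series written independently: the temperature points land
# one second after the flow points.
flow_series = {0: 10.0, 60: 12.0}
temp_series = {1: 20.0, 61: 21.0}

# Joining on exact timestamps finds nothing to combine.
exact = [
    (t, CONSTANT * temp_series[t] * flow)
    for t, flow in flow_series.items()
    if t in temp_series
]


def nearest_join(flows, temps, tolerance_s):
    """Pair each flow point with the temperature point closest in time."""
    joined = []
    for t, flow in sorted(flows.items()):
        nearest = min(temps, key=lambda other: abs(other - t))
        if abs(nearest - t) <= tolerance_s:
            joined.append((t, CONSTANT * temps[nearest] * flow))
    return joined


approx = nearest_join(flow_series, temp_series, tolerance_s=5)

print(derived)  # [(0, 100.0), (60, 126.0)]
print(exact)    # []
print(approx)   # [(0, 100.0), (60, 126.0)]
```

With both fields in one row the derived value is a one-liner; with separate series you first have to pick a tolerance and decide what to do with points that have no partner.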

Why would it be bad to store all three in a single measurement?

You may very well have a use case in which you want to audit all three values at all times, but chances are you don't, and you don't need to record total flow at the same frequency as the flow rate itself.

Putting all the fields in a single measurement will force you to either put nulls in certain fields, or to always log a variable that barely changes. Either way, it's not efficient.
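To put a rough number on that, here is a minimal Python sketch assuming flow and temperature are logged every minute while total flow is only logged once an hour. The sampling rates, the values and the helper `wide_rows` are all hypothetical; `None` stands for a field that has no value at that timestamp.

```python
def wide_rows(minutes, total_every):
    """One measurement holding all three fields, one row per minute."""
    rows = []
    for minute in range(minutes):
        # Total flow only has a value every `total_every` minutes.
        total = 1000.0 + minute if minute % total_every == 0 else None
        rows.append(
            {"time": minute * 60, "flow": 10.0, "temp": 20.0, "total": total}
        )
    return rows


rows = wide_rows(minutes=120, total_every=60)
empty_totals = sum(1 for row in rows if row["total"] is None)

# Split layout: flow and temp stay together, total flow gets its own series.
flow_points = [{key: row[key] for key in ("time", "flow", "temp")} for row in rows]
total_points = [
    {"time": row["time"], "total": row["total"]}
    for row in rows
    if row["total"] is not None
]

print(len(rows), empty_totals)              # 120 118
print(len(flow_points), len(total_points))  # 120 2
```

In the combined layout almost every row carries an empty `total`, whereas the split layout stores only the two total-flow points that actually exist.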

The important insight is that multi-dimensional entities require all their dimensions at the same time to make sense.

answered Oct 26 '22 at 21:10 by MB.