Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When organising an InfluxDB database, which of these two approaches would be most preferred?

I am trying to decide how measurements should be organised in an InfluxDB the database (which I believe they call schema design and data layout) but I think this may be a more general database type question.

Let's say as a simple example that I am measuring two quantites, temperature and humidity (imaginative, I know!), in two locations, living room and outside.

InfluxDB has the syntax for inserting data points:

measurement, tag_key=tag_value field_key=field_value

and so there are two obvious (at least to me) options. Briefly, the first option would insert a datapoint like this:

INSERT temperature,location=outside value=15
INSERT humidity,location=outside value=50

whereas the second option would do it this way:

INSERT sensor_measurements,location=outside temperature=15,humidity=50

My questions are more high level:

  • Is there a preferred/accepted way to go about this?
  • Will I run into problems with either of these if I try to scale it up to more quantities/locations/data types?
  • Does either of the methods offer an advantage if I later on try to graph these things in Grafana, for example, or if I try to implement later some of the many InfluxQL functions?
  • Does anyone have any general advice about this to offer?

My own thoughts:

Option 1 seems to me to be more like what is implied by the InfluxDB description "measurement". Both temperature and humidity are separate quantities. But it seems a little clunky to just call it "value".

Option 2 appears to have the advantage that both the humidity and the temperature share exactly the same timestamp. This would come in useful, for example, if I wanted to import the data into some other software and do a correlation between the two quantites, and would mean I wouldn't have to do any interpolation or binning to get them to match up.

I am not sure if it is a bad idea with Option 2 to just have a general measurement called sensor_measurements, and will be hard to maintain later.

In detail:


Option 1

  • Have a separate "measurement" for each of temperature and humidity, use the location as a "tag", and just name the "field" as value:

At time t1, insert the data:

INSERT humidity,location=outside value=50
INSERT temperature,location=outside value=15
INSERT humidity,location=living_room value=65
INSERT temperature,location=living_room value=28

At time t2, insert some different data:

INSERT humidity,location=outside value=50
INSERT temperature,location=outside value=15
INSERT humidity,location=living_room value=65
INSERT temperature,location=living_room value=28

I can then get access to the living room temperature by querying the following:

> SELECT value FROM temperature WHERE location='living_room'

name: temperature
time                value
----                -----
1590416682017481091 28
1590416723963187592 29

I can also use the group by function to do something like this:

SELECT value FROM temperature GROUP BY "location"

Option 2

  • Have a combined "measurement" called sensor_measurements, for example, use a "tag" for location, and then have separate "fields" for each of temperature and humidity:

At time t1, insert the data:

INSERT sensor_measurements,location=outside temperature=15,humidity=50
INSERT sensor_measurements,location=living_room temperature=28,humidity=65

At time t2, insert some different data:

INSERT sensor_measurements,location=outside temperature=14,humidity=56
INSERT sensor_measurements,location=living_room temperature=29,humidity=63

I can now get access to the living room temperature by querying the following:

> SELECT temperature FROM sensor_measurements WHERE location='living_room'

name: sensor_measurements
time                temperature
----                -----------
1590416731530452068 28
1590416757055629103 29

I can now use the group by function to do something like this:

SELECT temperature FROM sensor_measurements GROUP BY "location"

like image 940
teeeeee Avatar asked May 25 '20 15:05

teeeeee


People also ask

What database does InfluxDB use?

InfluxDB is an open-source time series database (TSDB) developed by the company InfluxData. It is written in the Go programming language for storage and retrieval of time series data in fields such as operations monitoring, application metrics, Internet of Things sensor data, and real-time analytics.

What is difference between bucket and database in InfluxDB?

A bucket is a named location where data is stored that has a retention policy. It's similar to an InfluxDB v1. x “database,” but is a combination of both a database and a retention policy. When using multiple retention policies, each retention policy is treated as is its own bucket.

What is InfluxDB time series database?

InfluxDB is the essential time series toolkit — dashboards, queries, tasks and agents all in one place. Collecting data. Telegraf is a plugin-driven server agent for collecting and sending metrics and events from databases, systems, and IoT sensors.

What is difference between tags and fields in InfluxDB?

InfluxDB lets you specify fields and tags, both being key/value pairs where the difference is that tags are automatically indexed. Because fields are not being indexed at all, on every query where InfluxDB is asked to find a specified field, it needs to sequentially scan every value of the field column.


1 Answers

I would use option 2 from offered options, because less records = less resources = better query response time (in theory). Generally, both approaches look good.

But I will use more generic 3rd option in real world. Single generic metrics measurement with tags metric,location and field value:

INSERT metrics,metric=temperature,location=outside value=15
INSERT metrics,metric=humidity,location=living_room value=50
INSERT metrics,metric=temperature,location=living_room value=28
INSERT metrics,metric=humidity,location=living_room value=65

That gives me opportunity to create single generic Grafana dashboard, where user will have option to select visualized metric/location via dashboard variable (generated directly from the InfluxDB, e.g. SHOW TAG VALUES WITH KEY = "metric"). Any new inserted metrics (e.g. `illuminance , pressure, wind-speed, wind-direction, ...) or location can be immediately visualized in this generic dashboard. Eventually, some metrics may have also additional tags. That is good and I will be able to use ad-hoc Grafana variable, so users will be able specify any number of key/value filters on the fly. Grafana doc.

like image 63
Jan Garaj Avatar answered Oct 19 '22 11:10

Jan Garaj