Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Query the most recent timestamp (MAX/Last) for a specific key, in Influx

Tags:

influxdb

Using InfluxDB (v1.1), I have the requirement where I want to get the last entry timestamp for a specific key. Regardless of which measurement this is stored and regardless of which value this was.

The setup is simple, where I have three measurements: location, network and usage.
There is only one key: device_id.

In pseudo-code, this would be something like:

# notice the lack of a FROM clause on measurement here...
SELECT MAX(time) WHERE 'device_id' = 'x';

The question: What would be the most efficient way of querying this?

The reason why I want this is that there will be a decentralised sync process. Some devices may have been updated in the last hour, whilst others haven't been updated in months. Being able to get a distinct "last updated on" timestamp for a device (key) would allow me to more efficiently store new points to Influx.

I've also noticed there is a similar discussion on InfluxDB's GitHub repo (#5793), but the question there is not filtering by any field/key. And this is exactly what I want: getting the 'last' entry for a specific key.

like image 916
Juliën Avatar asked Dec 29 '16 10:12

Juliën


People also ask

How do I get the last record on InfluxDB?

Use first() or last() to return the first or last record in an input table. By default, InfluxDB returns results sorted by time, however you can use the sort() function to change how results are sorted.

What is InfluxQL?

InfluxQL is an SQL-like query language for interacting with data in InfluxDB. The following sections detail InfluxQL's SELECT statement and useful query syntax for exploring your data.

What is timestamp in InfluxDB?

Timestamps are UNIX timestamps. The minimum valid timestamp is -9223372036854775806 or 1677-09-21T00:12:43.145224194Z . The maximum valid timestamp is 9223372036854775806 or 2262-04-11T23:47:16.854775806Z . As mentioned above, by default, InfluxDB assumes that timestamps have nanosecond precision.

How do I query data from InfluxDB?

To perform an InfluxQL query, send a GET request to the /query endpoint, set the URL parameter db as the target database, and set the URL parameter q as your query. You can also use a POST request by sending the same parameters either as URL parameters or as part of the body with application/x-www-form-urlencoded .


2 Answers

tl;dr; The first() and last() selectors will NOT work consistently if the measurement have multiple fields, and fields have NULL values. The most efficient solution is to use these queries

First:

SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1

Last:

SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1

Explanation:

If you have a single field in your measurement, then the suggested solutions will work, but if you have more than one field and values can be NULL then first() and last() selectors won't work consistently and may return different timestamps for each field. For example, let's say that you have the following data set:

time                   fieldKey_1     fieldKey_2     device
------------------------------------------------------------
2019-09-16T00:00:01Z   NULL           A              1
2019-09-16T00:00:02Z   X              B              1
2019-09-16T00:00:03Z   Y              C              2
2019-09-16T00:00:04Z   Z              NULL           2

In this case querying

SELECT first(fieldKey_1) FROM <measurement> WHERE device = "1" 

will return

time                   fieldKey_1
---------------------------------
2019-09-16T00:00:02Z   X         

and the same query for first(fieldKey_2) will return a different time

time                   fieldKey_2
---------------------------------
2019-09-16T00:00:01Z   A

A similar problem will happen when querying with last.

And in case you are wondering, it wouldn't do querying 'first(*)' since you'll get an 'epoch-0' time in the results, such as:

 time                   first_fieldKey_1    first_fieldKey_2
 -------------------------------------------------------------
 1970-01-01T00:00:00Z   X                   A

So, the solution would be querying using combinations of LIMIT and ORDER BY. For instance, for the first time value you can use:

SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1

and for the last one you can use

SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1

It is safe and fast as it will relay on indexes.

Is curious to mention that this more simple approach was mentioned in the thread linked in the opening post, but was discarded. Maybe it was just lost overlooked.

Here there's a thread in InfluxData blogs about the subject also suggesting to use this approach.

like image 63
Leonardo Allievi Avatar answered Sep 18 '22 10:09

Leonardo Allievi


Unfortunately there wont be single query that will get you what you're looking for. You'll have to do a bit of work client side.

The query that you'll want is

SELECT last(<field name>), time FROM <measurement> WHERE device_id = 'x'

You'll need to run this query for each measurement.

SELECT last(<field name>), time FROM location WHERE device_id = 'x'
SELECT last(<field name>), time FROM network WHERE device_id = 'x'
SELECT last(<field name>), time FROM usage WHERE device_id = 'x'

From there you'll get the one with the greatest time stamp

> select last(value), time from location where device_id = 'x'; select last(value), time from network where device_id = 'x'; select last(value),     time from usage where device_id = 'x';
name: location
time                last
----                ----
1483640697584904775 3

name: network
time                last
----                ----
1483640714335794796 4

name: usage
time                last
----                ----
1483640783941353064 4
like image 27
Michael Desa Avatar answered Sep 18 '22 10:09

Michael Desa