Using InfluxDB (v1.1), I have the requirement where I want to get the last entry timestamp for a specific key. Regardless of which measurement this is stored and regardless of which value this was.
The setup is simple, where I have three measurements: location
, network
and usage
.
There is only one key: device_id
.
In pseudo-code, this would be something like:
# notice the lack of a FROM clause on measurement here...
SELECT MAX(time) WHERE 'device_id' = 'x';
The question: What would be the most efficient way of querying this?
The reason why I want this is that there will be a decentralised sync process. Some devices may have been updated in the last hour, whilst others haven't been updated in months. Being able to get a distinct "last updated on" timestamp for a device (key) would allow me to more efficiently store new points to Influx.
I've also noticed there is a similar discussion on InfluxDB's GitHub repo (#5793), but the question there is not filtering by any field/key. And this is exactly what I want: getting the 'last' entry for a specific key.
Use first() or last() to return the first or last record in an input table. By default, InfluxDB returns results sorted by time, however you can use the sort() function to change how results are sorted.
InfluxQL is an SQL-like query language for interacting with data in InfluxDB. The following sections detail InfluxQL's SELECT statement and useful query syntax for exploring your data.
Timestamps are UNIX timestamps. The minimum valid timestamp is -9223372036854775806 or 1677-09-21T00:12:43.145224194Z . The maximum valid timestamp is 9223372036854775806 or 2262-04-11T23:47:16.854775806Z . As mentioned above, by default, InfluxDB assumes that timestamps have nanosecond precision.
To perform an InfluxQL query, send a GET request to the /query endpoint, set the URL parameter db as the target database, and set the URL parameter q as your query. You can also use a POST request by sending the same parameters either as URL parameters or as part of the body with application/x-www-form-urlencoded .
tl;dr; The first() and last() selectors will NOT work consistently if the measurement have multiple fields, and fields have NULL values. The most efficient solution is to use these queries
First:
SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1
Last:
SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1
Explanation:
If you have a single field in your measurement, then the suggested solutions will work, but if you have more than one field and values can be NULL then first() and last() selectors won't work consistently and may return different timestamps for each field. For example, let's say that you have the following data set:
time fieldKey_1 fieldKey_2 device
------------------------------------------------------------
2019-09-16T00:00:01Z NULL A 1
2019-09-16T00:00:02Z X B 1
2019-09-16T00:00:03Z Y C 2
2019-09-16T00:00:04Z Z NULL 2
In this case querying
SELECT first(fieldKey_1) FROM <measurement> WHERE device = "1"
will return
time fieldKey_1
---------------------------------
2019-09-16T00:00:02Z X
and the same query for first(fieldKey_2) will return a different time
time fieldKey_2
---------------------------------
2019-09-16T00:00:01Z A
A similar problem will happen when querying with last.
And in case you are wondering, it wouldn't do querying 'first(*)' since you'll get an 'epoch-0' time in the results, such as:
time first_fieldKey_1 first_fieldKey_2
-------------------------------------------------------------
1970-01-01T00:00:00Z X A
So, the solution would be querying using combinations of LIMIT and ORDER BY. For instance, for the first time value you can use:
SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1
and for the last one you can use
SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1
It is safe and fast as it will relay on indexes.
Is curious to mention that this more simple approach was mentioned in the thread linked in the opening post, but was discarded. Maybe it was just lost overlooked.
Here there's a thread in InfluxData blogs about the subject also suggesting to use this approach.
Unfortunately there wont be single query that will get you what you're looking for. You'll have to do a bit of work client side.
The query that you'll want is
SELECT last(<field name>), time FROM <measurement> WHERE device_id = 'x'
You'll need to run this query for each measurement.
SELECT last(<field name>), time FROM location WHERE device_id = 'x'
SELECT last(<field name>), time FROM network WHERE device_id = 'x'
SELECT last(<field name>), time FROM usage WHERE device_id = 'x'
From there you'll get the one with the greatest time stamp
> select last(value), time from location where device_id = 'x'; select last(value), time from network where device_id = 'x'; select last(value), time from usage where device_id = 'x';
name: location
time last
---- ----
1483640697584904775 3
name: network
time last
---- ----
1483640714335794796 4
name: usage
time last
---- ----
1483640783941353064 4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With