SUM(LAST()) on GROUP BY

Tags:

influxdb

I have a series, disk, that contains a path (/mnt/disk1, /mnt/disk2, etc.) and the total space of each disk. It also includes free and used values. These values are updated at a specified interval. What I would like to do is query for the sum of the last() total of each path. I would also like to do the same for free and used, to get an aggregate of the total size, free space, and used space of all the disks on my server.

I have a query here that will get me the last(total) of all the disks, grouped by its path (for distinction):

select last(total) as total from disk where path =~ /(mnt\/disk).*/ group by path

Currently, this returns 5 series, each containing 1 row (the latest) and the value of its total. I then want to take the sum of those series, but I cannot just wrap the last(total) into a sum() function call. Is there a way to do this that I am missing?

ehftwelve asked Oct 29 '22 19:10


1 Answer

Carrying on from my comment above about nested functions.

Building a toy example:

CREATE DATABASE FOO
USE FOO

Assuming your data is updated at intervals longer than one minute[1]:

CREATE CONTINUOUS QUERY disk_sum_total ON FOO 
BEGIN
  SELECT sum("total") AS "total_1m" INTO disk_1m_total FROM "disk" 
  GROUP BY time(1m)
END

Then push some values in:

INSERT disk,path=/mnt/disk1 total=30
INSERT disk,path=/mnt/disk2 total=32
INSERT disk,path=/mnt/disk3 total=33

And wait more than a minute. Then:

INSERT disk,path=/mnt/disk1 total=41
INSERT disk,path=/mnt/disk2 total=42
INSERT disk,path=/mnt/disk3 total=43

And wait more than a minute again. Then:

SELECT * FROM disk_1m_total

name: disk_1m_total
-------------------
time                    total_1m
1476015300000000000     95
1476015420000000000     126

The two values are 30+32+33=95 and 41+42+43=126.
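
To make the arithmetic concrete, here is a minimal Python sketch of what the continuous query does conceptually: it buckets points into one-minute windows and sums each window. This only illustrates the idea and is not how InfluxDB is implemented; the function name sum_per_window and the timestamps (plain seconds) are made up for the example, while the values are the ones inserted above.

```python
from collections import defaultdict

def sum_per_window(points, window_s=60):
    """Sum `total` over fixed windows, similar in spirit to GROUP BY time(1m)."""
    buckets = defaultdict(int)
    for ts, _path, total in points:
        # Align each timestamp down to the start of its window.
        buckets[ts - ts % window_s] += total
    return dict(sorted(buckets.items()))

# Hypothetical timestamps in seconds; totals match the INSERTs above.
points = [
    (5, "/mnt/disk1", 30), (6, "/mnt/disk2", 32), (7, "/mnt/disk3", 33),
    (125, "/mnt/disk1", 41), (126, "/mnt/disk2", 42), (127, "/mnt/disk3", 43),
]

windows = sum_per_window(points)   # {0: 95, 120: 126}
latest = windows[max(windows)]     # 126, the analogue of last(total_1m)
print(windows, latest)
```

Taking the most recent window mirrors the last(total_1m) query: it works only because each window holds exactly one update per disk.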

From there, it's trivial to query:

SELECT last(total_1m) FROM disk_1m_total

name: disk_1m_total
-------------------
time                    last
1476015420000000000     126

Hope that helps.

[1] Running the continuous query at an interval shorter than the update interval prevents minor timing jitter from putting two updates into the same window, where they would be summed twice. Some windows may contain no updates, but none will double count. I typically run the query twice as fast as the updates. If the CQ sees no data for a window, it writes nothing for that window, so last() still gives the correct answer. For example, I left the CQ running overnight without pushing any new data: last(total_1m) returned the same value, not zero.
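
The double-counting risk in the footnote can be sketched in Python. This is an illustration under assumed numbers, not InfluxDB's implementation: one disk reporting total=30 roughly every 60 seconds, with jitter making the second update arrive two seconds early. The function name and timestamps are hypothetical.

```python
from collections import defaultdict

def sum_per_window(points, window_s):
    """Sum values over fixed windows of `window_s` seconds."""
    buckets = defaultdict(int)
    for ts, total in points:
        # Align each timestamp down to the start of its window.
        buckets[ts - ts % window_s] += total
    return dict(sorted(buckets.items()))

# Assumed: nominal 60 s update interval; the second update arrives early (58 s).
updates = [(2, 30), (58, 30), (121, 30)]

# Window equal to the update interval: two updates share window 0.
too_wide = sum_per_window(updates, 60)   # {0: 60, 120: 30}

# Window half the update interval: at most one update per window.
narrow = sum_per_window(updates, 30)     # {0: 30, 30: 30, 120: 30}
print(too_wide, narrow)
```

With the 30-second window, some windows (60 and 90 here) simply have no entry, which corresponds to the "zero update" intervals mentioned above.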

Jason answered Nov 15 '22 11:11