
SQL averages per row from multiple columns and nulls

Tags:

sql

postgresql

I have an app that logs sensor data, and I want to produce averages across multiple sensors: there could be one, two, three, or many more.

EDIT: These are temperature sensors, so 0 is a valid reading that may be stored in the database.

My initial starting point was this SQL query:

SELECT grid.t5 || '.000000' AS ts,
       avg(t.sensorvalue) AS sensorvalue1,
       avg(w.sensorvalue) AS sensorvalue2
FROM (SELECT generate_series(min(date_trunc('hour', ts)), max(ts), interval '5 min') AS t5
      FROM device_history_20865735
      WHERE ts BETWEEN '2015/05/13 09:00' AND '2015/05/14 09:00') grid
LEFT JOIN device_history_20865735 t ON t.ts >= grid.t5 AND t.ts < grid.t5 + interval '5 min'
LEFT JOIN device_history_493417852 w ON w.ts >= grid.t5 AND w.ts < grid.t5 + interval '5 min'
--WHERE t.sensorvalue notnull
GROUP BY grid.t5
ORDER BY grid.t5

I use 5-minute averages because they suit my app better.

As expected, the results contain NULLs in sensorvalue1, sensorvalue2, or both:

ts;sensorvalue1;sensorvalue2
"2015-05-13 09:00:00.000000";19.9300003051758;
"2015-05-13 09:05:00.000000";20;
"2015-05-13 09:10:00.000000";;
"2015-05-13 09:15:00.000000";20.0599994659424;
"2015-05-13 09:20:00.000000";;
"2015-05-13 09:25:00.000000";20.1200008392334;

My aim is to calculate, for each 5-minute interval, an average across all the available sensors. Because the NULLs are a problem, I thought of using a CASE expression so that when one sensor's value is NULL, the other sensor's value is used instead:

SELECT grid.t5 || '.000000' AS ts,
       CASE
           WHEN avg(t.sensorvalue) ISNULL THEN avg(w.sensorvalue)
           ELSE avg(t.sensorvalue)
       END AS sensorvalue,
       CASE
           WHEN avg(w.sensorvalue) ISNULL THEN avg(t.sensorvalue)
           ELSE avg(w.sensorvalue)
       END AS sensorvalue2
FROM (SELECT generate_series(min(date_trunc('hour', ts)), max(ts), interval '5 min') AS t5
      FROM device_history_20865735
      WHERE ts BETWEEN '2015/05/13 09:00' AND '2015/05/14 09:00') grid
LEFT JOIN device_history_20865735 t ON t.ts >= grid.t5 AND t.ts < grid.t5 + interval '5 min'
LEFT JOIN device_history_493417852 w ON w.ts >= grid.t5 AND w.ts < grid.t5 + interval '5 min'
GROUP BY grid.t5
ORDER BY grid.t5

However, to calculate the average I then have to wrap this in another SELECT and divide by the number of columns (that is, sensors). With just two sensors that is OK, but with three or four it gets very messy, since several sensors can have NULL values in the same row.
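To make the target calculation concrete, here is a minimal Python sketch of the per-row arithmetic: sum the sensor averages that are present and divide by how many are present. The helper name `row_average` and the sample rows are hypothetical, used only for illustration, and the sketch returns `None` for an interval with no readings rather than a sentinel value.

```python
from typing import Optional, Sequence


def row_average(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average the non-NULL (non-None) sensor values of one 5-minute row.

    Hypothetical helper for illustration. A reading of 0 is a valid
    temperature, so only None is treated as missing.
    """
    present = [v for v in values if v is not None]
    if not present:
        # No sensor reported in this interval: there is no average.
        return None
    return sum(present) / len(present)


# Illustrative rows: (sensor1, sensor2, sensor3); None stands for SQL NULL.
rows = [
    (20.0, None, 22.0),   # two of three sensors reported
    (0.0, 10.0, None),    # 0 counts as a real reading
    (None, None, None),   # nothing reported
]
averages = [row_average(r) for r in rows]
```

The divisor is the number of sensors that actually reported, not the number of columns, which is what makes a fixed "divide by N" wrapper query awkward.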

The SQL is generated programmatically by a Python app and runs on PostgreSQL 9.4. Is there a simpler way to achieve this? I feel I'm going down a rather complex route.

EDIT #2: With your input I've produced this SQL. It still seems rather complex, so I'm open to your ideas and scrutiny on whether it is reliable and maintainable:

SELECT ts, sensortotal, sensorcount,
CASE
    WHEN sensorcount = 0 THEN -1000
    ELSE sensortotal/sensorcount
END AS sensorAvg

FROM (
    WITH grid as (
          SELECT t5
          FROM (SELECT generate_series(min(date_trunc('hour', ts)), max(ts), interval '5 min') as t5
                FROM device_history_20865735
               ) d
          WHERE t5 between '2015-05-13 09:00' and '2015-05-14 09:00'   
         )
    SELECT d1.t5 || '.000000' as ts
           , Coalesce(avg(d1.sensorvalue), 0) + Coalesce(avg(d2.sensorvalue),0) as sensorTotal
           , (CASE
                    WHEN avg(d1.sensorvalue) ISNULL THEN 0
                    ELSE 1
           END + CASE
            WHEN avg(d2.sensorvalue) ISNULL THEN 0
            ELSE 1
           END) as sensorCount

    FROM (SELECT grid.t5, avg(t.sensorvalue) as sensorvalue
          FROM grid LEFT JOIN
               device_history_20865735 t
               ON t.ts >= grid.t5 AND t.ts <grid.t5 +  interval '5 min' 
          GROUP BY grid.t5
         ) d1 LEFT JOIN
         (SELECT grid.t5, avg(t.sensorvalue) as sensorvalue
          FROM grid LEFT JOIN
               device_history_493417852 t
               ON t.ts >= grid.t5 AND t.ts <grid.t5 +  interval '5 min' 
         GROUP BY grid.t5
         ) d2 on d1.t5 = d2.t5
    GROUP BY d1.t5
    ORDER BY d1.t5
) tmp;

Thanks!

Kostas asked Nov 10 '22 13:11


1 Answer

It sounds like you want something like this:

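(coalesce(value1,0) + coalesce(value2,0) + coalesce(value3,0)) /
-- NULLIF makes the result NULL instead of a division-by-zero error when every value is NULL
NULLIF((value1 IS NOT NULL)::int + (value2 IS NOT NULL)::int + (value3 IS NOT NULL)::int, 0)
AS average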

Basically, just do the math you want to do for each row. The only "tricky" part is how to "count" the non-null values--I used a cast, but there are other options such as:

CASE WHEN value1 IS NULL THEN 0 ELSE 1 END
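Since the query is generated from Python, the same expression can be built for any number of sensor columns. The following is a rough sketch rather than a definitive implementation: `average_expression` is a hypothetical helper, the column names are placeholders, and it assumes the names come from trusted code, because they are interpolated directly into the SQL text. The `NULLIF(..., 0)` wrapper is an addition here so that a row where every column is NULL yields NULL instead of a division-by-zero error.

```python
from typing import Sequence


def average_expression(columns: Sequence[str], alias: str = "average") -> str:
    """Build a PostgreSQL expression averaging the non-NULL columns of a row.

    Hypothetical helper: column names are interpolated as-is, so they must
    come from trusted code, never from user input.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # Sum of the values, with NULL treated as 0 so it does not null out the sum.
    total = " + ".join(f"COALESCE({c}, 0)" for c in columns)
    # Number of non-NULL values; the parentheses are needed because both
    # :: and + bind more tightly than IS NOT NULL.
    count = " + ".join(f"({c} IS NOT NULL)::int" for c in columns)
    # NULLIF turns a zero count into NULL, so the division yields NULL
    # instead of failing when every column is NULL.
    return f"({total}) / NULLIF({count}, 0) AS {alias}"


expr = average_expression(["value1", "value2", "value3"])
print(expr)
```

Each entry in `columns` can also be an aggregate such as `avg(t.sensorvalue)`, so the generated text can be dropped into a grouped query like the ones in the question.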
Dwayne Towell answered Nov 14 '22 23:11