
Average over a timeframe with missing data

Assuming a table such as:

UID     Name        Datetime                Users
4       Room 4      2012-08-03 14:00:00     3
2       Room 2      2012-08-03 14:00:00     3
3       Room 3      2012-08-03 14:00:00     1
1       Room 1      2012-08-03 14:00:00     2

3       Room 3      2012-08-03 14:15:00     1
2       Room 2      2012-08-03 14:15:00     4
1       Room 1      2012-08-03 14:15:00     3

1       Room 1      2012-08-03 14:30:00     6

1       Room 1      2012-08-03 14:45:00     3
2       Room 2      2012-08-03 14:45:00     7
3       Room 3      2012-08-03 14:45:00     8
4       Room 4      2012-08-03 14:45:00     4

I want to get the average user count of each room (1, 2, 3, 4) between 2PM and 3PM. The problem is that a room sometimes fails to "check in" at its 15-minute interval, so the assumption has to be made that the last known user count is still valid.

For example, room 4 never checked in at 2012-08-03 14:15:00, so it must be assumed that room 4 had 3 users at that time, because that is what it had at 2012-08-03 14:00:00.

This follows on through so that the average user count I am looking for is as follows:

Room 1: (2 + 3 + 6 + 3) / 4 = 3.5
Room 2: (3 + 4 + #4 + 7) / 4 = 4.5
Room 3: (1 + 1 + #1 + 8) / 4 = 2.75
Room 4: (3 + #3 + #3 + 4) / 4 = 3.25

where # marks a number assumed from the previous known check-in.

I am wondering if it's possible to do this with SQL alone? If not, I am curious about an ingenious PHP solution that isn't just brute-force math, such as my quick, inaccurate pseudo code:

foreach ($rooms_id_array as $room_id) {
    // Fetch this room's check-ins between 2PM and 3PM (query() is a placeholder helper)
    $SQL = "SELECT * FROM `table` WHERE `UID` = $room_id AND `Datetime` >= '2012-08-03 14:00:00' AND `Datetime` <= '2012-08-03 15:00:00'";
    $result = query($SQL);
    if ( count($result) < 4 ) {
        // go through each date and find what is missing, and then go to previous date and use that instead
    } else {
        $sum = 0;
        foreach ($result as $row)
            $sum += $row['Users'];
        $avg = $sum / 4;
    }
}
ParoX asked Aug 04 '12 11:08




1 Answer

The difficult (and most costly) step will be filling in the blanks. If it is not possible to fill in the blanks in your source data, you probably want a template of every room and check-in slot to join on, and then a correlated sub-query to find the data associated with each row of that template.

This is often best with real tables, but here is an example with hard-coded in-line-views instead...

SELECT
  `room`.`uid`           `uid` ,
  AVG(`data`.`users`)    `average_users`
FROM
  (SELECT 1 `UID`  UNION ALL
   SELECT 2 `UID`  UNION ALL
   SELECT 3 `UID`  UNION ALL
   SELECT 4 `UID`)                                     `room`
CROSS JOIN
  (SELECT '2012-08-03 14:00:00' `datetime`  UNION ALL
   SELECT '2012-08-03 14:15:00' `datetime`  UNION ALL
   SELECT '2012-08-03 14:30:00' `datetime`  UNION ALL
   SELECT '2012-08-03 14:45:00' `datetime`)            `checkin`
LEFT JOIN
  data
    ON  `data`.`uid`      = `room`.`uid`
    AND `data`.`datetime` = (SELECT MAX(`datetime`)
                               FROM `data`
                              WHERE `uid`       = `room`.`uid`
                                AND `datetime` <= `checkin`.`datetime`)
GROUP BY
  `room`.`uid`

- The CROSS JOIN creates the template to ensure that you always have a record for every checkin slot for every room.

- The correlated sub-query searches back through time to find the most recent checkin for that room at that time.
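
As a rough check of the approach, here is a minimal sketch that runs the same template-plus-correlated-sub-query pattern against the sample data from the question. It makes a few assumptions that are not in the original: it uses Python's built-in sqlite3 module instead of MySQL, and it replaces the hard-coded in-line views with two small real tables, hypothetically named `room` and `checkin`.

```python
import sqlite3

# Sample check-ins from the question: (uid, datetime, users).
CHECKINS = [
    (4, "2012-08-03 14:00:00", 3), (2, "2012-08-03 14:00:00", 3),
    (3, "2012-08-03 14:00:00", 1), (1, "2012-08-03 14:00:00", 2),
    (3, "2012-08-03 14:15:00", 1), (2, "2012-08-03 14:15:00", 4),
    (1, "2012-08-03 14:15:00", 3),
    (1, "2012-08-03 14:30:00", 6),
    (1, "2012-08-03 14:45:00", 3), (2, "2012-08-03 14:45:00", 7),
    (3, "2012-08-03 14:45:00", 8), (4, "2012-08-03 14:45:00", 4),
]

# The four 15-minute slots that make up the template.
SLOTS = [
    "2012-08-03 14:00:00", "2012-08-03 14:15:00",
    "2012-08-03 14:30:00", "2012-08-03 14:45:00",
]

# Same shape as the query above; `room` and `checkin` are assumed real
# tables standing in for the in-line views.
QUERY = """
SELECT room.uid, AVG(data.users) AS average_users
FROM room
CROSS JOIN checkin
LEFT JOIN data
  ON  data.uid = room.uid
  AND data.datetime = (SELECT MAX(d.datetime)
                         FROM data AS d
                        WHERE d.uid = room.uid
                          AND d.datetime <= checkin.datetime)
GROUP BY room.uid
ORDER BY room.uid
"""


def average_users():
    """Build an in-memory database, run the query, return {uid: average}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (uid INTEGER, datetime TEXT, users INTEGER)")
    conn.execute("CREATE TABLE room (uid INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE checkin (datetime TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", CHECKINS)
    conn.executemany("INSERT INTO room VALUES (?)", [(uid,) for uid in range(1, 5)])
    conn.executemany("INSERT INTO checkin VALUES (?)", [(slot,) for slot in SLOTS])
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result


print(average_users())  # {1: 3.5, 2: 4.5, 3: 2.75, 4: 3.25}
```

The averages match the ones worked out by hand in the question. Note that AVG skips NULLs, so a room with no check-in at or before a given slot would have that slot left out of its average rather than counted as zero.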

MatBailie answered Oct 05 '22 20:10