 

Reporting on a grouped average over a group of records

My goal is to generate a report showing the average occupancy of a garage (y-axis) at a given day of the week and/or time of day. My data model is as follows:

  • Garage has_many Cars and Garage has_many Appointments, through: :cars
  • Car has_many Appointments
  • Appointment has fields such as:
    • picked_up_at (datetime)
    • returned_at (datetime)

Also, Garage has a field capacity (integer), which is the maximum number of cars that will fit in the garage.

If I have a list of Appointments spanning the last 6 months, and I would like to generate a line-graph with the x-axis showing each day of the week, broken down into 4-hour intervals, and the y-axis showing the average % occupancy (# of cars in the garage / capacity) over the 6 month period for the given day/hour interval, how can I go about gathering this data to report on?

E.g. a car is In from the time of one Appointment's return until the next Appointment's pickup, and Out from the Appointment's pickup until its returned_at time.
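The In/Out rule above can be sketched in plain Ruby. This is a minimal sketch, not ActiveRecord code. `Appointment` is a hypothetical Struct standing in for the model, and the helper names are made up for illustration. It assumes a car that no appointment covers is In, and that a nil `returned_at` means the car has not come back yet.

```ruby
# Hypothetical stand-in for the Appointment model (plain Struct, not ActiveRecord).
Appointment = Struct.new(:picked_up_at, :returned_at)

# A car is Out at time t if some appointment covers t:
# picked_up_at <= t < returned_at, where a nil returned_at means not back yet.
def car_out?(appointments, t)
  appointments.any? do |a|
    a.picked_up_at <= t && (a.returned_at.nil? || t < a.returned_at)
  end
end

# Assumption: a car not covered by any appointment is In the garage.
def car_in?(appointments, t)
  !car_out?(appointments, t)
end

# Percent occupancy at time t: cars In / capacity * 100, rounded to 2 digits.
# cars_appointments holds one array of appointments per car in the garage.
def occupancy_pct(cars_appointments, capacity, t)
  in_count = cars_appointments.count { |apps| car_in?(apps, t) }
  (in_count * 100.0 / capacity).round(2)
end
```

For example, take capacity 4 and two cars, one of them booked from 08:00 to 12:00. `occupancy_pct` gives 25.0 at 10:00 and 50.0 at 12:00, because the return instant itself already counts as In.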

I am having a lot of trouble making the connection from these data points to the best way to meaningfully report on and present them to the end user.

I am using Rails 4.1 and Ruby 2.0.

Edit: SQL Fiddle - http://sqlfiddle.com/#!9/a72fe/1

asked Sep 26 '22 22:09 by jackerman09


1 Answer

This query would do it all (adapted to your added fiddle):

SELECT a.ts, g.*, round((a.ct * numeric '100') / g.capacity, 2) AS pct
FROM  (
   SELECT ts, c.garage_id, count(*) AS ct
   FROM   generate_series(timestamp '2015-06-01 00:00'  -- lower and
                        , timestamp '2015-12-01 00:00'  -- upper bound of range
                        , interval  '4h') ts
   JOIN   appointment a ON a.picked_up_at <= ts     -- incl. lower
                       AND (a.returned_at >  ts OR
                            a.returned_at IS NULL)  -- excl. upper bound
   JOIN   car c ON c.id = a.car_id
   GROUP  BY 1, 2
   ) a
JOIN   garage g ON g.id = a.garage_id
ORDER  BY 1, 2;

SQL Fiddle.
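To sanity-check the query's output, for example in a test suite, the same logic can be mirrored in memory. This is a hedged plain-Ruby sketch. `Garage`, `Car` and `Appointment` are hypothetical Struct stand-ins for the fiddle's tables, and `occupancy_report` is an illustrative name. Like the query, it samples every 4 hours with both bounds included and counts the appointments covering each timestamp. It also omits garage/timestamp combinations with no matches, as the inner join does.

```ruby
# Hypothetical in-memory stand-ins for the fiddle's tables.
Garage = Struct.new(:id, :capacity)
Car = Struct.new(:id, :garage_id)
Appointment = Struct.new(:car_id, :picked_up_at, :returned_at)

FOUR_HOURS = 4 * 3600 # interval '4h', in seconds

# Like generate_series(lower, upper, interval): both bounds are included.
def time_series(lower, upper, step = FOUR_HOURS)
  series = []
  ts = lower
  while ts <= upper
    series << ts
    ts += step
  end
  series
end

# Rows of [ts, garage_id, pct], ordered by ts, then garage_id.
# As with the inner JOIN + GROUP BY, combinations with a zero count are omitted.
def occupancy_report(garages, cars, appointments, lower, upper)
  garage_of = Hash[cars.map { |c| [c.id, c.garage_id] }]
  capacity_of = Hash[garages.map { |g| [g.id, g.capacity] }]
  rows = []
  time_series(lower, upper).each do |ts|
    counts = Hash.new(0)
    appointments.each do |a|
      # Lower bound included, upper bound excluded; nil means still in use.
      next unless a.picked_up_at <= ts && (a.returned_at.nil? || a.returned_at > ts)
      counts[garage_of[a.car_id]] += 1
    end
    counts.keys.sort.each do |gid|
      rows << [ts, gid, (counts[gid] * 100.0 / capacity_of[gid]).round(2)]
    end
  end
  rows
end
```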

If returned_at IS NULL, this query assumes that the car is still in use. So NULL should not occur in any other case, or the calculation will be wrong.

First, I build the time series with the convenient generate_series() function.

Then join to appointments where the timestamp falls inside a booking.
I assume every appointment includes its lower bound and excludes its upper bound, as is the widespread convention.
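Why the half-open convention matters: with back-to-back bookings, a car returned at 12:00 and picked up again at 12:00 should be counted once, not twice. A small Ruby sketch contrasting half-open and closed intervals (the helper names are hypothetical):

```ruby
# Half-open [picked_up_at, returned_at): the pickup instant counts, the return instant does not.
def active_half_open?(picked_up_at, returned_at, ts)
  picked_up_at <= ts && (returned_at.nil? || ts < returned_at)
end

# Closed [picked_up_at, returned_at], for contrast: both instants count.
def active_closed?(picked_up_at, returned_at, ts)
  picked_up_at <= ts && (returned_at.nil? || ts <= returned_at)
end

h = 3600
day = Time.utc(2015, 6, 1)
# Two back-to-back bookings of the same car: 08:00-12:00 and 12:00-16:00.
bookings = [[day + 8 * h, day + 12 * h], [day + 12 * h, day + 16 * h]]
at_noon = day + 12 * h

half_open_count = bookings.count { |p, r| active_half_open?(p, r, at_noon) } # => 1
closed_count = bookings.count { |p, r| active_closed?(p, r, at_noon) }       # => 2
```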

Aggregate and count before joining to garages (it's faster this way). Compare:

  • Aggregate a single column in query with many columns

Percentages are calculated in the outer SELECT.
I multiply the bigint count by numeric (or optionally real or float) to preserve fractional digits, which would be cut off in an integer division. Then I round to two fractional digits.
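The same truncation happens in Ruby, which makes it easy to see. The values below are examples only. Here `Rational` plays roughly the role of Postgres `numeric`, and `Float` that of `real`/`float`:

```ruby
count = 3    # cars counted at one timestamp (example value)
capacity = 7 # garage capacity (example value)

truncated = count * 100 / capacity                     # Integer division: fraction cut off => 42
as_float = (count * 100.0 / capacity).round(2)         # Float, like real/float => 42.86
as_rational = Rational(count * 100, capacity).round(2) # exact, then rounded => (2143/50)
```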

Note that this is not exactly the average percentage of each 4-hour period, but only the current percentage at each sampled point in time, which approximates the true average. You might start with an odd timestamp like '2015-06-01 01:17' so that sample points do not fall exactly on booking turnovers, which probably happen on the full hour or similar. Sampling right at turnover times might increase the mean error of the approximation.
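The query returns one percentage per sampled timestamp, while the report in the question needs one average per day of week and 4-hour slot. Below is a minimal Ruby sketch of that last step, using the hypothetical name `weekly_profile`. It assumes the samples are already in the garage's local time zone. It also assumes every sampled timestamp with no active appointment has been filled in as 0. The inner join in the query drops those rows, and leaving them out would inflate the averages.

```ruby
# Averages per-timestamp percentages into one value per (weekday, 4-hour slot).
# samples: { Time => pct }, assumed to be in the garage's local time zone and to
# include a 0 for every sampled timestamp with no active appointment.
def weekly_profile(samples)
  sums = Hash.new(0.0)
  counts = Hash.new(0)
  samples.each do |ts, pct|
    # Time#wday: 0 = Sunday. Key [1, 8] means Monday, 08:00-12:00.
    key = [ts.wday, (ts.hour / 4) * 4]
    sums[key] += pct
    counts[key] += 1
  end
  profile = {}
  sums.each_key { |key| profile[key] = (sums[key] / counts[key]).round(2) }
  profile
end
```

Plotting the keys in weekday/slot order gives the x-axis of the line graph described in the question, with the averaged percentage on the y-axis.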

You can do an exact calculation for 4-hour periods, too, but that's more sophisticated. One simple technique is to reduce the interval to 10 minutes, or to whatever granularity is detailed enough to capture the full picture.
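For the exact calculation, one approach is to weight each booking by how long it overlaps the window. This is offered as an illustration and is not necessarily the method used in the linked answers. The summed overlap durations divided by the window length give the exact average number of active bookings in that window. A plain-Ruby sketch with a hypothetical `Appointment` Struct and helper name:

```ruby
# Hypothetical stand-in for an appointment (plain Struct, not ActiveRecord).
Appointment = Struct.new(:picked_up_at, :returned_at)

# Exact time-weighted average number of active appointments over the window
# [win_start, win_end): sum of each booking's overlap with the window,
# divided by the window length. A nil returned_at runs to the window end.
def exact_average_active(appointments, win_start, win_end)
  covered = 0.0
  appointments.each do |a|
    from = [a.picked_up_at, win_start].max
    to = [a.returned_at || win_end, win_end].min
    covered += to - from if to > from # Time - Time gives seconds as a Float
  end
  covered / (win_end - win_start)
end
```

Dividing the result by the garage capacity and multiplying by 100 gives the exact percentage for that window.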

Related (with an example for exact calculation):

  • Calculate working hours between 2 dates in PostgreSQL
  • Average stock history table
answered Sep 29 '22 00:09 by Erwin Brandstetter