My goal is to generate a report showing the average occupancy of a garage (y-axis) at a given day of the week and/or time of day. My data model is as follows:
Garage has_many Cars and Garage has_many Appointments, through: :cars
Car has_many Appointments
Also, Garage has a field capacity (integer), which is the maximum number of cars that will fit in the garage.
If I have a list of Appointments spanning the last 6 months, and I would like to generate a line-graph with the x-axis showing each day of the week, broken down into 4-hour intervals, and the y-axis showing the average % occupancy (# of cars in the garage / capacity) over the 6 month period for the given day/hour interval, how can I go about gathering this data to report on?
E.g. a car is In from the time of one Appointment's return until the next Appointment's pickup, and Out from the Appointment's pickup until its returned_at time.
I am having a lot of trouble making the connection from these data points to the best way to meaningfully report on and present them to the end user.
I am using Rails 4.1 and Ruby 2.0.
Edit: SQL Fiddle - http://sqlfiddle.com/#!9/a72fe/1
This query would do it all (adapted to your added fiddle):
SELECT a.ts, g.*, round((a.ct * numeric '100') / g.capacity, 2) AS pct
FROM  (
   SELECT ts, c.garage_id, count(*) AS ct
   FROM   generate_series(timestamp '2015-06-01 00:00'  -- lower and
                        , timestamp '2015-12-01 00:00'  -- upper bound of range
                        , interval '4h') ts
   JOIN   appointment a ON a.picked_up_at <= ts         -- incl. lower
                       AND (a.returned_at > ts OR
                            a.returned_at IS NULL)      -- excl. upper bound
   JOIN   car         c ON c.id = a.car_id
   GROUP  BY 1, 2
   ) a
JOIN   garage g ON g.id = a.garage_id
ORDER  BY 1, 2;
SQL Fiddle.
If returned_at IS NULL, this query assumes that the car is still in use. NULL should not occur in any other case, or the calculation will be wrong.
First, I build the time series with the convenient generate_series() function.
Then join to appointments where the timestamp falls inside a booking.
I treat every appointment as including its lower timestamp and excluding its upper timestamp, as is the widespread convention.
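The sampling logic of the query can be mirrored in plain Ruby to check the boundary handling. This is only a minimal sketch: the Appointment struct, the helper names covers? and time_series, and the sample rows are all made up for illustration and are not part of the original schema or answer.

```ruby
# Hypothetical in-memory stand-in for rows of the appointment table.
Appointment = Struct.new(:car_id, :picked_up_at, :returned_at)

# True if the appointment covers instant ts: lower bound included,
# upper bound excluded, and a nil returned_at counts as still open.
def covers?(appt, ts)
  appt.picked_up_at <= ts && (appt.returned_at.nil? || appt.returned_at > ts)
end

# Timestamps from lower to upper (both inclusive), step seconds apart,
# in the spirit of generate_series(lower, upper, interval).
def time_series(lower, upper, step)
  series = []
  ts = lower
  while ts <= upper
    series << ts
    ts += step
  end
  series
end

# Made-up sample data.
appointments = [
  Appointment.new(1, Time.utc(2015, 6, 1, 0), Time.utc(2015, 6, 1, 4)),
  Appointment.new(2, Time.utc(2015, 6, 1, 2), Time.utc(2015, 6, 1, 9)),
  Appointment.new(3, Time.utc(2015, 6, 1, 8), nil) # not returned yet
]

samples = time_series(Time.utc(2015, 6, 1, 0), Time.utc(2015, 6, 1, 12), 4 * 3600)
counts  = samples.map { |ts| appointments.count { |a| covers?(a, ts) } }
```

The first appointment is counted at 00:00 but not at 04:00, because its upper bound is excluded. Unlike the inner join in the query, this sketch also keeps timestamps where the count is 0.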
Aggregate and count before joining to garages (faster this way).
Percent calculations in the outer SELECT.
I multiply the bigint number with numeric (or optionally real or float) to preserve fractional digits, which would be cut off in an integer division. Then I round to two fractional digits.
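The same truncation effect is easy to reproduce outside the database. A small Ruby illustration with made-up numbers, using Float where the query uses numeric:

```ruby
count    = 3  # cars counted at one timestamp (made-up value)
capacity = 7  # garage capacity (made-up value)

# Integer / Integer drops the fractional part.
truncated = (count * 100) / capacity

# Multiplying by a Float first keeps the fraction; then round to two digits.
rounded = (count * 100.0 / capacity).round(2)
```

Here truncated is 42, while rounded is 42.86.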
Note that this is not exactly the average percentage of each 4-hour period, but only the current percentage at each point in time, which is an approximation of the true average. You might start with an odd timestamp like '2015-06-01 01:17' so that samples do not land exactly on booking boundaries, which probably fall on full hours; sampling right at turnover times might increase the mean error of the approximation.
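The query returns one row per timestamp and garage, while the chart in the question needs one value per weekday and 4-hour slot. One possible way to fold the rows is sketched below in Ruby. The [timestamp, percent] pairs are made-up values for a single garage, and the sketch assumes the timestamps are already in the time zone the report should use.

```ruby
# One [timestamp, percent] pair per sample; the values are made up.
samples = [
  [Time.utc(2015, 6, 1, 0), 40.0],  # Monday 00:00
  [Time.utc(2015, 6, 1, 4), 60.0],  # Monday 04:00
  [Time.utc(2015, 6, 8, 0), 50.0],  # following Monday 00:00
  [Time.utc(2015, 6, 2, 0), 10.0]   # Tuesday 00:00
]

# Key each sample by [weekday (0 = Sunday), index of its 4-hour slot].
grouped = samples.group_by { |ts, _pct| [ts.wday, ts.hour / 4] }

# Average the percentages inside each bucket.
averages = {}
grouped.each do |key, rows|
  pcts = rows.map { |_ts, pct| pct }
  averages[key] = (pcts.inject(:+) / pcts.size).round(2)
end
```

With this data, the two Monday 00:00 samples are averaged into one bucket keyed [1, 0]. Each key maps directly to one point on the x-axis of the chart.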
You can do an exact calculation for 4h periods, too, but that's more sophisticated. One simple technique would be to reduce the interval to 10 minutes or some granularity that's detailed enough to capture the full picture.
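As a rough idea of what an exact calculation involves, each appointment can be weighted by how long it overlaps the period. The Ruby sketch below is not the technique from the linked answer. It is a simple time-weighted average, and the Appointment struct, the helper names, and the data are hypothetical.

```ruby
# Hypothetical in-memory stand-in for rows of the appointment table.
Appointment = Struct.new(:car_id, :picked_up_at, :returned_at)

# Seconds during which the appointment overlaps the period [from, to).
# A nil returned_at is treated as still open, as in the query.
def overlap_seconds(appt, from, to)
  start  = [appt.picked_up_at, from].max
  finish = [appt.returned_at || to, to].min
  [finish - start, 0].max
end

# Time-weighted average number of open appointments over [from, to).
def average_count(appointments, from, to)
  total = appointments.map { |a| overlap_seconds(a, from, to) }.inject(0, :+)
  total / (to - from)
end

# Made-up sample data.
appointments = [
  Appointment.new(1, Time.utc(2015, 6, 1, 0), Time.utc(2015, 6, 1, 4)),
  Appointment.new(2, Time.utc(2015, 6, 1, 2), Time.utc(2015, 6, 1, 9)),
  Appointment.new(3, Time.utc(2015, 6, 1, 8), nil) # not returned yet
]

first_period = average_count(appointments, Time.utc(2015, 6, 1, 0), Time.utc(2015, 6, 1, 4))
third_period = average_count(appointments, Time.utc(2015, 6, 1, 8), Time.utc(2015, 6, 1, 12))
```

For the period starting at 00:00 the time-weighted average is 1.5, whereas a single sample taken at 00:00 would report 1. That gap is the approximation error described above.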