I have a set of call detail records, and from those records I'm supposed to determine the average number of concurrent active calls per system, per hour, at a precision of one minute. If I query 7pm to 8pm, I should see, for each system, the average number of concurrent calls in that hour, i.e. the average of the concurrent-call counts for each minute within that hour.
So, I need a way to check for a count of active calls for 7:00-7:01, 7:01-7:02, etc., and then average those numbers. A call is considered active if any part of the call, as given by its start time and duration, falls within the minute being checked.
What makes this even more difficult is that it needs to work on both SQL Server 7.0 and SQL Server 2000 (some functions in 2000, such as GETUTCDATE(), aren't available in 7.0). If I can just get 2000 working, I'll be happy.
I thought about looping through the 60 minutes in the hour being checked and counting the calls that fall within each minute. I would then somehow cross-reference the duration, so that a call that starts at 7:00 pm and lasts 300 seconds shows as active at 7:04. However, I can't see how to approach the problem. I also tried to find a way to weight each call against a particular minute to tell whether the call was active during that minute, but couldn't come up with an effective solution.
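For reference, the brute-force loop described above can be sketched in Python. This is an illustration outside the database, not a SQL Server solution. It assumes each call is supplied as a hypothetical (start_secs_from_midnight, dur, system) tuple. A call is counted in a minute if any second of the call falls in that minute:

```python
from collections import defaultdict


def avg_concurrent_for_hour(calls, hour):
    """Average number of active calls per minute in `hour` (0-23), per system.

    Assumption: `calls` is an iterable of hypothetical
    (start_secs_from_midnight, dur, system) tuples. Systems with no
    active calls in the hour are omitted from the result.
    """
    call_minutes = defaultdict(int)
    for minute in range(hour * 60, hour * 60 + 60):
        first_sec, last_sec = minute * 60, minute * 60 + 59
        for start, dur, system in calls:
            end = start + dur - 1  # last second during which the call is active
            if start <= last_sec and end >= first_sec:  # call overlaps this minute
                call_minutes[system] += 1
    return {system: n / 60 for system, n in call_minutes.items()}


# The example from the question: a call at 7:00:00 pm lasting 300 seconds is
# active during 7:00, 7:01, 7:02, 7:03 and 7:04, i.e. 5 of the 60 minutes.
print(avg_concurrent_for_hour([(19 * 3600, 300, 1)], 19))
```

This does one comparison per call per minute. The set-based join in the answer below does the same work in a single query.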
The data types here are the same as the ones I have to query against. I don't have any control over the schema (other than possibly converting the data and inserting it into another table with more appropriate data types). I've provided some example data that I know has concurrent active calls.
CREATE TABLE Records(
seconds char(10),
time char(4),
date char(8),
dur int,
system int,
port int
)
--seconds is an stime value: the number of seconds from 1970-01-01 00:00:00 UTC to the UTC time of the call (like a Unix epoch timestamp). We also use it as an identifier.
--time is the time the call was made.
--date is the day the call was made.
--dur is the duration of the call in seconds.
--system is the system number.
--port is the port on the system (not particularly relevant for this question).
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924228','1923','20090416',105,2,2)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239923455','1910','20090416',884,1,97)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924221','1923','20090416',116,2,15)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924259','1924','20090416',90,1,102)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239923458','1910','20090416',891,2,1)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924255','1924','20090416',99,2,42)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924336','1925','20090416',20,2,58)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924293','1924','20090416',64,2,41)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239923472','1911','20090416',888,2,27)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924347','1925','20090416',25,1,100)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924301','1925','20090416',77,2,55)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924332','1925','20090416',52,2,43)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924240','1924','20090416',151,1,17)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924313','1925','20090416',96,2,62)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924094','1921','20090416',315,2,16)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239923643','1914','20090416',788,2,34)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924447','1927','20090416',6,2,27)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924342','1925','20090416',119,2,15)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924397','1926','20090416',76,2,41)
INSERT INTO Records(seconds, time, date, dur, system, port) VALUES('1239924457','1927','20090416',23,2,27)
I think MarkusQ has the answer, but let me develop an alternative that you may find easier to use. I'll use my customary method of developing this as a series of simple transformations in views, an analogue of functional decomposition in a procedural language.
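First, let's put everything in common units. I'll call your table record and its seconds column s. I'll also assume s holds an integer; in your schema it's char(10), so you'd cast it to int first. Recall that s is seconds since the epoch, midnight UTC on 1 January 1970. We can find the number of seconds between midnight (UTC) on the day the call occurred and the start of the call by taking s modulo the number of seconds in a day: s % (60 * 60 * 24).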
select *,
s % (60 * 60 * 24) as start_secs_from_midnight,
s % (60 * 60 * 24) + dur - 1 as end_secs_from_midnight
from record
;
We subtract one from s + dur because a one-second call that starts at 12:00:00 also ends at 12:00:00.
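As a quick check of this arithmetic, here is a small Python sketch (not part of the original answer) applied to the first sample row from the question. It also shows that s % 86400 gives the time of day in UTC. The result is 23:23:48, while that row's time column holds 1923, so the time column appears to be local time four hours behind UTC. That offset is inferred from the sample data only.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def hhmmss(secs):
    """Format a number of seconds since midnight as HH:MM:SS."""
    hours, rem = divmod(secs, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# First sample row: seconds='1239924228' (stored as char(10)), dur=105.
s, dur = int("1239924228"), 105

start_secs_from_midnight = s % SECONDS_PER_DAY
# Subtract one so a 1-second call starting at 12:00:00 also ends at 12:00:00.
end_secs_from_midnight = s % SECONDS_PER_DAY + dur - 1

print(hhmmss(start_secs_from_midnight), hhmmss(end_secs_from_midnight))
# 23:23:48 23:25:32  (UTC; the row's time column says 1923)
```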
We can find minutes since midnight by dividing those results by 60, or just by floor( s / 60 ) % (60 * 24):
create view record_mins_from_midnight as
select *,
floor( s / 60 ) % (60 * 24) as start_mins_fm,
floor( ( s + dur - 1) / 60 ) % (60 * 24) as end_mins_fm
from record
;
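A quick Python sketch of the minute arithmetic confirms that the two formulas agree. Python's // floors, like floor() on a non-negative s. The sketch also shows one edge case worth knowing about. Because of the % (60 * 24), a call that runs past midnight gets an end minute smaller than its start minute. A minute-range join like the one used in this answer would therefore match no minutes for that call. That caveat is my observation, not part of the answer. Such calls would need separate handling, for example splitting them at midnight.

```python
MINUTES_PER_DAY = 60 * 24


def start_end_minutes(s, dur):
    """Start and end minute of the day (0-1439, UTC), as in the view above."""
    start_mins_fm = (s // 60) % MINUTES_PER_DAY
    end_mins_fm = ((s + dur - 1) // 60) % MINUTES_PER_DAY
    return start_mins_fm, end_mins_fm


# floor(s / 60) % 1440 equals (s % 86400) // 60, because 86400 = 60 * 1440.
for s in (1239924228, 1239923455, 1239924447):  # sample seconds values
    assert (s // 60) % MINUTES_PER_DAY == (s % 86400) // 60

print(start_end_minutes(1239924228, 105))  # (1403, 1405), i.e. 23:23 to 23:25 UTC

# A call starting at 23:59:30 and lasting 60 seconds wraps past midnight:
# its end minute (0) is smaller than its start minute (1439).
print(start_end_minutes(86400 - 30, 60))  # (1439, 0)
```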
Now we create a table of minutes. We need 1440 of them, numbered from 0 to 1439. In databases that don't support arbitrary sequences, I create an artificial range or sequence like this:
create table artificial_range (
id int identity(1,1) not null primary key, -- SQL Server; MySQL uses auto_increment
idz int) ;
insert into artificial_range(idz) values (0);
-- repeat the next line 11 times to get 2048 rows (at least 1440 are needed)
insert into artificial_range(idz) select idz from artificial_range;
So, to create a minutes view:
create view minutes as
select id - 1 as active_minute
from artificial_range
where id <= 1440
;
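To sanity-check the doubling trick, here is a sketch that runs it in SQLite through Python's built-in sqlite3 module. SQLite stands in for SQL Server here, and its INTEGER PRIMARY KEY AUTOINCREMENT plays the role of auto_increment/identity. Eleven doublings starting from one row give 2048 rows, the first power of two that covers 1440 minutes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: SQLite syntax; INTEGER PRIMARY KEY AUTOINCREMENT numbers rows 1, 2, 3, ...
conn.execute("create table artificial_range (id integer primary key autoincrement, idz int)")
conn.execute("insert into artificial_range(idz) values (0)")

# Each run doubles the row count: 1 -> 2 -> 4 -> ... -> 2048 after 11 runs.
for _ in range(11):
    conn.execute("insert into artificial_range(idz) select idz from artificial_range")

conn.execute("""
    create view minutes as
    select id - 1 as active_minute
    from artificial_range
    where id <= 1440
""")

total, lo, hi = conn.execute(
    "select count(*), min(active_minute), max(active_minute) from minutes"
).fetchone()
print(total, lo, hi)  # 1440 0 1439
```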
Now we just join minutes to our record_mins_from_midnight view:
create view record_active_minutes as
select * from minutes a
join record_mins_from_midnight b
on (a.active_minute >= b.start_mins_fm
and a.active_minute <= b.end_mins_fm)
;
This join multiplies the record rows. Each call appears once for every minute in which it was active, even if only partially.
Note that I'm defining active as "(part of) the call occurred during the minute". By this definition, a two-second call that starts at 12:00:59 and ends at 12:01:00 is active in two different minutes. A two-second call that starts at 12:00:58 and ends at 12:00:59 is active in only one.
I did that because you specified "So, I need a way to check for a count of active calls for 7:00-7:01, 7:01-7:02". If you'd rather count a call in more than one minute only when it lasts more than sixty seconds, you'll need to adjust the join.
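The definition of "active" above can be checked with a small SQLite sketch run from Python. It builds the answer's record_mins_from_midnight and record_active_minutes views over two hypothetical two-second calls, one starting at 12:00:59 and one at 12:00:58. Here s is simply seconds from the epoch, so both calls fall on 1 January 1970. The sketch uses a plain 1440-row minutes table in place of the artificial_range view:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table record (s int, dur int)")
# Hypothetical calls: 2 seconds from 12:00:59, and 2 seconds from 12:00:58.
conn.executemany("insert into record values (?, ?)", [(43259, 2), (43258, 2)])

# Stand-in for the minutes view: one row per minute of the day, 0..1439.
conn.execute("create table minutes (active_minute int primary key)")
conn.executemany("insert into minutes values (?)", [(m,) for m in range(1440)])

# Integer division truncates in SQLite, as it does in SQL Server.
conn.execute("""
    create view record_mins_from_midnight as
    select *,
           (s / 60) % (60 * 24) as start_mins_fm,
           ((s + dur - 1) / 60) % (60 * 24) as end_mins_fm
    from record
""")
conn.execute("""
    create view record_active_minutes as
    select * from minutes a
    join record_mins_from_midnight b
      on (a.active_minute >= b.start_mins_fm
          and a.active_minute <= b.end_mins_fm)
""")

rows = conn.execute(
    "select s, active_minute from record_active_minutes order by s, active_minute"
).fetchall()
print(rows)  # [(43258, 720), (43259, 720), (43259, 721)]
```

Minute 720 is 12:00 and 721 is 12:01, so the call starting at 12:00:59 is counted twice and the other once.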
Now if we want to find the number of active records for any granularity equal to or larger than minute granularity, we just group on that last view. To find the average number of concurrent calls per minute in each hour, we count the call-minutes in each hour and divide by the 60 minutes in an hour:
select floor( active_minute / 60 ) as hour,
count(*) / 60.0 as avg_concurrent_calls_per_minute_for_hour -- 60.0 avoids integer division
from record_active_minutes
group by floor( active_minute / 60 ) ;
Note that this adds up the call-minutes for each hour across all days in the table. To limit it to a particular day or range of days, we'd add a where clause. To average over several days, we'd also divide by the number of days.
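The question asks for averages per system, so here is a sketch of the same hourly query grouped by system as well, run against the question's twenty sample rows in SQLite from Python. Assumptions: SQLite stands in for SQL Server, the views are written inline as a derived table, and the division is by 60.0 so the result isn't truncated. The hour is a UTC hour, so the sample calls all land in hour 23:

```python
import sqlite3

# (seconds, dur, system) from the question's sample INSERT statements.
SAMPLE = [
    (1239924228, 105, 2), (1239923455, 884, 1), (1239924221, 116, 2), (1239924259, 90, 1),
    (1239923458, 891, 2), (1239924255, 99, 2), (1239924336, 20, 2), (1239924293, 64, 2),
    (1239923472, 888, 2), (1239924347, 25, 1), (1239924301, 77, 2), (1239924332, 52, 2),
    (1239924240, 151, 1), (1239924313, 96, 2), (1239924094, 315, 2), (1239923643, 788, 2),
    (1239924447, 6, 2), (1239924342, 119, 2), (1239924397, 76, 2), (1239924457, 23, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table record (s int, dur int, system int)")
conn.executemany("insert into record values (?, ?, ?)", SAMPLE)
conn.execute("create table minutes (active_minute int primary key)")
conn.executemany("insert into minutes values (?)", [(m,) for m in range(1440)])

# Same join as record_active_minutes, grouped by system and hour.
query = """
    select b.system,
           a.active_minute / 60 as hour,
           count(*) / 60.0 as avg_concurrent_calls_per_minute_for_hour
    from minutes a
    join (select system,
                 (s / 60) % 1440 as start_mins_fm,
                 ((s + dur - 1) / 60) % 1440 as end_mins_fm
          from record) b
      on a.active_minute between b.start_mins_fm and b.end_mins_fm
    group by b.system, a.active_minute / 60
    order by b.system, hour
"""
result = conn.execute(query).fetchall()
for system, hour, avg in result:
    print(system, hour, round(avg, 4))
# 1 23 0.3833
# 2 23 1.25
```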
But wait, there's more!
If we create a version of record_active_minutes that does a left outer join, we can get a report that shows the average for every hour of the day, including hours with no calls:
create view record_active_minutes_all as
select *
from
minutes a
left outer join record_mins_from_midnight b
on (a.active_minute >= b.start_mins_fm
and a.active_minute <= b.end_mins_fm)
;
Then we again do our select, but against the new view:
select floor( active_minute / 60 ) as hour,
count(s) / 60.0 as avg_concurrent_calls_per_min -- count(s) skips minutes with no matching call
from record_active_minutes_all
group by floor( active_minute / 60 ) ;
+------+------------------------------+
| hour | avg_concurrent_calls_per_min |
+------+------------------------------+
| 0 | 0.0000 |
| 1 | 0.0000 |
| 2 | 0.0000 |
| 3 | 0.0000 |
etc....
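One caution about the outer-join version. With a left outer join, every minute contributes a row even when no call matched it, so count(*) counts those empty minutes too. Counting a column from the record side, such as count(s), skips the nulls and gives the zeros shown above. The sketch below shows the difference on a single hypothetical call, with SQLite via Python standing in for SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table record (s int, dur int)")
conn.execute("insert into record values (?, ?)", (19 * 3600, 300))  # hypothetical: 19:00:00, 5 minutes
conn.execute("create table minutes (active_minute int primary key)")
conn.executemany("insert into minutes values (?)", [(m,) for m in range(1440)])
conn.execute("""
    create view record_active_minutes_all as
    select *
    from minutes a
    left outer join (select s, dur,
                            (s / 60) % 1440 as start_mins_fm,
                            ((s + dur - 1) / 60) % 1440 as end_mins_fm
                     from record) b
      on (a.active_minute >= b.start_mins_fm
          and a.active_minute <= b.end_mins_fm)
""")


def hourly(count_expr):
    """Map hour -> count_expr / 60.0 over the outer-join view."""
    return dict(conn.execute(f"""
        select active_minute / 60 as hour, {count_expr} / 60.0
        from record_active_minutes_all
        group by active_minute / 60
    """).fetchall())


star, non_null = hourly("count(*)"), hourly("count(s)")
print(star[0], non_null[0])  # 1.0 0.0  (hour 0 has no calls at all)
print(star[19], non_null[19])
```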
We can also filter record_active_minutes_all with a where clause. Unfortunately, the outer join means the underlying record columns are null for minutes in which no calls exist, so a query such as
select floor( active_minute / 60 ) as hour,
count(*) / 60.0 as avg_concurrent_calls_per_min
from record_active_minutes_all
where month(date) = 1 and year(date) = 2008
group by floor( active_minute / 60 ) ;
will bring back no rows for hours in which no calls occurred. If we still want our "report-like" view that shows all hours, we make sure we also include those hours with no records:
select floor( active_minute / 60 ) as hour,
count(s) / 60.0 as avg_concurrent_calls_per_minute_for_hour -- count(s) ignores the null rows
from record_active_minutes_all
where (month(date) = 1 and year(date) = 2008)
or date is null
group by floor( active_minute / 60 ) ;
Note that in the last two examples, I'm using a SQL date column (to which the functions month and year can be applied), not the char(8) date column in your Records table.
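A related caveat about `or date is null`. Because the date filter sits in the WHERE clause, a minute covered only by calls outside the chosen range is discarded entirely. An hour can therefore still vanish from the report if all of its minutes are taken up by such calls. Moving the condition into the outer join's ON clause keeps every minute.

The sketch below shows both versions. It uses SQLite via Python's sqlite3 as a stand-in and derives a UTC call date from s with SQLite's date(s, 'unixepoch'). The two calls are made up for illustration:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("create table record (s int, dur int)")
# Made-up calls: 5 minutes on 2009-04-16 from 19:00 UTC, and exactly one hour
# on 2009-05-01 from 05:00 UTC (so it covers every minute of hour 5).
april = int(datetime(2009, 4, 16, 19, tzinfo=timezone.utc).timestamp())
may = int(datetime(2009, 5, 1, 5, tzinfo=timezone.utc).timestamp())
conn.executemany("insert into record values (?, ?)", [(april, 300), (may, 3600)])
conn.execute("create table minutes (active_minute int primary key)")
conn.executemany("insert into minutes values (?)", [(m,) for m in range(1440)])

# Per-call minute range plus a UTC date derived from s (SQLite's date()).
CALLS = """(select s,
                   date(s, 'unixepoch') as call_date,
                   (s / 60) % 1440 as start_mins_fm,
                   ((s + dur - 1) / 60) % 1440 as end_mins_fm
            from record)"""
JOIN = f"""from minutes a left outer join {CALLS} b
           on a.active_minute between b.start_mins_fm and b.end_mins_fm"""
IN_APRIL = "strftime('%Y-%m', b.call_date) = '2009-04'"

# Filter in WHERE: minutes matched only by the May call are discarded,
# so hour 5 vanishes from the report.
where_version = dict(conn.execute(f"""
    select a.active_minute / 60, count(b.s) / 60.0 {JOIN}
    where {IN_APRIL} or b.call_date is null
    group by a.active_minute / 60""").fetchall())

# Filter in the ON clause: every minute survives and the May call simply
# doesn't match, so hour 5 is reported as 0.
on_version = dict(conn.execute(f"""
    select a.active_minute / 60, count(b.s) / 60.0 {JOIN}
      and {IN_APRIL}
    group by a.active_minute / 60""").fetchall())

print(5 in where_version, on_version[5])  # False 0.0
```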
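Another point about the date column: both the date and time columns in your Records table are superfluous and denormalized, since each can be derived from your seconds column s. This holds provided you know the offset between UTC and the local time they record. In the sample data, time is four hours behind the UTC time of day derived from s. Leaving these columns in the table allows inconsistent rows, in which date(s) <> date or time(s) <> time. I'd prefer to do it like this: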
create table record ( id int not null primary key, s int not null, dur int not null ) ;
create view record_date as
select *, dateadd( ss, s, '1970-01-01') as call_date
from record
;
In the dateadd function, ss is the datepart argument that tells the function to add seconds; s is the column in record.
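If you want to check the conversion outside SQL Server, here is a Python analogue of that dateadd call. It is a sketch, not T-SQL. Like the dateadd expression, it yields a UTC datetime. Matching the question's local date and time columns would need the offset applied explicitly; the four hours used below is inferred from the sample data only.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive datetime standing for 1970-01-01 00:00:00 UTC


def call_date(s):
    """Python analogue of dateadd(ss, s, '1970-01-01'): the call's start in UTC."""
    return EPOCH + timedelta(seconds=s)


utc_start = call_date(1239924228)  # first sample row
print(utc_start)  # 2009-04-16 23:23:48
# Assumed offset: that row's date/time columns hold 20090416 and 1923.
print(utc_start - timedelta(hours=4))  # 2009-04-16 19:23:48
```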