I have an events table with two columns: eventkey (unique, primary key) and createtime, which stores the creation time of the event as the number of milliseconds since Jan 1 1970 in a NUMBER column.
I would like to create a "histogram" or frequency distribution that shows me how many events were created in each hour of the past week.
Is this the best way to write such a query in Oracle, using the width_bucket() function? Is it possible to derive the number of rows that fall into each bucket using one of the other Oracle analytic functions, rather than using width_bucket to determine what bucket number each row belongs to and doing a count(*) over that?
-- 1305504000000 = 5/16/2011 12:00am GMT
-- 1306108800000 = 5/23/2011 12:00am GMT
select
  -- width_bucket numbers buckets from 1, so bucket 1 starts at the range start
  timestamp '1970-01-01 00:00:00' + numtodsinterval((1305504000000/1000 + ((bucket - 1) * 60 * 60)), 'second') period_start,
  numevents
from (
  select bucket, count(*) as numevents from (
    select eventkey, createtime,
      width_bucket(createtime, 1305504000000, 1306108800000, 24 * 7) bucket
    from events
    where createtime between 1305504000000 and 1306108800000
  ) group by bucket
)
order by period_start
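To make the bucket numbering concrete, here is a small Python sketch that mimics how WIDTH_BUCKET assigns values for an ascending range: 0 below the lower bound, 1 through n inside the range, and n + 1 at or above the upper bound. The helper names are made up for illustration, and the integer arithmetic is my own approximation of the function, not Oracle's implementation.

```python
from datetime import datetime, timedelta, timezone

START_MS = 1305504000000  # 2011-05-16 00:00 GMT, from the question
END_MS = 1306108800000    # 2011-05-23 00:00 GMT, from the question
NUM_BUCKETS = 24 * 7      # one bucket per hour of the week
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def width_bucket(value, lo, hi, count):
    """Mimic WIDTH_BUCKET for lo < hi (hypothetical helper, not Oracle code)."""
    if value < lo:
        return 0              # underflow bucket
    if value >= hi:
        return count + 1      # overflow bucket
    return (value - lo) * count // (hi - lo) + 1


def period_start(bucket):
    """Start of a 1-based hourly bucket as a UTC datetime."""
    return EPOCH + timedelta(milliseconds=START_MS) + timedelta(hours=bucket - 1)


first = width_bucket(START_MS, START_MS, END_MS, NUM_BUCKETS)
last = width_bucket(END_MS - 1, START_MS, END_MS, NUM_BUCKETS)
overflow = width_bucket(END_MS, START_MS, END_MS, NUM_BUCKETS)

print(first, period_start(first))
print(last, period_start(last))
print(overflow)
```

Two things fall out of this: bucket b begins at the range start plus (b - 1) hours, and a row whose createtime equals the upper bound exactly lands in the extra bucket n + 1, which matters when the filter uses an inclusive BETWEEN.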
Starting in Oracle Database 12c, if the sampling size is the default of AUTO_SAMPLE_SIZE, then the database creates frequency histograms from a full table scan. For all other sampling percentage specifications, the database derives frequency histograms from a sample.
A histogram is designed to store data distribution information in the dictionary and helps the query optimizer to generate an efficient execution plan. In Oracle Database 12c, there are four types of histograms: frequency histogram (FH), top frequency histogram (TFH), hybrid histogram (HH), and the old-style height-balanced histogram (HBH).
This is determined based on the number of histogram buckets (NHB) and the number of endpoints that fall in the range of (max-min)/NHB. Oracle Database checks whether a column is distributed nonuniformly before creating a histogram.
See Oracle Database Reference to learn about the USER_HISTOGRAMS view.
A top frequency histogram is a variation on a frequency histogram that ignores nonpopular values that are statistically insignificant. For example, if a pile of 1000 coins contains only a single penny, then you can ignore the penny when sorting the coins into buckets.
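The coin analogy can be put into a few lines of Python. This is only a toy illustration of "keep the popular values, drop the rest"; it is not how the database builds its statistics, and the coin mix other than the single penny is invented for the example.

```python
from collections import Counter


def top_frequency(values, num_buckets):
    """Keep the num_buckets most common values and report the share of rows they cover."""
    counts = Counter(values)
    kept = dict(counts.most_common(num_buckets))
    coverage = sum(kept.values()) / len(values)
    return kept, coverage


# Hypothetical pile of 1000 coins containing a single penny.
coins = ["nickel"] * 400 + ["dime"] * 350 + ["quarter"] * 249 + ["penny"]
kept, coverage = top_frequency(coins, 3)

print(kept)
print(coverage)
```

With three buckets the penny is dropped, and the kept values still describe nearly all of the pile.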
If your createtime were a date column, this would be trivial:
SELECT TO_CHAR(CREATE_TIME, 'DAY:HH24'), COUNT(*)
FROM EVENTS
GROUP BY TO_CHAR(CREATE_TIME, 'DAY:HH24');
As it is, casting the createtime column isn't too hard:
select TO_CHAR(
         TO_DATE('19700101', 'YYYYMMDD') + createtime / 86400000,
         'DAY:HH24') AS BUCKET, COUNT(*)
FROM EVENTS
WHERE createtime between 1305504000000 and 1306108800000
group by TO_CHAR(
         TO_DATE('19700101', 'YYYYMMDD') + createtime / 86400000,
         'DAY:HH24')
order by 1
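If it helps to check the conversion outside the database, here is a rough Python analogue of the same idea: turn epoch milliseconds into a timestamp, format it as day name plus hour, and count. The sample values are hypothetical, and the key is only an approximation of what the 'DAY:HH24' format produces (Oracle pads the day name, for instance).

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
DAYS = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
        "FRIDAY", "SATURDAY", "SUNDAY"]  # indexed by datetime.weekday()


def day_hour_key(createtime_ms):
    """Approximate a 'DAY:HH24' key for a millisecond epoch value (GMT)."""
    ts = EPOCH + timedelta(milliseconds=createtime_ms)
    return f"{DAYS[ts.weekday()]}:{ts.hour:02d}"


# Hypothetical sample events, in milliseconds since the epoch.
sample = [
    1305504000000,              # Monday 2011-05-16 00:00:00 GMT
    1305504000000 + 1_000,      # one second later, same hour
    1305504000000 + 3_600_000,  # one hour later
]
counts = Counter(day_hour_key(ms) for ms in sample)

for key in sorted(counts):
    print(key, counts[key])
```

One caveat with a day-name key: sorting on it is alphabetical, so the rows come back grouped by FRIDAY, MONDAY, SATURDAY and so on rather than in calendar order.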
If, alternatively, you're looking for the fencepost values (for example, where do I go from the first decile (0-10%) to the next (11-20%)), you'd do something like:
-- distinct collapses the per-row window results to one row per decile
select distinct
       min(createtime) over (partition by decile) as decile_start,
       max(createtime) over (partition by decile) as decile_end,
       decile
from (select createtime,
             ntile (10) over (order by createtime asc) as decile
      from events
      where createtime between 1305504000000 and 1306108800000
     )
order by decile
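For anyone who wants to see what the decile fenceposts look like on a small data set, here is a Python sketch of the same logic. It assumes NTILE's usual behaviour (tile sizes differ by at most one, with the larger tiles first); the function names and the sample data are made up for illustration.

```python
def ntile(row_count, buckets):
    """Return 1-based tile numbers for row_count ordered rows."""
    base, extra = divmod(row_count, buckets)
    tiles = []
    for tile in range(1, buckets + 1):
        size = base + (1 if tile <= extra else 0)  # first `extra` tiles get one more row
        tiles.extend([tile] * size)
    return tiles


def fenceposts(values, buckets=10):
    """Map each tile number to its (smallest, largest) value."""
    ordered = sorted(values)
    result = {}
    for value, tile in zip(ordered, ntile(len(ordered), buckets)):
        lo, hi = result.get(tile, (value, value))
        result[tile] = (min(lo, value), max(hi, value))
    return result


# Hypothetical events: 25 rows, one second apart, starting at the range start.
START_MS = 1305504000000
events = [START_MS + i * 1000 for i in range(25)]
posts = fenceposts(events)

for decile, (start, end) in sorted(posts.items()):
    print(decile, start, end)
```

With 25 rows the first five deciles hold three rows each and the last five hold two, so the fenceposts are not evenly spaced in time; they follow the data.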
I'm unfamiliar with Oracle's date functions, but I'm pretty certain there's an equivalent way of writing this Postgres statement:
select date_trunc('hour', stamp), count(*)
from your_data
group by date_trunc('hour', stamp)
order by date_trunc('hour', stamp)
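Since the timestamps here are stored as epoch milliseconds, the "truncate to the hour, then count" idea can also be done with plain integer arithmetic. The Python sketch below shows that approach on hypothetical sample data; the function names are mine and the timestamps are treated as GMT.

```python
from collections import Counter
from datetime import datetime, timezone

MS_PER_HOUR = 60 * 60 * 1000


def hour_start_ms(createtime_ms):
    """Round a millisecond epoch value down to the start of its hour."""
    return createtime_ms - createtime_ms % MS_PER_HOUR


def hourly_counts(createtimes_ms):
    """Return (hour start as UTC datetime, event count) pairs in time order."""
    counts = Counter(hour_start_ms(ms) for ms in createtimes_ms)
    return [
        (datetime.fromtimestamp(hour_ms / 1000, tz=timezone.utc), n)
        for hour_ms, n in sorted(counts.items())
    ]


# Hypothetical sample events, in milliseconds since the epoch.
sample = [
    1305504000000,           # 2011-05-16 00:00:00
    1305504000000 + 59_000,  # 00:00:59, same hour
    1305507600000 + 5,       # 01:00:00.005, next hour
]
histogram = hourly_counts(sample)

for hour, n in histogram:
    print(hour.isoformat(), n)
```

Because the key is the numeric hour start rather than a formatted string, the output sorts chronologically. In Oracle itself, TRUNC(some_date, 'HH24') plays the role of date_trunc('hour', ...) once the value is a DATE.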