Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySql query histogram for time intervals data

I have an event input of this type

event user
event start
event end
event type

Inserted to MySql table, each in its own row with user+start as primary key.

I need to query an histogram for a type by time interval (say minute) counting events occurred on each time interval. something like:

SELECT count(*) as hits FROM events 
WHERE type="browsing" 
GROUP BY time_diff("2015-1-1" AND "2015-1-2") / 60 * second

but I could not find any way to do that in SQL besides writing code, any idea?

Sample data

user, start, end, type
1, 2015-1-1 12:00:00, 2015-1-1 12:03:59, browsing
2, 2015-1-1 12:03:00, 2015-1-1 12:06:00, browsing
2, 2015-1-1 12:03:00, 2015-1-1 12:06:00, eating
3, 2015-1-1 12:03:00, 2015-1-1 12:08:00, browsing

the result should look like this:

         ^
count    |
browsing |
users    |       *
         |       *  *  *  *
         | *  *  *  *  *  *  *  *
         --|--|--|--|--|--|--|--|--|--> minute
         0  1  2  3  4  5  6  7  8  9 
like image 522
moshe beeri Avatar asked Aug 16 '15 13:08

moshe beeri


1 Answers

You can do this using group by with the level that you want. Here is an example using the data you gave:

First the SQL to create the table and populate it. The ID column here isn't "needed" but it is recommended if the table will be large or have indexes on it.

CREATE TABLE `test`.`events` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `user` INT NULL,
  `start` DATETIME NULL,
  `end` DATETIME NULL,
  `type` VARCHAR(45) NULL,
  PRIMARY KEY (`id`));

INSERT INTO events (user, start, end, type) VALUES 
(1, '2015-1-1 12:00:00', '2015-1-1 12:03:59', 'browsing'),
(2, '2015-1-1 12:03:00', '2015-1-1 12:06:00', 'browsing'),
(2, '2015-1-1 12:03:00', '2015-1-1 12:06:00', 'eating'),
(3, '2015-1-1 12:03:00', '2015-1-1 12:08:00', 'browsing');

To get a list of ordered pairs of number of minutes duration to number of events:

The query can then be easily written using the timestampdiff fuction, as shown below:

SELECT 
    TIMESTAMPDIFF(MINUTE, start, end) as minutes,
    COUNT(*) AS numEvents
FROM
    test.events
GROUP BY TIMESTAMPDIFF(MINUTE, start, end)

The output:

minutes      numEvents
3            3
5            1

The first parameter in the select can be one of FRAC_SECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR.

Here are some more examples of queries you can do:

Events by hour (floor function is applied)

SELECT 
    TIMESTAMPDIFF(HOUR, start, end) as hours,
    COUNT(*) AS numEvents
FROM
    test.events
GROUP BY TIMESTAMPDIFF(HOUR, start, end)

**Events by hour with better formatting **

SELECT 
    CONCAT("<", TIMESTAMPDIFF(HOUR, start, end) + 1) as hours,
    COUNT(*) AS numEvents
FROM
    test.events
GROUP BY TIMESTAMPDIFF(HOUR, start, end)

You can group by a variety of options, but this should definitely get you started. Most plotting packages will allow you to specify arbitrary x y coordinates, so you don't need to worry about the missing values on the x axis.

To get a list of ordered pairs of number of events at a specific time (for logging): Note that this is left for reference.

Now for the queries. First you have to pick which item you want to use for the grouping. For example, a task might take more than a minute, so the start and end would be in different minutes. For all these examples, I am basing them off of the start time, since that is when the event actually took place.

To group event counts by minute, you can use a query like this:

SELECT 
     DATE_FORMAT(start, '%M %e, %Y %h:%i %p') as minute, 
     count(*) AS numEvents 
FROM test.events 
GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start), HOUR(start), MINUTE(start);

Note how this groups by all the items, starting with year, going the minute. I also have the minute displayed as a label. The resulting output looks like this:

minute                      numEvents
January 1, 2015 12:00 PM    1
January 1, 2015 12:03 PM    3

This is data that you could then take using php and prepare it for display by one of the many graphing libraries out there, plotting the minute column on the x axis, and plotting the numEvents on the y axis.

Here are some more examples of queries you can do:

Events by hour

SELECT 
     DATE_FORMAT(start, '%M %e, %Y %h %p') as hour, 
     count(*) AS numEvents 
FROM test.events 
GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start), HOUR(start);

Events by date

SELECT 
    DATE_FORMAT(start, '%M %e, %Y') as date, 
    count(*) AS numEvents 
FROM test.events 
GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start);

Events by month

SELECT 
    DATE_FORMAT(start, '%M %Y') as date, 
    count(*) AS numEvents 
FROM test.events 
GROUP BY YEAR(start), MONTH(start);

Events by year

SELECT 
    DATE_FORMAT(start, '%Y') as date, 
    count(*) AS numEvents 
FROM test.events 
GROUP BY YEAR(start);

I should also point out that if you have an index on the start column for this table, these queries will complete quickly, even with hundreds of millions of rows.

Hope this helps! Let me know if you have any other questions about this.

like image 199
techdude Avatar answered Oct 24 '22 09:10

techdude