
Group by half hour interval

Tags: sql, mysql

I was lucky enough to find this awesome piece of code on Stack Overflow. However, I wanted to change it so that it shows each half hour instead of every hour, and messing around with it only caused me to ruin the query, haha.

This is the SQL:

SELECT CONCAT(HOUR(created_at), ':00-', HOUR(created_at)+1, ':00') as hours,
       COUNT(*)
FROM urls
GROUP BY HOUR(created_at)
ORDER BY HOUR(created_at) ASC

How would I go about getting a result every half an hour? :)

Another thing: if there is a half hour with no results, I would like it to return 0 instead of just skipping that step. It looks kind of weird when I do statistics over the query and it just skips an hour because there were none :P

asked Mar 09 '14 14:03 by Jazerix



1 Answer

If the format isn't too important, you can return two columns for the interval. You might even just need the start of the interval, which can be determined by:

date_format(created_at - interval minute(created_at)%30 minute, '%H:%i') as period_start

The alias can be used in the GROUP BY and ORDER BY clauses. If you also need the end of the interval, you will need a small modification:

SELECT
  date_format(created_at - interval minute(created_at)%30 minute, '%H:%i') as period_start,
  date_format(created_at + interval 30-minute(created_at)%30 minute, '%H:%i') as period_end,
  COUNT(*)
FROM urls
GROUP BY period_start, period_end  -- both listed so the query also runs under ONLY_FULL_GROUP_BY
ORDER BY period_start ASC;
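
To sanity-check the rounding arithmetic outside the database, here is a minimal Python sketch of the same computation. The function name half_hour_bounds is made up for this illustration; it mirrors minute(created_at) % 30 from the query and, like the '%H:%i' format, ignores seconds.

```python
from datetime import datetime, timedelta

def half_hour_bounds(created_at):
    """Return (period_start, period_end) as 'HH:MM' strings.

    Hypothetical helper mirroring the SQL above: subtract minute % 30
    to get the start, add 30 - minute % 30 to get the end. Seconds are
    dropped by the 'HH:MM' formatting, as with '%H:%i' in MySQL.
    """
    offset = created_at.minute % 30
    start = created_at - timedelta(minutes=offset)
    end = created_at + timedelta(minutes=30 - offset)
    return start.strftime("%H:%M"), end.strftime("%H:%M")

print(half_hour_bounds(datetime(2014, 3, 9, 14, 3)))    # ('14:00', '14:30')
print(half_hour_bounds(datetime(2014, 3, 9, 14, 47)))   # ('14:30', '15:00')
print(half_hour_bounds(datetime(2014, 3, 9, 23, 59)))   # ('23:30', '00:00')
```

Note that with this arithmetic the last period of the day ends at 00:00 (the next day), not 24:00.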

Of course you can also concatenate the values:

SELECT concat_ws('-',
           date_format(created_at - interval minute(created_at)%30 minute, '%H:%i'),
           date_format(created_at + interval 30-minute(created_at)%30 minute, '%H:%i')
       ) as period,
       COUNT(*)
FROM urls
GROUP BY period
ORDER BY period ASC;

Demo: http://rextester.com/RPN50688

"Another thing, is that, if it there is half an hour with no results, I would like it to return 0"

If you use the result in a procedural language, you can initialize all 48 rows with zero in a loop and then "inject" the non-zero rows from the result.
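
As a rough sketch of that approach, assuming the application is written in Python and the query returns (period_start, count) pairs with period_start formatted as 'HH:MM', it could look like this. The function name fill_missing and the sample rows are invented for the example.

```python
def fill_missing(rows):
    """Return all 48 half-hour slots of a day with their counts.

    rows: iterable of (period_start, count) pairs such as ('14:30', 7),
    assumed to come from the GROUP BY query; slots that the query did
    not return keep a count of 0.
    """
    # Initialize all 48 slots ('00:00', '00:30', ..., '23:30') with zero.
    counts = {"%02d:%02d" % divmod(i * 30, 60): 0 for i in range(48)}
    # "Inject" the non-zero rows from the query result.
    for period_start, cnt in rows:
        counts[period_start] = cnt
    # Dicts keep insertion order, so the slots come back in time order.
    return list(counts.items())

sample_rows = [("00:00", 1), ("23:30", 3)]  # made-up query result
result = fill_missing(sample_rows)
print(len(result))                        # 48
print(result[0], result[1], result[-1])   # ('00:00', 1) ('00:30', 0) ('23:30', 3)
```

Keying on period_start alone avoids any ambiguity about whether the last period ends at 00:00 or 24:00.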

However, if you need it to be done in SQL, you will need a table with at least 48 rows for a LEFT JOIN. That could be done inline with a "huge" UNION ALL statement, but (IMHO) it would be ugly. So I prefer to have a sequence table with one integer column, which can be very useful for reports. To create that table I usually use information_schema.COLUMNS, since it is available on any MySQL server and has at least a couple of hundred rows. If you need more rows, just join it with itself.

Now let's create that table:

drop table if exists helper_seq;
create table helper_seq (seq smallint auto_increment primary key)
    select null as seq  -- aliased to seq so that no extra column is created; auto_increment fills in 1, 2, 3, ...
    from information_schema.COLUMNS c1
       , information_schema.COLUMNS c2
    limit 100; -- adjust as needed

Now we have a table with integers from 1 to 100 (though right now you only need 48 - but this is for demonstration).

Using that table we can now create all 48 time intervals:

select time(0) + interval 30*(seq-1) minute as period_start,
       time(0) + interval 30*(seq)   minute as period_end
from helper_seq s
where s.seq <= 48;

We will get the following result:

period_start | period_end
00:00:00     | 00:30:00
00:30:00     | 01:00:00
...
23:30:00     | 24:00:00

Demo: http://rextester.com/ISQSU31450

Now we can use it as a derived table (subquery in FROM clause) and LEFT JOIN your urls table:

select p.period_start, p.period_end, count(u.created_at) as cnt
from (
    select time(0) + interval 30*(seq-1) minute as period_start,
           time(0) + interval 30*(seq)   minute as period_end
    from helper_seq s
    where s.seq <= 48
) p
left join urls u
    on  time(u.created_at) >= p.period_start
    and time(u.created_at) <  p.period_end
group by p.period_start, p.period_end
order by p.period_start

Demo: http://rextester.com/IQYQ32927
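
The join condition describes a half-open interval: a row stamped exactly 00:30:00 is counted in the 00:30 period, not in the 00:00 period. As a small illustration (not part of the query), the same assignment can be expressed in Python as an integer division of the seconds since midnight by 1800. The name bucket_index is made up for this sketch.

```python
from datetime import time

def bucket_index(t):
    """Zero-based index of the half-hour period containing time t.

    Equivalent to the join condition period_start <= t < period_end:
    index 0 is 00:00-00:30 and index 47 is 23:30-24:00. In terms of
    the helper table, this is seq - 1.
    """
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return seconds // 1800

print(bucket_index(time(0, 29, 59)))    # 0
print(bucket_index(time(0, 30, 0)))     # 1
print(bucket_index(time(23, 59, 59)))   # 47
```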

The last step (if really needed) is to format the result. We can use CONCAT or CONCAT_WS and TIME_FORMAT in the outer select. The final query would be:

select concat_ws('-',
         time_format(p.period_start, '%H:%i'),
         time_format(p.period_end,   '%H:%i')
       ) as period,
       count(u.created_at) as cnt
from (
    select time(0) + interval 30*(seq-1) minute as period_start,
           time(0) + interval 30*(seq)   minute as period_end
    from helper_seq s
    where s.seq <= 48
) p
left join urls u
    on  time(u.created_at) >= p.period_start
    and time(u.created_at) <  p.period_end
group by p.period_start, p.period_end
order by p.period_start

The result would look like:

period      | cnt
00:00-00:30 |   1
00:30-01:00 |   0
...
23:30-24:00 |   3

Demo: http://rextester.com/LLZ41445

answered Sep 21 '22 10:09 by Paul Spiegel