
Count the number of rows in 30 day bins

Each row in my table has a datetime stamp. Starting from now, I want to count how many rows fall in the last 30 days, the 30 days before that, and so on, until the 30-day bins reach back to the start of the table.

I have done this successfully in Python by issuing several queries, but I'm almost certain it can be done in a single MySQL query.

seanieb asked Dec 30 '12 10:12




3 Answers

No stored procedures, no temporary tables, just one query, and an efficient execution plan given an index on the date column:

select

  subdate(
    '2012-12-31',
    floor(dateDiff('2012-12-31', dateStampColumn) / 30) * 30 + 30 - 1
  ) as "period starting",

  subdate(
    '2012-12-31',
    floor(dateDiff('2012-12-31', dateStampColumn) / 30) * 30
  ) as "period ending",

  count(*)

from
  YOURTABLE
group by floor(dateDiff('2012-12-31', dateStampColumn) / 30);

It should be pretty obvious what is happening here, except for this incantation:

floor(dateDiff('2012-12-31', dateStampColumn) / 30)

That expression appears several times, and it evaluates to how many whole 30-day periods ago dateStampColumn falls. dateDiff returns the difference in days; dividing by 30 converts that to 30-day periods, and floor() rounds the result down to an integer. Once we have this number we can GROUP BY it, and a bit of arithmetic translates it back into the starting and ending dates of the period.
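
To make the incantation concrete, here is a small sketch that evaluates it for four literal dates against the same fixed reference date. The samples derived table and the column aliases are made up for illustration; no real table is needed.

```sql
-- Bin index for a few literal dates, measured back from '2012-12-31'.
-- days_ago comes out as 0, 29, 30 and 60, so bin_index is 0, 0, 1 and 2:
-- a new bin starts every 30 days, and day 29 still belongs to bin 0.
SELECT
  d                                     AS sample_date,
  DATEDIFF('2012-12-31', d)             AS days_ago,
  FLOOR(DATEDIFF('2012-12-31', d) / 30) AS bin_index
FROM (
  SELECT '2012-12-31' AS d
  UNION ALL SELECT '2012-12-02'
  UNION ALL SELECT '2012-12-01'
  UNION ALL SELECT '2012-11-01'
) AS samples;
```

Note that DATEDIFF() only looks at the date part of its arguments, so the time of day in a datetime column does not shift a row into a different bin.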

Replace '2012-12-31' with now() if you prefer (or with curdate(), so that the computed period dates carry no time component). Here's some sample data:

CREATE TABLE YOURTABLE
    (`Id` int, `dateStampColumn` datetime);

INSERT INTO YOURTABLE
    (`Id`, `dateStampColumn`)
VALUES
    (1, '2012-10-15 02:00:00'),
    (1, '2012-10-17 02:00:00'),
    (1, '2012-10-30 02:00:00'),
    (1, '2012-10-31 02:00:00'),
    (1, '2012-11-01 02:00:00'),
    (1, '2012-11-02 02:00:00'),
    (1, '2012-11-18 02:00:00'),
    (1, '2012-11-19 02:00:00'),
    (1, '2012-11-21 02:00:00'),
    (1, '2012-11-25 02:00:00'),
    (1, '2012-11-25 02:00:00'),
    (1, '2012-11-26 02:00:00'),
    (1, '2012-11-26 02:00:00'),
    (1, '2012-11-24 02:00:00'),
    (1, '2012-11-23 02:00:00'),
    (1, '2012-11-28 02:00:00'),
    (1, '2012-11-29 02:00:00'),
    (1, '2012-11-30 02:00:00'),
    (1, '2012-12-01 02:00:00'),
    (1, '2012-12-02 02:00:00'),
    (1, '2012-12-15 02:00:00'),
    (1, '2012-12-17 02:00:00'),
    (1, '2012-12-18 02:00:00'),
    (1, '2012-12-19 02:00:00'),
    (1, '2012-12-21 02:00:00'),
    (1, '2012-12-25 02:00:00'),
    (1, '2012-12-25 02:00:00'),
    (1, '2012-12-26 02:00:00'),
    (1, '2012-12-26 02:00:00'),
    (1, '2012-12-24 02:00:00'),
    (1, '2012-12-23 02:00:00'),
    (1, '2012-12-31 02:00:00'),
    (1, '2012-12-30 02:00:00'),
    (1, '2012-12-28 02:00:00'),
    (1, '2012-12-28 02:00:00'),
    (1, '2012-12-30 02:00:00');

And the result:

period starting     period ending   count(*)
2012-12-02          2012-12-31      17
2012-11-02          2012-12-01      14
2012-10-03          2012-11-01      5

Period endpoints are inclusive.

Play with this in SQL Fiddle.

One caveat: any 30-day period with zero matching rows will not appear in the result. Joining against a table of periods would eliminate that gap. However, MySQL doesn't have anything like PostgreSQL's generate_series(), so you'd have to deal with it in your application or try this clever hack.
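
On MySQL 8.0 and later, a recursive common table expression can stand in for generate_series(). What follows is only a sketch under that version assumption and is not part of the original answer; the names bounds, periods, max_n, n and row_count are made up for illustration.

```sql
-- Assumes MySQL 8.0+ (recursive CTEs) and the YOURTABLE sample schema above.
WITH RECURSIVE
  -- Index of the oldest 30-day period that contains any row.
  bounds AS (
    SELECT FLOOR(DATEDIFF('2012-12-31', MIN(dateStampColumn)) / 30) AS max_n
    FROM YOURTABLE
  ),
  -- One row per period index: 0 (most recent) up to max_n (oldest).
  periods (n) AS (
    SELECT 0
    UNION ALL
    SELECT periods.n + 1
    FROM periods
    JOIN bounds ON periods.n < bounds.max_n
  )
SELECT
  SUBDATE('2012-12-31', p.n * 30 + 29) AS period_starting,
  SUBDATE('2012-12-31', p.n * 30)      AS period_ending,
  COUNT(t.Id)                          AS row_count  -- 0 when no row matched the period
FROM periods AS p
LEFT JOIN YOURTABLE AS t
  ON FLOOR(DATEDIFF('2012-12-31', t.dateStampColumn) / 30) = p.n
GROUP BY p.n
ORDER BY p.n;
```

The LEFT JOIN keeps every generated period, and COUNT(t.Id) counts only the rows that actually matched, so empty periods show up with a count of zero. This assumes Id is never NULL; count another non-nullable column otherwise.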

Phil Frost answered Oct 02 '22 05:10


If you just need to count intervals where there's at least one row, you could use this:

select
  datediff(curdate(), `date`) div 30 as block,
  count(*) as rows_per_block
from
  your_table
group by
  block

And this also shows the start date and the end date:

select
  datediff(curdate(), `date`) div 30 as block,
  date_sub(curdate(),
           INTERVAL (1+datediff(curdate(), `date`) div 30)*30-1 DAY) as start_block,
  date_sub(curdate(),
           INTERVAL (datediff(curdate(), `date`) div 30)*30 DAY) as end_block,
  count(*)
from your_table
group by block

But if you also need to show all intervals, including the empty ones, you could use a solution like this:

select
  num,
  date_sub(curdate(),
           INTERVAL (num+1)*30-1 DAY) as start_block,
  date_sub(curdate(),
           INTERVAL num*30 DAY) as end_block,
  count(`date`)
from
  numbers left join your_table
  on `date` between date_sub(curdate(),
           INTERVAL (num+1)*30-1 DAY)  and
  date_sub(curdate(),
           INTERVAL num*30 DAY)
where num<=(datediff(curdate(), (select min(`date`) from your_table) ) div 30)
group by num

This requires a numbers table (a single column num holding 0, 1, 2, and so on) to be prepared beforehand; alternatively, see the fiddle here for a solution without a numbers table.
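
If no such table exists yet, one way to prepare it might look like the sketch below. The table and column names follow the query above; the ones and tens derived tables, and the choice of 100 rows, are assumptions made for illustration.

```sql
-- Helper table matching the numbers/num names used in the query above.
CREATE TABLE numbers (num INT NOT NULL PRIMARY KEY);

-- Fill it with 0..99 by cross joining two digit lists (tens * 10 + ones).
INSERT INTO numbers (num)
SELECT tens.d * 10 + ones.d
FROM (
  SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
  UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
  UNION ALL SELECT 8 UNION ALL SELECT 9
) AS ones
CROSS JOIN (
  SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
  UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
  UNION ALL SELECT 8 UNION ALL SELECT 9
) AS tens;
```

The sequence has to start at 0 because block 0 is the most recent 30-day interval. One hundred rows cover 3000 days, roughly eight years of history; add a hundreds digit if the table reaches further back.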

fthiella answered Oct 02 '22 05:10


Try this, which counts the rows for each of the 30 days before today:

SELECT 
  DATE_FORMAT(t1.`Date`, '%Y-%m-%d'),
  COUNT(t2.Id)
FROM 
(
  SELECT SUBDATE(CURDATE(), ID) `Date`
  FROM
  (
    SELECT  t2.digit * 10 + t1.digit + 1 AS id
    FROM         TEMP AS t1
    CROSS JOIN TEMP AS t2
  ) t 
  WHERE Id <= 30 
) t1
LEFT JOIN YOURTABLE t2 ON DATE(t1.`Date`) = DATE(t2.dateStampColumn)
GROUP BY t1.`Date`;

SQL Fiddle Demo

But you will need to create a helper table of digits, TEMP, like so:

CREATE TABLE TEMP 
(Digit int);
INSERT INTO TEMP VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

Mahmoud Gamal answered Oct 02 '22 06:10