Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL group by relative date range

Tags:

sql

mysql

I have a user table which has standard columns id and registered_date

For each week in the year (e.g. DATE_FORMAT '%x-%v' ), I want a count of users who signed up within the PAST 4 weeks of that week (including that week itself).

So for example, for week 2014-50, I want a count of users who registered in week 50, as well as week 49, 48 and 47.

Normally, to get a count of users registered in each week, I would use:

SELECT DATE_FORMAT(registered_date, '%x-%v'), count(*)
FROM user
GROUP BY DATE_FORMAT(registered_date, '%x-%v')

But of course, this doesn't include users who signed up in the previous 3 weeks as well.

Any idea how to modify the query accordingly?

like image 381
emkay Avatar asked May 03 '26 15:05

emkay


1 Answers

We're going to have to create a structured query to get this stuff.

First, we need a subquery that will yield a list of the starting days of each week in which a user registered. We need dates of the Mondays, because you're using %x-%v to get your week numbers.

To get the date of the Monday immediately preceding any DATETIME value, this expression does it.

DATE(registered_date) - INTERVAL WEEKDAY(registered_date) DAY

So this little subquery gets us the list of Mondays.

     SELECT DISTINCT DATE(registered_date) - 
                INTERVAL WEEKDAY(registered_date) DAY as monday
       FROM user

Next we need to nest this inside another query to get a row for each (overlapping) four week period for which we hope to summarize. Each row will have three columns in it: the first date of the period, the last+1 date of the period, and the period's identifier, for example '2013-52'.

    SELECT monday - INTERVAL 3 WEEK AS start,
           monday + INTERVAL 1 WEEK AS finish,
           DATE_FORMAT(monday, '%x-%v') AS week
      FROM (
            SELECT DISTINCT DATE(registered_date) - 
                       INTERVAL WEEKDAY(registered_date) DAY as monday
              FROM user
           ) AS wks

Cool. Now we have a table that we can join with the user table to extract which users signed up in which period. We can do that like so

 SELECT user.id, periods.week
   FROM user
   JOIN (  /* the subquery */
        ) AS periods ON user.registered_date >= periods.start
                    AND user.registered_date <  periods.finish

But we don't want that detail, instead we want the count, so we rewrite this as an aggregate query.

 SELECT periods.week, COUNT(*) 
   FROM user
   JOIN (  /* the subquery */
        ) AS periods ON user.registered_date >= periods.start
                    AND user.registered_date <  periods.finish
  GROUP BY periods.week
  ORDER BY periods.week

Putting it all together, here's the query.

 SELECT periods.week, COUNT(*) 
   FROM user
   JOIN ( 
          SELECT monday - INTERVAL 3 WEEK AS start,
                 monday + INTERVAL 1 WEEK AS finish,
                 DATE_FORMAT(monday, '%x-%v') AS week
            FROM (
                    SELECT DISTINCT DATE(registered_date) - 
                               INTERVAL WEEKDAY(registered_date) DAY as monday
                      FROM user
                 ) AS wks
        ) AS periods ON user.registered_date >= periods.start
                    AND user.registered_date <  periods.finish
  GROUP BY periods.week
  ORDER BY periods.week

This looks like a hairball, but notice that we've built it up like a sandwich, of fairly simple parts.

The trick to assigning users to the appropriate four-week periods is embedded in this ON condition for the join.

                     ON user.registered_date >= periods.start
                    AND user.registered_date <  periods.finish

Because the start and finish dates overlap, each user gets assigned to multiple four-week periods.

The other trick here is to use actual dates rather than week ids '2014-45' for the computations, because it's not possible, especially at year ends, to convert back from week id to date, and we want to use computations like date - INTERVAL 3 WEEK to compute start and end dates.

like image 120
O. Jones Avatar answered May 05 '26 05:05

O. Jones