I have a user table which has standard columns id and registered_date
For each week in the year (e.g. DATE_FORMAT '%x-%v' ), I want a count of users who signed up within the PAST 4 weeks of that week (including that week itself).
So for example, for week 2014-50, I want a count of users who registered in week 50, as well as week 49, 48 and 47.
Normally, to get a count of users registered in each week, I would use:
SELECT DATE_FORMAT(registered_date, '%x-%v'), count(*)
FROM user
GROUP BY DATE_FORMAT(registered_date, '%x-%v')
But of course, this doesn't include users who signed up in the previous 3 weeks as well.
Any idea how to modify the query accordingly?
We're going to have to create a structured query to get this stuff.
First, we need a subquery that will yield a list of the starting days of each week in which a user registered. We need dates of the Mondays, because you're using %x-%v to get your week numbers.
To get the date of the Monday immediately preceding any DATETIME value, this expression does it.
DATE(registered_date) - INTERVAL WEEKDAY(registered_date) DAY
So this little subquery gets us the list of Mondays.
SELECT DISTINCT DATE(registered_date) -
INTERVAL WEEKDAY(registered_date) DAY as monday
FROM user
Next we need to nest this inside another query to get a row for each (overlapping) four week period for which we hope to summarize. Each row will have three columns in it: the first date of the period, the last+1 date of the period, and the period's identifier, for example '2013-52'.
SELECT monday - INTERVAL 3 WEEK AS start,
monday + INTERVAL 1 WEEK AS finish,
DATE_FORMAT(monday, '%x-%v') AS week
FROM (
SELECT DISTINCT DATE(registered_date) -
INTERVAL WEEKDAY(registered_date) DAY as monday
FROM user
) AS wks
Cool. Now we have a table that we can join with the user table to extract which users signed up in which period. We can do that like so
SELECT user.id, periods.week
FROM user
JOIN ( /* the subquery */
) AS periods ON user.registered_date >= periods.start
AND user.registered_date < periods.finish
But we don't want that detail, instead we want the count, so we rewrite this as an aggregate query.
SELECT periods.week, COUNT(*)
FROM user
JOIN ( /* the subquery */
) AS periods ON user.registered_date >= periods.start
AND user.registered_date < periods.finish
GROUP BY periods.week
ORDER BY periods.week
Putting it all together, here's the query.
SELECT periods.week, COUNT(*)
FROM user
JOIN (
SELECT monday - INTERVAL 3 WEEK AS start,
monday + INTERVAL 1 WEEK AS finish,
DATE_FORMAT(monday, '%x-%v') AS week
FROM (
SELECT DISTINCT DATE(registered_date) -
INTERVAL WEEKDAY(registered_date) DAY as monday
FROM user
) AS wks
) AS periods ON user.registered_date >= periods.start
AND user.registered_date < periods.finish
GROUP BY periods.week
ORDER BY periods.week
This looks like a hairball, but notice that we've built it up like a sandwich, of fairly simple parts.
The trick to assigning users to the appropriate four-week periods is embedded in this ON condition for the join.
ON user.registered_date >= periods.start
AND user.registered_date < periods.finish
Because the start and finish dates overlap, each user gets assigned to multiple four-week periods.
The other trick here is to use actual dates rather than week ids '2014-45' for the computations, because it's not possible, especially at year ends, to convert back from week id to date, and we want to use computations like date - INTERVAL 3 WEEK to compute start and end dates.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With