Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count active users using login timestamp in MySQL

Tags:

sql

mysql

While preparing for an interview, I have come across an SQL question and I hope to get some insight as to how to better answer it.

Given timestamps, userid, how to determine the number of users who are active everyday in a week?

There's very little to it, but that's the question in front of me.

like image 963
user3362840 Avatar asked Sep 29 '15 09:09

user3362840


2 Answers

I'm going to demonstrate such an idea based on what makes most sense to me and the way I would reply if the question was presented same as here:

First, let's assume a data set as such, we will name the table logins:

+---------+---------------------+
| user_id |   login_timestamp   |
+---------+---------------------+
|       1 | 2015-09-29 14:05:05 |
|       2 | 2015-09-29 14:05:08 |
|       1 | 2015-09-29 14:05:12 |
|       4 | 2015-09-22 14:05:18 |
|   ...   |          ...        |
+---------+---------------------+

There may be other columns, but we don't mind those.

First of all we should determine the borders of that week, for that we can use ADDDATE(). Combined with the idea that today's date-today's week-day (MySQL's DAYOFWEEK()), is sunday's date.

For instance: If today is Wednesday the 10th, Wed - 3 = Sun, thus 10 - 3 = 7, and we can expect Sunday to be the 7th.

We can get WeekStart and WeekEnd timestamps this way:

SELECT
DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 1-DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 00:00:00") WeekStart, 
DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 7-DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 23:59:59") WeekEnd;

Note: in PostgreSQL there's a DATE_TRUNC() function which returns the beginning of a specified time unit, given a date, such as week start, month, hour, and so on. But that's not available in MySQL.

Next, let's utilize WeekStart and weekEnd in order to clice our data set, in this example I'll just show how to filter, using hard coded dates:

SELECT *
FROM `logins`
WHERE login_timestamp BETWEEN '2015-09-29 14:05:07' AND '2015-09-29 14:05:13'

This should return our data set sliced, with only relevant results:

+---------+---------------------+
| user_id |   login_timestamp   |
+---------+---------------------+
|       2 | 2015-09-29 14:05:08 |
|       1 | 2015-09-29 14:05:12 |
+---------+---------------------+

We can then reduce our result set to only the user_ids, and filter out duplicates. then count, this way:

SELECT COUNT(DISTINCT user_id)
FROM `logins`
WHERE login_timestamp BETWEEN '2015-09-29 14:05:07' AND '2015-09-29 14:05:13'

DISTINCT will filter out duplicates, and count will return just the amount.

Combined, this becomes:

SELECT COUNT(DISTINCT user_id)
FROM `logins`
WHERE login_timestamp 
    BETWEEN DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 1- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 00:00:00") 
        AND DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 7- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 23:59:59")

Replace CURDATE() with any timestamp in order to get that week's user login count.


But I need to break this down to days, I hear you cry. Of course! and this is how:

First, let's translate our over-informative timestamps to just the date data. We add DISTINCT because we don't mind the same user logging in twice the same day. we count users, not logins, right? (note we step back here):

SELECT DISTINCT user_id, DATE_FORMAT(login_timestamp, "%Y-%m-%d")
FROM `logins`

This yields:

+---------+-----------------+
| user_id | login_timestamp |
+---------+-----------------+
|       1 | 2015-09-29      |
|       2 | 2015-09-29      |
|       4 | 2015-09-22      |
|   ...   |        ...      |
+---------+-----------------+

This query, we will wrap with a second, in order to count appearances of every date:

SELECT `login_timestamp`, count(*) AS 'count'
FROM (SELECT DISTINCT user_id, DATE_FORMAT(login_timestamp, "%Y-%m-%d") AS `login_timestamp` FROM `logins`) `loginsMod`
GROUP BY `login_timestamp`

We use count and a grouping in order to get the list by date, which returns:

+-----------------+-------+
| login_timestamp | count |
+-----------------+-------+
| 2015-09-29      | 1     +
| 2015-09-22      | 2     +
+-----------------+-------+

And after all the hard work, both combined:

SELECT `login_timestamp`, COUNT(*)
FROM (
SELECT DISTINCT user_id, DATE_FORMAT(login_timestamp, "%Y-%m-%d") AS `login_timestamp`
FROM `logins`
WHERE login_timestamp BETWEEN DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 1- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 00:00:00") AND DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 7- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 23:59:59")) `loginsMod`
GROUP BY `login_timestamp`;

Will give you a daily breakdown of logins per-day in this week. Again, replace CURDATE() to get a different week.


As for the users themselves who logged in, let's combine the same stuff in a different order:

SELECT `user_id`
FROM (
    SELECT `user_id`, COUNT(*) AS `login_count`
    FROM (
        SELECT DISTINCT `user_id`, DATE_FORMAT(`login_timestamp`, "%Y-%m-%d")
        FROM `logins`) `logins`
    GROUP BY `user_id`) `logincounts`
WHERE `login_count` > 6

I have two inner queries, the first is logins:

SELECT DISTINCT `user_id`, DATE_FORMAT(`login_timestamp`, "%Y-%m-%d")
FROM `logins`

Will provide the list of users, and the days when they logged in on, without duplicates.

Then we have logincounts:

SELECT `user_id`, COUNT(*) AS `login_count`
FROM `logins` -- See previous subquery.
GROUP BY `user_id`) `logincounts`

Will return the same list, with a count of how many logins each user had.

And lastly: SELECT user_id FROM logincounts -- See previous subquery. WHERE login_count > 6

Filtering our those who didn't login 7 times, and dropping the date column.


This kinda got long, but I think it's rife with ideas and I think it may definitely help answering in an interesting way in a work interview. :)

like image 192
Selfish Avatar answered Nov 15 '22 02:11

Selfish


create table fbuser(id integer, date date); 
insert into fbuser(id,date)values(1,'2012-01-01'); 
insert into fbuser(id,date)values(1,'2012-01-02'); 
insert into fbuser(id,date)values(1,'2012-01-01'); 
insert into fbuser(id,date)values(1,'2012-01-01'); 
insert into fbuser(id,date)values(1,'2012-01-01'); 
insert into fbuser(id,date)values(1,'2012-01-01'); 
insert into fbuser(id,date)values(1,'2012-01-02'); 
insert into fbuser(id,date)values(1,'2012-01-03'); 
insert into fbuser(id,date)values(1,'2012-01-04'); 
insert into fbuser(id,date)values(1,'2012-01-05'); 
insert into fbuser(id,date)values(1,'2012-01-06'); 
insert into fbuser(id,date)values(1,'2012-01-07');
insert into fbuser(id,date)values(4,'2012-01-08');
insert into fbuser(id,date)values(4,'2012-01-08');
insert into fbuser(id,date)values(1,'2012-01-08');
insert into fbuser(id,date)values(1,'2012-01-09');
select * from fbuser;
id |    date    
----+------------
 1 | 2012-01-01
 1 | 2012-01-02
 1 | 2012-01-01
 1 | 2012-01-01
 1 | 2012-01-01
 1 | 2012-01-01
 1 | 2012-01-02
 1 | 2012-01-03
 1 | 2012-01-04
 1 | 2012-01-05
 1 | 2012-01-06
 1 | 2012-01-07
 2 | 2012-01-07
 3 | 2012-01-07
 4 | 2012-01-07
 4 | 2012-01-08
 4 | 2012-01-08
 1 | 2012-01-08
 1 | 2012-01-09

select id,count(DISTINCT date) from fbuser 
where date BETWEEN '2012-01-01' and '2012-01-07' 
group by id having count(DISTINCT date)=7

 id | count 
----+-------
  1 |     7
(1 row)

Query counts unique dates logged in by user for the given period and returns id with 7 occurrences. If you have time also in your date you can use date_format.

like image 40
Dwipam Katariya Avatar answered Nov 15 '22 04:11

Dwipam Katariya