Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting new values not occuring earlier and not occuring in last group

I am trying to count number of unique "new" users per month. New is a user that has not appeared before (since the beginning) I am also trying to count number of unique users not appearing last month.

The original data looks like

library(dplyr)
    date <- c("2010-01-10","2010-02-13","2010-03-22","2010-01-11","2010-02-14","2010-03-23","2010-01-12","2010-02-14","2010-03-24")
    mth <- rep(c("2010-01","2010-02","2010-03"),3)
    user <- c("123","129","145","123","129","180","180","184","145")

    dt <- data.frame(date,mth,user)

    dt <- dt %>% arrange(date)

    dt

       date     mth user
1 2010-01-10 2010-01  123
2 2010-01-11 2010-01  123
3 2010-01-12 2010-01  180
4 2010-02-13 2010-02  129
5 2010-02-14 2010-02  129
6 2010-02-14 2010-02  184
7 2010-03-22 2010-03  145
8 2010-03-23 2010-03  180
9 2010-03-24 2010-03  145

The answer should look like

    new <- c(2,2,2,2,2,2,1,1,1)
    totNew <- c(2,2,2,4,4,4,5,5,5)
    notLastMonth <- c(2,2,2,2,2,2,2,2,2)

    tmp <- cbind(dt,new,totNew,notLastMonth)
    tmp

        date     mth user new totNew notLastMonth
1 2010-01-10 2010-01  123   2      2            2
2 2010-01-11 2010-01  123   2      2            2
3 2010-01-12 2010-01  180   2      2            2
4 2010-02-13 2010-02  129   2      4            2
5 2010-02-14 2010-02  129   2      4            2
6 2010-02-14 2010-02  184   2      4            2
7 2010-03-22 2010-03  145   1      5            2
8 2010-03-23 2010-03  180   1      5            2
9 2010-03-24 2010-03  145   1      5            2
like image 361
user3482393 Avatar asked Jan 09 '17 20:01

user3482393


1 Answers

Here's an attempt (explanations within the body of the code)

dt %>%
  group_by(user) %>%
  mutate(Count = row_number()) %>% # Count appearances per user
  group_by(mth) %>%
  mutate(new = sum(Count == 1)) %>% # Count first appearances per months
  summarise(new = first(new), # Summarise new users per month (for cumsum)
            users = list(unique(user))) %>% # Create a list of unique users per month (for notLastMonth)
  mutate(totNew = cumsum(new), # Calculate overall cummulative sum of unique users
         notLastMonth = lengths(Map(setdiff, users, lag(users)))) %>% # Compare new users to previous month
  select(-users) %>%
  right_join(dt) # Join back to the real data

# A tibble: 9 × 6
#       mth   new totNew notLastMonth       date   user
#    <fctr> <int>  <int>        <int>     <fctr> <fctr>
# 1 2010-01     2      2            2 2010-01-10    123
# 2 2010-01     2      2            2 2010-01-11    123
# 3 2010-01     2      2            2 2010-01-12    180
# 4 2010-02     2      4            2 2010-02-13    129
# 5 2010-02     2      4            2 2010-02-14    129
# 6 2010-02     2      4            2 2010-02-14    184
# 7 2010-03     1      5            2 2010-03-22    145
# 8 2010-03     1      5            2 2010-03-23    180
# 9 2010-03     1      5            2 2010-03-24    145
like image 185
David Arenburg Avatar answered Nov 15 '22 03:11

David Arenburg