
Calculating simple retention in R

Tags: r, dplyr, retention

For the dataset test, my objective is to find out how many unique users carried over from one period to the next on a period-by-period basis.

> test
   user_id period
1        1      1
2        5      1
3        1      1
4        3      1
5        4      1
6        2      2
7        3      2
8        2      2
9        3      2
10       1      2
11       5      3
12       5      3
13       2      3
14       1      3
15       4      3
16       5      4
17       5      4
18       5      4
19       4      4
20       3      4

For example, in the first period there were four unique users (1, 3, 4, and 5), two of whom were active in the second period, so the retention rate would be 0.5. In the second period there were three unique users, two of whom were active in the third period, so the retention rate would be 2/3, or about 0.667, and so on. How would one find the proportion of unique users who are active in the following period? Any suggestions would be appreciated.

The output would be the following:

> output
  period retention
1      1        NA
2      2     0.500
3      3     0.667
4      4     0.500

The test data:

> dput(test)
structure(list(user_id = c(1, 5, 1, 3, 4, 2, 3, 2, 3, 1, 5, 5, 
2, 1, 4, 5, 5, 5, 4, 3), period = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 
2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4)), .Names = c("user_id", "period"
), row.names = c(NA, -20L), class = "data.frame")
the_darkside asked May 19 '17 20:05


2 Answers

How about this? First split the user IDs by period, then write a function that calculates the proportion of users carried over between any two periods, then apply it to each pair of consecutive periods in the split list with mapply.

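# One vector of user IDs per period (duplicates are handled inside carryover)
splt <- split(test$user_id, test$period)

# Proportion of unique users in x who also appear in y
carryover <- function(x, y) {
    length(unique(intersect(x, y))) / length(unique(x))
}
# Pair each period with the one that follows it
mapply(carryover, splt[1:(length(splt) - 1)], splt[2:length(splt)])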

        1         2         3 
0.5000000 0.6666667 0.5000000 
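
The result above is named by the earlier period of each pair, while the output requested in the question reports each rate against the later period, with NA for period 1. A minimal sketch of that last step, assuming the periods are consecutive; `rates` and `retention_table` are names introduced here for illustration only:

```r
# Test data from the question
test <- data.frame(
  user_id = c(1, 5, 1, 3, 4, 2, 3, 2, 3, 1, 5, 5, 2, 1, 4, 5, 5, 5, 4, 3),
  period  = rep(c(1, 2, 3, 4), each = 5)
)

splt <- split(test$user_id, test$period)

# Proportion of unique users in x who also appear in y
carryover <- function(x, y) {
  length(intersect(x, y)) / length(unique(x))
}

# Compare each period with the one that follows it
rates <- mapply(carryover, head(splt, -1), tail(splt, -1))

# Report each rate against the later period; period 1 has no predecessor
retention_table <- data.frame(
  period    = as.numeric(names(splt)),
  retention = c(NA, unname(rates))
)
retention_table
```

Here `head(splt, -1)` and `tail(splt, -1)` are just a shorter way of writing the two index ranges used in the `mapply` call above.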
Daniel Anderson answered Oct 18 '22 00:10


Here is an attempt using dplyr, though it also uses some base R subsetting inside summarise:

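library(dplyr)

test %>% 
  group_by(period) %>% 
  # share of this period's unique users who also appear in the next period
  summarise(retention = length(intersect(user_id, test$user_id[test$period == first(period) + 1])) / n_distinct(user_id)) %>% 
  # shift down one row so each rate is reported against the later period
  mutate(retention = lag(retention))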

This returns:

period retention
   <dbl>     <dbl>
1      1        NA
2      2 0.5000000
3      3 0.6666667
4      4 0.5000000
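
If reaching back into `test` from inside `summarise` feels awkward, the same numbers can probably be obtained with a self-join instead. This is a different technique from the one above, offered only as a sketch: it assumes periods are consecutive numbers, and `active`, `carried`, `denom` and `retention_join` are names made up for this example.

```r
library(dplyr)

# Test data from the question
test <- data.frame(
  user_id = c(1, 5, 1, 3, 4, 2, 3, 2, 3, 1, 5, 5, 2, 1, 4, 5, 5, 5, 4, 3),
  period  = rep(c(1, 2, 3, 4), each = 5)
)

# One row per user per period
active <- distinct(test, user_id, period)

# Shift every row forward one period, then keep the users who are
# also active in that next period
carried <- active %>%
  mutate(period = period + 1) %>%
  inner_join(active, by = c("user_id", "period")) %>%
  count(period, name = "retained")

# Unique users per period, shifted so it lines up with the following period
denom <- active %>%
  count(period, name = "users") %>%
  mutate(period = period + 1)

retention_join <- active %>%
  distinct(period) %>%
  left_join(carried, by = "period") %>%
  left_join(denom, by = "period") %>%
  # a period with no returning users has no row in carried, so treat it as 0
  mutate(retention = coalesce(retained, 0L) / users) %>%
  select(period, retention)

retention_join
```

The first period has no earlier period to join against, so its `users` value is missing and its retention comes out as NA, matching the output asked for in the question.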
Lamia answered Oct 18 '22 00:10