calculating simple retention in R

Tags: r, dplyr, retention

For the dataset test, my objective is to find out how many unique users carried over from one period to the next on a period-by-period basis.

> test
   user_id period
1        1      1
2        5      1
3        1      1
4        3      1
5        4      1
6        2      2
7        3      2
8        2      2
9        3      2
10       1      2
11       5      3
12       5      3
13       2      3
14       1      3
15       4      3
16       5      4
17       5      4
18       5      4
19       4      4
20       3      4

For example, in the first period there were four unique users (1, 3, 4, and 5), two of whom were active in the second period, so the retention rate is 2/4 = 0.5. In the second period there were three unique users, two of whom were active in the third period, so the retention rate is 2/3 ≈ 0.667, and so on. How would one find the proportion of unique users who are active in the following period? Any suggestions would be appreciated.

The output would be the following:

> output
  period retention
1      1        NA
2      2     0.500
3      3     0.667
4      4     0.500

The test data:

> dput(test)
structure(list(user_id = c(1, 5, 1, 3, 4, 2, 3, 2, 3, 1, 5, 5, 
2, 1, 4, 5, 5, 5, 4, 3), period = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 
2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4)), .Names = c("user_id", "period"
), row.names = c(NA, -20L), class = "data.frame")
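
The arithmetic in the example above can be checked by hand for the first transition. This is only a sketch of the definition, not a full solution: it rebuilds the same test data and counts the period 1 users who reappear in period 2. The names p1, p2, retained and rate_1_to_2 are illustrative and not part of the original question.

```r
# Same data as the dput() output above
test <- data.frame(
  user_id = c(1, 5, 1, 3, 4, 2, 3, 2, 3, 1, 5, 5, 2, 1, 4, 5, 5, 5, 4, 3),
  period  = rep(c(1, 2, 3, 4), each = 5)
)

# Unique users seen in each of the first two periods
p1 <- unique(test$user_id[test$period == 1])
p2 <- unique(test$user_id[test$period == 2])

# Users present in both periods, and their share of the period 1 users
retained <- intersect(p1, p2)
rate_1_to_2 <- length(retained) / length(p1)
print(rate_1_to_2)  # 0.5
```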

asked May 19 '17 20:05

the_darkside

2 Answers

One approach: split the user IDs by period, write a function that calculates the proportion of users carried over between any two periods, then apply it to each pair of consecutive periods with mapply.

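# One vector of user IDs per period
splt <- split(test$user_id, test$period)

# Share of the unique users in x who also appear in y
carryover <- function(x, y) {
    length(unique(intersect(x, y))) / length(unique(x))
}
# Pair each period with the one that follows it
mapply(carryover, splt[1:(length(splt) - 1)], splt[2:length(splt)])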

        1         2         3 
0.5000000 0.6666667 0.5000000
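
The vector above is named after the earlier period of each pair, while the output requested in the question reports each rate against the later period, with NA for the first. A minimal sketch of that reshaping follows, assuming the periods are consecutive (split() only pairs neighbouring periods that exist in the data, so a gap would silently compare non-adjacent periods). The use of head() and tail() in place of the index ranges, and the name rates, are choices made here for illustration.

```r
# Same data as the dput() output in the question
test <- data.frame(
  user_id = c(1, 5, 1, 3, 4, 2, 3, 2, 3, 1, 5, 5, 2, 1, 4, 5, 5, 5, 4, 3),
  period  = rep(c(1, 2, 3, 4), each = 5)
)

splt <- split(test$user_id, test$period)

carryover <- function(x, y) {
  length(intersect(x, y)) / length(unique(x))
}

# All periods but the last, paired with all periods but the first
rates <- mapply(carryover, head(splt, -1), tail(splt, -1))

# Shift the rates down one row so period 1 has no retention value
output <- data.frame(
  period    = as.numeric(names(splt)),
  retention = c(NA, unname(rates))
)
print(output)
```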

answered Oct 18 '22 00:10

Daniel Anderson

Here is an attempt using dplyr, though it also uses some base R subsetting inside summarise():

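library(dplyr)
test %>% 
  group_by(period) %>% 
  # share of this period's unique users who also appear in the next period;
  # period[1] is the group's single period value
  summarise(retention = length(intersect(user_id, test$user_id[test$period == period[1] + 1])) / n_distinct(user_id)) %>% 
  # shift down one row so each period reports retention from the previous one
  mutate(retention = lag(retention))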

This returns:

period retention
   <dbl>     <dbl>
1      1        NA
2      2 0.5000000
3      3 0.6666667
4      4 0.5000000
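
If referring back to test inside summarise() feels awkward, the same numbers can be obtained with a self-join. This is a hedged alternative sketch, not the answerer's method: it shifts each user's period forward by one and checks whether the user is present in that next period. The names users, prev, active and output are hypothetical. One difference in behaviour: a period whose predecessor is missing from the data gets NA here rather than being compared with an earlier, non-adjacent period.

```r
library(dplyr)

# Same data as the dput() output in the question
test <- data.frame(
  user_id = c(1, 5, 1, 3, 4, 2, 3, 2, 3, 1, 5, 5, 2, 1, 4, 5, 5, 5, 4, 3),
  period  = rep(c(1, 2, 3, 4), each = 5)
)

# One row per user per period
users <- distinct(test, user_id, period)

# Key each user by the period in which they would count as retained
prev <- mutate(users, period = period + 1)

retention <- prev %>%
  left_join(mutate(users, active = TRUE), by = c("user_id", "period")) %>%
  group_by(period) %>%
  # active is NA for users who did not return in the following period
  summarise(retention = mean(!is.na(active)))

# Keep only periods that exist in the data; the first one has no predecessor
output <- distinct(test, period) %>%
  left_join(retention, by = "period") %>%
  arrange(period)
print(output)
```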

answered Oct 18 '22 00:10

Lamia
