
R: Calculating cumulative number of unique entries

Tags: r, dplyr

I have a data frame from several experiments. I want to calculate the cumulative number of unique values obtained after each successive experiment.

For example, consider:

test <- data.frame(exp = c( rep("exp1" , 4) , rep("exp2" , 4), rep("exp3" , 4) , rep("exp4" , 5) ) , 
                   entries = c("abcd","efgh","ijkl","mnop", "qrst" , "uvwx" , "abcd","efgh","ijkl" , "qrst" , "uvwx", 
                               "yzab" , "yzab" , "cdef" , "mnop" , "uvwx" , "ghij"))

> test
    exp entries
1  exp1    abcd
2  exp1    efgh
3  exp1    ijkl
4  exp1    mnop
5  exp2    qrst
6  exp2    uvwx
7  exp2    abcd
8  exp2    efgh
9  exp3    ijkl
10 exp3    qrst
11 exp3    uvwx
12 exp3    yzab
13 exp4    yzab
14 exp4    cdef
15 exp4    mnop
16 exp4    uvwx
17 exp4    ghij

The total number of unique entries is nine. I want the result to look like:

   exp cum_unique_entries
1 exp1                  4
2 exp2                  6
3 exp3                  7
4 exp4                  9

Finally, I want to plot this as a bar plot. I can do this with for loops, but I feel there has to be a more elegant way.

ktyagi asked Dec 10 '22 08:12


1 Answer

Here's a solution with dplyr:

library(dplyr)

test %>%
  # running count of first-time entries across all rows
  mutate(cum_unique_entries = cumsum(!duplicated(entries))) %>%
  group_by(exp) %>%
  # keep only the last row of each experiment
  slice(n()) %>%
  select(-entries)

or

test %>%
  mutate(cum_unique_entries = cumsum(!duplicated(entries))) %>%
  group_by(exp) %>%
  # last running count within each experiment
  summarise(cum_unique_entries = last(cum_unique_entries))

Result:

# A tibble: 4 x 2
     exp cum_unique_entries
  <fctr>              <int>
1   exp1                  4
2   exp2                  6
3   exp3                  7
4   exp4                  9
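The question also asks for a bar plot. A minimal sketch with base graphics follows; to keep it self-contained, the four counts are typed in by hand from the result above, and the name res and the axis labels are choices made here for illustration, not part of the original code.

```r
# Counts copied from the result table above (res is a hypothetical name).
res <- data.frame(
  exp = c("exp1", "exp2", "exp3", "exp4"),
  cum_unique_entries = c(4L, 6L, 7L, 9L)
)

# One bar per experiment; barplot() returns the x positions of the bars.
mids <- barplot(
  height = res$cum_unique_entries,
  names.arg = res$exp,
  xlab = "Experiment",
  ylab = "Cumulative unique entries"
)
```

In practice the output of either pipeline can be assigned to res instead of the hand-typed data frame.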

Note:

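First compute the cumulative sum of all non-duplicates with cumsum(!duplicated(entries)), then group_by exp and take the last cumulative sum in each group. That number is the count of unique entries seen up to and including that experiment. This relies on the rows being ordered by experiment, as they are in test.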
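To see the intermediate values, the same steps can be sketched in base R. This is only an illustration of the idea, not the answer's code: tapply() stands in for group_by() and summarise(), and the names is_new, running and cum_unique are made up here.

```r
# Rebuild the example data from the question.
test <- data.frame(
  exp = c(rep("exp1", 4), rep("exp2", 4), rep("exp3", 4), rep("exp4", 5)),
  entries = c("abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "abcd", "efgh",
              "ijkl", "qrst", "uvwx", "yzab", "yzab", "cdef", "mnop", "uvwx",
              "ghij")
)

# TRUE the first time an entry appears, FALSE for every repeat.
is_new <- !duplicated(test$entries)

# Running count of distinct entries seen so far, one value per row.
running <- cumsum(is_new)

# Keep the last running count within each experiment.
cum_unique <- tapply(running, test$exp, function(x) x[length(x)])
cum_unique
```

For the example data, cum_unique holds 4, 6, 7 and 9 for exp1 to exp4, matching the result above.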

acylam answered Dec 29 '22 00:12