Count the elements added and removed relative to the previous group in a data frame

Tags: dataframe, r, dplyr

I have a data frame:

df <- data.frame(
  "Quarter" = c("Q1 2019","Q1 2019","Q1 2019","Q2 2019","Q2 2019","Q2 2019","Q2 2019","Q3 2019","Q3 2019","Q3 2019","Q3 2019","Q4 2019","Q4 2019"),
  "Name" = c("Ram","John","Jack","Ram","Rach","Will","John","Ram","Rach","Will","John","Rach","John"),
  stringsAsFactors = FALSE
) 

I need to count, for each quarter, how many people were added and how many left compared with the previous quarter.

The expected output is

  quarterYear status Count
1     Q1 2019  Added     3
2     Q1 2019   Left     0
3     Q2 2019  Added     2
4     Q2 2019   Left     1
5     Q3 2019  Added     0
6     Q3 2019   Left     0
7     Q4 2019  Added     0
8     Q4 2019   Left     2

I am not sure how to compare two consecutive groups and get the counts.

How can I achieve the expected output in R?

Nevedha Ayyanar asked Jul 13 '20 06:07


2 Answers

Not sure about the speed implications, but a big part of this is essentially comparing consecutive counts, so diff came to mind.

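# Cross-tabulate quarters (rows) by names (columns): counts of each name per quarter
tab <- table(df$Quarter, df$Name)
# Keep the first quarter as-is, then take differences between consecutive quarters
tab <- rbind(tab[1,,drop=FALSE], diff(tab))
# +1 means the person appeared, -1 means the person dropped out
out <- rbind(added = rowSums(tab == 1), left = rowSums(tab == -1))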

#      Q1 2019 Q2 2019 Q3 2019 Q4 2019
#added       3       2       0       0
#left        0       1       0       2

If you need the long output specifically:

setNames(data.frame(as.table(out)), c("status","quarter","count"))
#  status quarter count
#1  added Q1 2019     3
#2   left Q1 2019     0
#3  added Q2 2019     2
#4   left Q2 2019     1
#5  added Q3 2019     0
#6   left Q3 2019     0
#7  added Q4 2019     0
#8   left Q4 2019     2
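
One caveat worth spelling out: the diff approach above relies on each name appearing at most once per quarter, because it tests the differences for exactly 1 and -1. A minimal sketch of a variant that first reduces the counts to presence/absence, assuming duplicates are possible in your data (the duplicated row and the names `df_dup`, `counts`, `present` and `change` are made up for illustration):

```r
# Data from the question
df <- data.frame(
  Quarter = c("Q1 2019","Q1 2019","Q1 2019","Q2 2019","Q2 2019","Q2 2019","Q2 2019",
              "Q3 2019","Q3 2019","Q3 2019","Q3 2019","Q4 2019","Q4 2019"),
  Name = c("Ram","John","Jack","Ram","Rach","Will","John",
           "Ram","Rach","Will","John","Rach","John"),
  stringsAsFactors = FALSE
)

# Hypothetical duplicate: Ram is listed twice in Q1 2019
df_dup <- rbind(df, data.frame(Quarter = "Q1 2019", Name = "Ram",
                               stringsAsFactors = FALSE))

# Plain integer matrix of counts, quarters in rows and names in columns
counts <- unclass(table(df_dup$Quarter, df_dup$Name))

# Reduce to 0/1 presence so repeated rows cannot produce a 2 or a spurious -1
present <- (counts > 0) * 1L

# First quarter kept as-is, then differences between consecutive quarters
change <- rbind(present[1, , drop = FALSE], diff(present))

out <- rbind(added = rowSums(change == 1), left = rowSums(change == -1))
out
```

With the duplicate row in place, the counts for each quarter should match the question's expected output, whereas comparing the raw counts would treat Ram's drop from 2 to 1 as someone leaving in Q2.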
thelatemail answered Sep 29 '22 10:09


Split the names by quarter to create a list, then map over each quarter and its predecessor to count the elements that differ, i.e.

l1 <- split(df$Name, df$Quarter)
# x is the current quarter, y is the previous one
do.call(rbind, Map(function(x, y) {
  data.frame(Added = length(setdiff(x, y)),  # in current, not in previous
             Left  = length(setdiff(y, x)))  # in previous, not in current
}, l1[-1], l1[-length(l1)]))

#        Added Left
#Q2 2019     2    1
#Q3 2019     0    0
#Q4 2019     0    2

This skips the first quarter, which has no predecessor; you can tidy the output into whatever shape you need.
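
As a rough sketch of one way to do that tidying, the first quarter can be compared against an empty set so that everyone in it counts as added, and the wide result can then be stacked into the long layout from the question. The names `prev`, `wide` and `long` are just illustrative, and the column names are taken from the question's expected output:

```r
# Data from the question
df <- data.frame(
  Quarter = c("Q1 2019","Q1 2019","Q1 2019","Q2 2019","Q2 2019","Q2 2019","Q2 2019",
              "Q3 2019","Q3 2019","Q3 2019","Q3 2019","Q4 2019","Q4 2019"),
  Name = c("Ram","John","Jack","Ram","Rach","Will","John",
           "Ram","Rach","Will","John","Rach","John"),
  stringsAsFactors = FALSE
)

l1 <- split(df$Name, df$Quarter)

# Previous quarter for each element of l1; the first quarter gets an empty set
prev <- c(list(character(0)), l1[-length(l1)])

# One row per quarter, named after the quarter
wide <- do.call(rbind, Map(function(cur, prv) {
  data.frame(Added = length(setdiff(cur, prv)),
             Left  = length(setdiff(prv, cur)))
}, l1, prev))

# Stack to long format: Added then Left for each quarter
long <- data.frame(
  quarterYear = rep(rownames(wide), each = 2),
  status      = rep(c("Added", "Left"), times = nrow(wide)),
  Count       = as.vector(t(as.matrix(wide))),
  stringsAsFactors = FALSE
)
long
```

Note that `split()` orders the groups by sorted quarter label, which works for the four quarters here but would interleave years if the data spanned more than one (for example "Q1 2020" sorts before "Q2 2019").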

Sotos answered Sep 29 '22 10:09
