I have a data frame:
df <- data.frame(
"Quarter" = c("Q1 2019","Q1 2019","Q1 2019","Q2 2019","Q2 2019","Q2 2019","Q2 2019","Q3 2019","Q3 2019","Q3 2019","Q3 2019","Q4 2019","Q4 2019"),
"Name" = c("Ram","John","Jack","Ram","Rach","Will","John","Ram","Rach","Will","John","Rach","John"),
stringsAsFactors = FALSE
)
I need to calculate the number of persons who were added and left in each quarter by comparing it with the previous quarter.
The expected output is
quarterYear status Count
1 Q1 2019 Added 3
2 Q1 2019 Left 0
3 Q2 2019 Added 2
4 Q2 2019 Left 1
5 Q3 2019 Added 0
6 Q3 2019 Left 0
7 Q4 2019 Added 0
8 Q4 2019 Left 2
I am not sure how to compare two consecutive groups and get the counts.
How can I achieve the expected output in R?
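I'm not sure about the speed implications, but a big part of this is essentially comparing consecutive counts, so diff came to mind. In the quarter-by-name table each cell is 0 or 1, so a row difference of 1 means the person was added in that quarter and -1 means the person left.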
tab <- table(df$Quarter, df$Name)
tab <- rbind(tab[1,,drop=FALSE], diff(tab))
out <- rbind(added = rowSums(tab == 1), left = rowSums(tab == -1))
# Q1 2019 Q2 2019 Q3 2019 Q4 2019
#added 3 2 0 0
#left 0 1 0 2
If you need the long output specifically:
setNames(data.frame(as.table(out)), c("status","quarter","count"))
# status quarter count
#1 added Q1 2019 3
#2 left Q1 2019 0
#3 added Q2 2019 2
#4 left Q2 2019 1
#5 added Q3 2019 0
#6 left Q3 2019 0
#7 added Q4 2019 0
#8 left Q4 2019 2
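One caveat with the diff approach above: it relies on every name appearing at most once per quarter, so that the table cells are only 0 or 1. A minimal sketch of a guard, assuming duplicates are possible in real data (the extra row in df2 is made up purely for illustration), is to deduplicate the Quarter/Name pairs before tabulating:

```r
df <- data.frame(
  Quarter = c("Q1 2019","Q1 2019","Q1 2019","Q2 2019","Q2 2019","Q2 2019","Q2 2019",
              "Q3 2019","Q3 2019","Q3 2019","Q3 2019","Q4 2019","Q4 2019"),
  Name = c("Ram","John","Jack","Ram","Rach","Will","John",
           "Ram","Rach","Will","John","Rach","John"),
  stringsAsFactors = FALSE
)

# Hypothetical duplicate: Ram listed twice in Q2 2019
df2 <- rbind(df, data.frame(Quarter = "Q2 2019", Name = "Ram", stringsAsFactors = FALSE))

# Drop repeated Quarter/Name pairs so every cell of the table is 0 or 1
tab <- table(unique(df2[c("Quarter", "Name")]))

# First row is kept as-is (everyone present counts as added), the rest are row differences
tab <- rbind(tab[1, , drop = FALSE], diff(tab))
out <- rbind(added = rowSums(tab == 1), left = rowSums(tab == -1))
out
```

Without the unique() step, the duplicated row would give a cell value of 2, and the resulting differences would no longer map cleanly onto "added" and "left".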
Split the names by quarter to create a list, then Map over each quarter and the one before it to get the number of 'unequal' elements with setdiff, i.e.
l1 <- split(df$Name, df$Quarter)
do.call(rbind, Map(function(x, y) {
    i1 <- length(setdiff(x, y))  # in the current quarter but not the previous one
    i2 <- length(setdiff(y, x))  # in the previous quarter but not the current one
    data.frame(Added = i1, Left = i2)
  }, l1[-1], l1[-length(l1)]))
# Added Left
#Q2 2019 2 1
#Q3 2019 0 0
#Q4 2019 0 2
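You can then tidy the output into whatever shape you need. Note that the first quarter is not in the result, since it has no previous quarter to compare against.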
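As one possible way to get from the split/Map idea to the long layout asked for in the question, here is a rough base R sketch. It assumes the first quarter should be compared against an empty group (so everyone counts as added), and that the quarter labels sort correctly as plain strings, which holds for this single-year data but would not across years. The names prev and res are only illustrative.

```r
df <- data.frame(
  Quarter = c("Q1 2019","Q1 2019","Q1 2019","Q2 2019","Q2 2019","Q2 2019","Q2 2019",
              "Q3 2019","Q3 2019","Q3 2019","Q3 2019","Q4 2019","Q4 2019"),
  Name = c("Ram","John","Jack","Ram","Rach","Will","John",
           "Ram","Rach","Will","John","Rach","John"),
  stringsAsFactors = FALSE
)

l1 <- split(df$Name, df$Quarter)

# "Previous quarter" for each group; the first quarter gets an empty group
prev <- c(list(character(0)), l1[-length(l1)])

res <- do.call(rbind, Map(function(q, cur, prv) {
  data.frame(
    quarterYear = q,
    status = c("Added", "Left"),
    Count = c(length(setdiff(cur, prv)),   # new in the current quarter
              length(setdiff(prv, cur))),  # gone since the previous quarter
    stringsAsFactors = FALSE
  )
}, names(l1), l1, prev))

rownames(res) <- NULL
res
```

This gives one Added row and one Left row per quarter, in the same column order as the expected output.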