I have the following dataset
#df
Factors Transactions
a,c 1
b 0
c 0
d,a 0
a 1
a 0
b 1
I'd like to know how many times we did not have a factor and we had a transaction. So, my desirable output is as follows:
#desired output
Factors count
a 1
b 2
c 2
d 3
For instance, only one time we didn't have a and we had a transaction (i.e. only in the last row).
There are many ways to know how many times we had each factor and we had transactions. For instance I tried this one:
library(data.table)
setDT(df)[, .(Factors = unlist(strsplit(as.character(Factors), ","))),
by = Transactions][,.(Transactions = sum(Transactions > 0)), by = Factors]
But I wish to count how many times we didn't have a factor and we had transaction.
Thanks in advance.
You can calculate the opposite, i.e, how many times the factor has a transaction and then the difference between the total transactions and transactions for each individual factor would be what you are looking for:
library(data.table)
total <- sum(df$Transactions > 0)
(setDT(df)[, .(Factors = unlist(strsplit(as.character(Factors), ","))), Transactions]
[, total - sum(Transactions > 0), Factors])
# Factors V1
#1: a 1
#2: c 2
#3: b 2
#4: d 3
We can also do this with cSplit
library(splitstackshape)
cSplit(df, "Factors", ',', 'long')[, sum(df$Transactions) - sum(Transactions>0), Factors]
# Factors V1
#1: a 1
#2: c 2
#3: b 2
#4: d 3
Or with dplyr/tidyr
library(dplyr)
library(tidyr)
separate_rows(df, Factors) %>%
group_by(Factors) %>%
summarise(count = sum(df$Transactions) - sum(Transactions>0))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With