My data look like:
data <- matrix(c("1","install","2015-10-23 14:07:20.000000",
"2","install","2015-10-23 14:08:20.000000",
"3","install","2015-10-23 14:07:25.000000",
"3","sale","2015-10-23 14:08:20.000000",
"4","install","2015-10-23 14:07:20.000000",
"4","sale","2015-10-23 14:09:20.000000",
"4","sale","2015-10-23 14:11:20.000000"),
ncol=3, byrow=TRUE)
colnames(data) <- c("id","event","time")
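I would like to add a fourth column, called label, in which each row is labelled according to the events recorded for its id. In this case the label is the number of sales for that id: 0 for an id with only an install, 1 for an id with one sale, 2 for an id with two sales,
and so on up to n sales.
It should finally look like: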
data1 <- matrix(c("1","install","2015-10-23 14:07:20.000000","0",
"2","install","2015-10-23 14:08:20.000000","0",
"3","install","2015-10-23 14:07:25.000000","1",
"3","sale","2015-10-23 14:08:20.000000","1",
"4","install","2015-10-23 14:07:20.000000","2",
"4","sale","2015-10-23 14:09:20.000000","2",
"4","sale","2015-10-23 14:11:20.000000","2"),
ncol=4, byrow=TRUE)
It's not clear to me what the best approach in R is for creating "labels" based on conditions... maybe dplyr::mutate?
Updated to reflect the "and so on up to n sales" requirement.
A dplyr option could be:
library(dplyr)
data <- as.data.frame(data)
data %>%
group_by(id) %>%
mutate(label = if(n() == 1) 0 else as.numeric(sum(event == "sale")))
#Source: local data frame [7 x 4]
#Groups: id [4]
#
# id event time label
# (fctr) (fctr) (fctr) (dbl)
#1 1 install 2015-10-23 14:07:20.000000 0
#2 2 install 2015-10-23 14:08:20.000000 0
#3 3 install 2015-10-23 14:07:25.000000 1
#4 3 sale 2015-10-23 14:08:20.000000 1
#5 4 install 2015-10-23 14:07:20.000000 2
#6 4 sale 2015-10-23 14:09:20.000000 2
#7 4 sale 2015-10-23 14:11:20.000000 2
The data.table equivalent would be:
library(data.table)
data <- as.data.table(data) # or setDT(data) if it's already a data.frame
data[, label := if(.N == 1) 0 else as.numeric(sum(event == "sale")), by=id]
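If you would rather not load a package, the same per-id count can be sketched in base R with ave(). This is only a minimal sketch, not a replacement for the answers above: it rebuilds the example as a data.frame (the stringsAsFactors = FALSE setting is my assumption) and counts the "sale" rows directly, without the n() == 1 / .N == 1 special case. The two approaches agree on the example data and would differ only for an id whose single row is a sale.

```r
# Rebuild the question's example as a data.frame (base R only).
data <- data.frame(
  id    = c("1", "2", "3", "3", "4", "4", "4"),
  event = c("install", "install", "install", "sale",
            "install", "sale", "sale"),
  time  = c("2015-10-23 14:07:20.000000", "2015-10-23 14:08:20.000000",
            "2015-10-23 14:07:25.000000", "2015-10-23 14:08:20.000000",
            "2015-10-23 14:07:20.000000", "2015-10-23 14:09:20.000000",
            "2015-10-23 14:11:20.000000"),
  stringsAsFactors = FALSE  # assumption: keep the columns as character
)

# 1 for a "sale" row, 0 otherwise.
is_sale <- as.numeric(data$event == "sale")

# ave() applies FUN within each id group and writes the group total
# back to every row of that group.
data$label <- ave(is_sale, data$id, FUN = sum)

print(data[, c("id", "event", "label")])
```

ave() returns a vector of the same length as its input, so the result can be assigned straight to a new column without a separate merge step.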