I have the following dataset, which is already sorted by transaction:
dataset <- data.frame(id = c(1,2,3,4,2,4,6,7,3,2),
                      transaction = c(1,2,3,4,5,6,7,8,9,10),
                      amount = c(200,100,50,100,50,300,100,50,100,50))
As you can see, each customer has an Id and the amount spent in the transaction.
My question is how to identify whether the customer in a given transaction is new or recurrent. A customer is new on their first transaction and recurrent on every later one. The expected result is:
recurrence_status <- c("new","new","new","new","recurrent",
"recurrent","new","new","recurrent","recurrent")
So far I have tried the following:
for (i in 1:(length(dataset$transaction)-1)){
  for(j in 2:length(dataset$transaction)){
    j <- j + 1
    comp <- dataset[j:length(dataset$id)]
    ifelse((is.element(dataset[i,1]),comp),"recurrent","new")
  }
}
But it gives me an error because of the brackets. I know that loops in R should be avoided when possible. Any help would be welcome.
Regards,
In base R, this can be done with duplicated():
dataset$recurrence_status <- c("new", "recurrent")[duplicated(dataset$id) + 1]
dataset$recurrence_status
#[1] "new" "new" "new" "new" "recurrent" "recurrent" "new" "new" "recurrent"
#[10] "recurrent"
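To make the indexing trick above easier to follow, here is a small sketch that splits the one-liner into steps. The intermediate names (is_repeat, idx, status, status_ifelse) are made up for illustration; the ifelse() variant is just an alternative spelling of the same idea, not part of the original answer.

```r
dataset <- data.frame(id = c(1,2,3,4,2,4,6,7,3,2),
                      transaction = c(1,2,3,4,5,6,7,8,9,10),
                      amount = c(200,100,50,100,50,300,100,50,100,50))

# duplicated() is FALSE for the first occurrence of each id, TRUE afterwards
is_repeat <- duplicated(dataset$id)

# Adding 1 coerces FALSE/TRUE to 0/1, so idx is 1 (first visit) or 2 (repeat)
idx <- is_repeat + 1

# Use idx to pick the matching label from a two-element vector
status <- c("new", "recurrent")[idx]

# More explicit alternative that gives the same labels
status_ifelse <- ifelse(is_repeat, "recurrent", "new")

identical(status, status_ifelse)
```

Both forms depend on the rows being in transaction order, since duplicated() treats whichever row comes first as the original.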
Using dplyr:
library(dplyr)

# Within each id, the first row is "new" and any later row is "recurrent"
dataset %>%
  group_by(id) %>%
  mutate(recurrence_status = factor(+(row_number() > 1),
                                    levels = c(0, 1),
                                    labels = c("new", "recurrent")))
      id transaction amount recurrence_status
   <dbl>       <dbl>  <dbl> <fct>
 1     1           1    200 new
 2     2           2    100 new
 3     3           3     50 new
 4     4           4    100 new
 5     2           5     50 recurrent
 6     4           6    300 recurrent
 7     6           7    100 new
 8     7           8     50 new
 9     3           9    100 recurrent
10     2          10     50 recurrent
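Both approaches rely on the rows already being sorted by transaction, as the question states. If that were not the case, one option would be to sort first and then flag repeats. The sketch below uses base R to avoid extra dependencies; the shuffled and ordered data frames are hypothetical, created only to simulate out-of-order input.

```r
dataset <- data.frame(id = c(1,2,3,4,2,4,6,7,3,2),
                      transaction = c(1,2,3,4,5,6,7,8,9,10),
                      amount = c(200,100,50,100,50,300,100,50,100,50))

# Hypothetical shuffled copy, standing in for data that arrives out of order
shuffled <- dataset[c(10, 3, 1, 7, 5, 2, 9, 4, 8, 6), ]

# Restore transaction order, then label first visits and repeats
ordered <- shuffled[order(shuffled$transaction), ]
ordered$recurrence_status <- ifelse(duplicated(ordered$id), "recurrent", "new")

ordered
```

With dplyr, the same effect should be achievable by adding arrange(transaction) before group_by(id).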