Background
Here's a dataset, d
:
d <- data.frame(ID = c("a","a","b","b"),
product_code = c("B78","X31","C12","C12"),
stringsAsFactors=FALSE)
It looks like this:
The Problem and Desired Output
I'm trying to make an indicator column multiple_products
that's marked 1
for ID
s which have more than one unique product_code
and 0
for those that don't. Here's what I'm looking for:
My attempts haven't worked yet, though.
What I've Tried
Here's my current code:
d <- d %>%
group_by(ID) %>%
mutate(multiple_products = if_else(length(unique(d$product_code)) > 1, 1, 0)) %>%
ungroup()
And this is the result:
Any thoughts?
The d$
should be taken out as this will extract the whole column by removing the group attributes. Also, there is n_distinct
. In addition, there is no need for ifelse
or if_else
as logical values (TRUE/FALSE
) can be directly coerced to 1/0
as these are storage values by either using as.integer
or +
library(dplyr)
d %>%
group_by(ID) %>%
mutate(multiple_products = +(n_distinct(product_code) > 1)) %>%
ungroup()
-output
# A tibble: 4 x 3
ID product_code multiple_products
<chr> <chr> <int>
1 a B78 1
2 a X31 1
3 b C12 0
4 b C12 0
solution with data.table
;
library(data.table)
setDT(d)
d[,multiple_products:=rleid(product_code),by=ID][
,multiple_products:=ifelse(max(multiple_products)>1,1,0),by=ID]
d
output;
ID product_code multiple_products
<chr> <chr> <int>
1 a B78 1
2 a X31 1
3 b C12 0
4 b C12 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With