i am having a hard time to solve this problem. Im going to explain it the best on base of the example, lets have a look at the data below:
order type_a type_b type_c type_d
1 1 0 50 10 0
2 2 10 0 0 80
3 3 15 0 0 35
4 4 0 0 30 0
5 5 0 20 40 0
and dput
:
data <- structure(list(order = c(1, 2, 3, 4, 5), type_a = c(0, 10, 15,
0, 0), type_b = c(50, 0, 0, 0, 20), type_c = c(10, 0, 0, 30,
40), type_d = c(0, 80, 35, 0, 0)), .Names = c("order", "type_a",
"type_b", "type_c", "type_d"), row.names = c(NA, -5L), class = "data.frame")
We can see that the first column tells us the order number, the rest of the columns tell us what belongs to this order. Additionally if the type_* column is filled with 0, then it is not relevant.
I would like to create a new column that defines groups based on "order description" --> type_* columns. For example for order 1 & order 5, we have same type_* columns filled with no 0 values, therefore they belong to the same group, same is for order 2 & 3, etc.
My final result should look like this:
order type_a type_b type_c type_d group
1 1 0 50 10 0 group_1
2 2 10 0 0 80 group_2
3 3 15 10 0 35 group_2
4 4 0 0 30 0 group_3
5 5 0 20 40 0 group_1
Just for your information, in my real data i do have more then 4 columns type_* --> there is about 10-15 of them!
Thanks for help!
Your groups are defined by the presence or absence of 0 in columns 2 to 5. Test for 0, collapse the TRUE/FALSE values as a string and you get a unique string for each group. Convert to a factor:
> data$group = factor(apply(data[,2:5] ==0,1,paste,collapse=""))
> data
order type_a type_b type_c type_d group
1 1 0 50 10 0 TRUEFALSEFALSETRUE
2 2 10 0 0 80 FALSETRUETRUEFALSE
3 3 15 0 0 35 FALSETRUETRUEFALSE
4 4 0 0 30 0 TRUETRUEFALSETRUE
5 5 0 20 40 0 TRUEFALSEFALSETRUE
Now the names are ugly, so change the levels:
> class(data$group)
[1] "factor"
> levels(data$group)=paste("group_",1:length(levels(data$group)))
> data
order type_a type_b type_c type_d group
1 1 0 50 10 0 group_ 2
2 2 10 0 0 80 group_ 1
3 3 15 0 0 35 group_ 1
4 4 0 0 30 0 group_ 3
5 5 0 20 40 0 group_ 2
If all those capital TRUEFALSE hurts your eyes, a simple fix will make a neat binary number:
> data$group = factor(apply(0+(data[,2:5] ==0),1,paste,collapse=""))
> data
order type_a type_b type_c type_d group
1 1 0 50 10 0 1001
2 2 10 0 0 80 0110
3 3 15 0 0 35 0110
4 4 0 0 30 0 1101
5 5 0 20 40 0 1001
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With