 

How to fill a dataset with 0s and 1s for values that match in row-column, in R? [duplicate]

Tags:

r

I have a dataset in a CSV file that looks as follows:

 X               Colour Orange Red White Violet Black Yellow Blue
1 1          Orange, Red     NA  NA    NA     NA    NA     NA   NA
2 2                  Red     NA  NA    NA     NA    NA     NA   NA
3 3         White, Black     NA  NA    NA     NA    NA     NA   NA
4 4               Yellow     NA  NA    NA     NA    NA     NA   NA
5 5 Blue, Orange, Violet     NA  NA    NA     NA    NA     NA   NA

I'm trying to put a 1 wherever a column name appears in that row's Colour value, and a 0 otherwise. The expected output is:

                Colour Orange Red White Violet Black Yellow Blue
1          Orange, Red      1   1     0      0     0      0    0
2                  Red      0   1     0      0     0      0    0
3         White, Black      0   0     1      0     1      0    0
4               Yellow      0   0     0      0     0      1    0
5 Blue, Orange, Violet      1   0     0      1     0      0    1

How can I achieve this in R?
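To make the answers reproducible, here is a minimal sketch that rebuilds the sample data with read.csv(text = ...). The CSV text is an assumption reconstructed from the printout above (in particular, the header name X for the id column). stringsAsFactors = FALSE keeps Colour as character, which matters for strsplit(); before R 4.0.0, read.csv converted strings to factors by default.

```r
# Hypothetical CSV contents mirroring the printed data; X holds the row ids
csv_text <- 'X,Colour,Orange,Red,White,Violet,Black,Yellow,Blue
1,"Orange, Red",NA,NA,NA,NA,NA,NA,NA
2,"Red",NA,NA,NA,NA,NA,NA,NA
3,"White, Black",NA,NA,NA,NA,NA,NA,NA
4,"Yellow",NA,NA,NA,NA,NA,NA,NA
5,"Blue, Orange, Violet",NA,NA,NA,NA,NA,NA,NA'

# Keep Colour as character rather than factor (the default only since R 4.0.0)
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(dat)
```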

asked by Sarde, Jan 09 '23 at 08:01


2 Answers

Loop over the column names and use grepl to check whether each one appears in the Colour string:

# each colour name is used as the grepl pattern against dat$Colour; + 0 turns TRUE/FALSE into 1/0
dat[-(1:2)] <- sapply(colnames(dat[-(1:2)]), grepl, x = dat$Colour) + 0

#  X               Colour Orange Red White Violet Black Yellow Blue
#1 1          Orange, Red      1   1     0      0     0      0    0
#2 2                  Red      0   1     0      0     0      0    0
#3 3         White, Black      0   0     1      0     1      0    0
#4 4               Yellow      0   0     0      0     0      1    0
#5 5 Blue, Orange, Violet      1   0     0      1     0      0    1
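A self-contained sketch of the same grepl idea, written as a plain loop for clarity. One caveat: grepl does substring matching, so with a plain pattern a column named Red would also match a hypothetical value such as "Redwood". Wrapping each name in \b word boundaries guards against that. The data is rebuilt inline, and the colours vector is an assumption matching the question's columns.

```r
# Sample data rebuilt inline (values copied from the question)
dat <- data.frame(
  X = 1:5,
  Colour = c("Orange, Red", "Red", "White, Black", "Yellow", "Blue, Orange, Violet"),
  stringsAsFactors = FALSE
)
# Assumed indicator columns, in the order shown in the question
colours <- c("Orange", "Red", "White", "Violet", "Black", "Yellow", "Blue")

for (col in colours) {
  # \\b anchors the match at word boundaries, so "Red" only matches the whole word
  pattern <- paste0("\\b", col, "\\b")
  # grepl() returns TRUE/FALSE per row; as.integer() turns that into 1/0
  dat[[col]] <- as.integer(grepl(pattern, dat$Colour, perl = TRUE))
}
dat
```

With the boundary pattern, grepl("\\bRed\\b", "Redwood", perl = TRUE) is FALSE, whereas grepl("Red", "Redwood") is TRUE.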
answered by thelatemail, Feb 08 '23 at 05:02


I'm not sure whether you added the NA columns yourself. Even without those placeholder NA columns, we can use strsplit to split the "Colour" column, apply mtabulate to the resulting list, and, if needed, reorder the output to match the column names of 'dat':

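library(qdapTools)
# as.character() guards against Colour having been read in as a factor (strsplit() rejects factors)
cbind(dat[1:2], mtabulate(strsplit(as.character(dat$Colour), ', ')))[names(dat)]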
#   X               Colour Orange Red White Violet Black Yellow Blue
#1 1          Orange, Red      1   1     0      0     0      0    0
#2 2                  Red      0   1     0      0     0      0    0
#3 3         White, Black      0   0     1      0     1      0    0
#4 4               Yellow      0   0     0      0     0      1    0
#5 5 Blue, Orange, Violet      1   0     0      1     0      0    1

A similar approach is to use cSplit_e from splitstackshape:

library(splitstackshape)
cSplit_e(dat[1:2], 'Colour', type='character', fill=0)
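If neither qdapTools nor splitstackshape is available, the split-then-tabulate idea can be sketched in base R with table(). This is a swapped-in base-R technique, not what mtabulate does internally. The colours vector is an assumption that fixes the column order; a colour not listed there is silently dropped, and a colour repeated within one row would be counted as 2 rather than 1.

```r
# Sample data rebuilt inline (values copied from the question)
dat <- data.frame(
  X = 1:5,
  Colour = c("Orange, Red", "Red", "White, Black", "Yellow", "Blue, Orange, Violet"),
  stringsAsFactors = FALSE
)
# Assumed column order for the indicators, matching the question
colours <- c("Orange", "Red", "White", "Violet", "Black", "Yellow", "Blue")

# One character vector of colours per row
parts <- strsplit(dat$Colour, ", ", fixed = TRUE)

# Pair each colour with the row it came from, then cross-tabulate;
# the factor levels keep every row and fix the column order
counts <- table(
  factor(rep(seq_along(parts), lengths(parts)), levels = seq_along(parts)),
  factor(unlist(parts), levels = colours)
)

# Convert the 2-D table to a wide data frame and attach it to the original columns
res <- cbind(dat, as.data.frame.matrix(counts))
res
```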
answered by akrun, Feb 08 '23 at 05:02