 

Splitting one column into multiple columns

Tags: r

I have a large dataset in which one column holds several values for each subject (row). Here is a simplified sample data frame:

data <- data.frame(subject = c(1:8), sex = c(1, 2, 2, 1, 2, 1, 1, 2), 
              age = c(35, 29, 31, 46, 64, 57, 49, 58), 
              v1 = c("2", "0", "3,5", "2 1", "A,4", "B,1,C", "A and B,3", "5, 6 A or C"))

> data
  subject sex age          v1
1       1   1  35           2
2       2   2  29           0
3       3   2  31         3,5  # separated by a comma
4       4   1  46         2 1  # separated by a blank space
5       5   2  64         A,4
6       6   1  57       B,1,C
7       7   1  49   A and B,3
8       8   2  58 5, 6 A or C

I first want to remove the letters (A, B, A and B, …) in the fourth column (v1), and then split the fourth column into multiple columns just like this:

  subject sex age x1 x2 x3 x4 x5 x6
1       1   1  35  0  1  0  0  0  0        
2       2   2  29  0  0  0  0  0  0
3       3   2  31  0  0  1  0  1  0  
4       4   1  46  1  1  0  0  0  0
5       5   2  64  0  0  0  1  0  0
6       6   1  57  1  0  0  0  0  0
7       7   1  49  0  0  1  0  0  0
8       8   2  58  0  0  0  0  1  1

Here the 1st subject has a 1 in x2 because its v1 value in the original dataset is 2, the 3rd subject has a 1 in both x3 and x5 because its v1 value contains 3 and 5, and so on.

I would appreciate any help on this question. Thanks a lot.

user187454 asked Feb 07 '13 20:02



1 Answer

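The expression below builds a 0/1 indicator matrix with one column for each of the values 1 to 6. You can cbind this result to data[-4] (the data frame without v1) and get what you need: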

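 # Split each v1 entry on whitespace or commas, then test which of 1:6 appear in it.
 # Letters never match 1:6, so they drop out; "0 +" turns TRUE/FALSE into 1/0.
 0+t(sapply(as.character(data$v1), function(line) 
        sapply(1:6, function(x) x %in% unlist(strsplit(line, split="\\s|\\,"))) ))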
#----------------
            [,1] [,2] [,3] [,4] [,5] [,6]
2              0    1    0    0    0    0
0              0    0    0    0    0    0
3,5            0    0    1    0    1    0
2 1            1    1    0    0    0    0
A,4            0    0    0    1    0    0
B,1,C          1    0    0    0    0    0
A and B,3      0    0    1    0    0    0
5, 6 A or C    0    0    0    0    1    1
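
To get from this matrix to the exact layout requested in the question, the cbind step can be sketched as follows. This is only a minimal sketch built on the sample data above: the names tokens, ind and result are made up for illustration, the column names x1 to x6 are taken from the desired output in the question, and stringsAsFactors = FALSE is an assumption so that v1 stays a character column.

```r
# Sample data from the question (v1 kept as character, not factor)
data <- data.frame(subject = 1:8, sex = c(1, 2, 2, 1, 2, 1, 1, 2),
                   age = c(35, 29, 31, 46, 64, 57, 49, 58),
                   v1 = c("2", "0", "3,5", "2 1", "A,4", "B,1,C",
                          "A and B,3", "5, 6 A or C"),
                   stringsAsFactors = FALSE)

# Split every entry on whitespace or commas; this gives a list of character vectors.
# Letters, "0" and empty strings are kept here but never match 1:6 below.
tokens <- strsplit(as.character(data$v1), split = "\\s|,")

# One row per subject, one column per code 1..6; "0 +" converts logical to numeric
ind <- 0 + t(sapply(tokens, function(tok) 1:6 %in% tok))
colnames(ind) <- paste0("x", 1:6)

# Drop v1 (the 4th column) and attach the indicator columns
result <- cbind(data[-4], ind)
result
```

Passing the list from strsplit to sapply, rather than the raw strings, keeps the v1 values from ending up as row names, so result has the plain row numbers 1 to 8 shown in the question.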
IRTFM answered Oct 13 '22 18:10
