For the data given below,
data1<-structure(list(var1 = c("2 7", "2 6 7", "2 7", "2 7", "1 7",
"1 7", "1 5", "1 2 7", "1 5", "1 7", "1 2 3 4 5 6 7", "1 2 4 6"
)), .Names = "var1", class = "data.frame", row.names = c(NA,
-12L))
> data1
var1
1 2 7
2 2 6 7
3 2 7
4 2 7
5 1 7
6 1 7
7 1 5
8 1 2 7
9 1 5
10 1 7
11 1 2 3 4 5 6 7
12 1 2 4 6
I would like it to split into seven columns (7) as follows:
v1 v2 v3 v4 v5 v6 v7
1 NA 2 NA NA NA NA 7
2 NA 2 NA NA NA 6 7
3 NA 2 NA NA NA NA 7
4 NA 2 NA NA NA NA 7
5 1 NA NA NA NA NA 7
6 1 NA NA NA NA NA 7
7 1 NA NA NA 5 NA NA
8 1 2 NA NA NA NA 7
9 1 NA NA NA 5 NA NA
10 1 NA NA NA NA NA 7
11 1 2 3 4 5 6 7
12 1 2 NA 4 NA 6 NA
I use the tstrsplit
from data.table
package as follows:
library(data.table)
setDT(data1)[,tstrsplit(var1," ")]
V1 V2 V3 V4 V5 V6 V7
1: 2 7 NA NA NA NA NA
2: 2 6 7 NA NA NA NA
3: 2 7 NA NA NA NA NA
4: 2 7 NA NA NA NA NA
5: 1 7 NA NA NA NA NA
6: 1 7 NA NA NA NA NA
7: 1 5 NA NA NA NA NA
8: 1 2 7 NA NA NA NA
9: 1 5 NA NA NA NA NA
10: 1 7 NA NA NA NA NA
11: 1 2 3 4 5 6 7
12: 1 2 4 6 NA NA NA
This is different than the expected output. I was wondering how can I get the expected output as described above.
With data.table
you may try
library(magrittr)
setDT(data1)[, strsplit(var1," "), by = .(rn = seq_len(nrow(data1)))] %>%
dcast(., rn ~ V1)
rn 1 2 3 4 5 6 7 1: 1 NA 2 NA NA NA NA 7 2: 2 NA 2 NA NA NA 6 7 3: 3 NA 2 NA NA NA NA 7 4: 4 NA 2 NA NA NA NA 7 5: 5 1 NA NA NA NA NA 7 6: 6 1 NA NA NA NA NA 7 7: 7 1 NA NA NA 5 NA NA 8: 8 1 2 NA NA NA NA 7 9: 9 1 NA NA NA 5 NA NA 10: 10 1 NA NA NA NA NA 7 11: 11 1 2 3 4 5 6 7 12: 12 1 2 NA 4 NA 6 NA
To get rid of the rn
column, we can use
setDT(data1)[, strsplit(var1," "), by = .(rn = 1:nrow(data1))][
, dcast(.SD, rn ~ V1)][, rn := NULL][]
setDT(data1)[, strsplit(var1," "), by = .(rn = seq_len(nrow(data1)))]
creates a data.table directly in long format
rn V1 1: 1 2 2: 1 7 3: 2 2 4: 2 6 5: 2 7 6: 3 2 7: 3 7 8: 4 2 9: 4 7 10: 5 1 11: 5 7 12: 6 1 13: 6 7 14: 7 1 15: 7 5 16: 8 1 17: 8 2 18: 8 7 19: 9 1 20: 9 5 21: 10 1 22: 10 7 23: 11 1 24: 11 2 25: 11 3 26: 11 4 27: 11 5 28: 11 6 29: 11 7 30: 12 1 31: 12 2 32: 12 4 33: 12 6 rn V1
which is then reshaped to wide format using dcast()
.
If we would use tstrsplit()
instead of strsplit()
we would get a data.table in wide format which needs to be reshaped to long format using melt()
:
setDT(data1)[,tstrsplit(var1," ")][, rn := .I][
, melt(.SD, id = "rn", na.rm = TRUE)][
, dcast(.SD, rn ~ paste0("V", value))][
, rn := NULL][]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With