I have a vector of n-length strings of numerals that looks like this (in this case, n=3):
[1] "111" "111" "111" "111" "111" "111" "111" "111" "111" "111" "111" "111"
[13] "111" "111" "111" "111" "111" "111" "111" "111" "111" "111" "111" "111"
[25] "111" "111" "111" "111" "111" "111" "111" "1 1" "111" "  1" "111" "112"
[37] "121" "111" "111" "111" "11 " "111" "   " "111" "111" "221" "111" "111"
[49] "   " "111" "111"
I want to convert it into a matrix (or dataframe) that looks like this:
V1 V2 V3
1 1 1
1 1 1
1 1 1
...
1 NA 1
1 1 1
NA NA 1
etc.
I know I can do it in a doubly-nested loop with substring() and as.numeric(), but there must be a more R-like way to accomplish this. Can anyone offer a clue?
TIA.
You can use strsplit. For example (assuming your vector is an object named x):
y <- strsplit(x,"")
z <- lapply(y, as.numeric)
a <- do.call(rbind, z)
The following alternative is faster than the solution above, but less intuitive. sapply simplifies its result to an array with one column per string, so you have to transpose it because the dimensions are the opposite of what you want.
a <- t(sapply(y, as.numeric))
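To make the steps concrete, here is a minimal sketch on a small made-up vector (the five strings and the V1..V3 column names are only for illustration, chosen to match the question). It assumes every string has the same length, since rbind needs rows of equal length.

```r
# Illustrative input: each string has exactly three characters; blanks mark missing values.
x <- c("111", "1 1", "  1", "112", "   ")

# Split each string into its single characters.
chars <- strsplit(x, "")

# as.numeric() maps a blank character to NA; rbind stacks one row per string.
m <- do.call(rbind, lapply(chars, as.numeric))

# Optional: label the columns the way the question shows them.
colnames(m) <- paste0("V", seq_len(ncol(m)))
m
```

The result is a 5 x 3 numeric matrix with NA wherever the input had a blank. If you need a data frame instead, as.data.frame(m) should do it.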
Here's a comparison of the timings of the different methods proposed in the answers (so far):
x <- sample(c("111","1 1","  1","112","121","11 ","   ","221"), 1e5, TRUE)
f1 <- function(x) do.call(rbind, lapply(strsplit(x,""), as.numeric))
f2 <- function(x) t(sapply(strsplit(x,""), as.numeric))
f3 <- function(x) read.fwf(file=textConnection(x), widths=c(1,1,1))
library(rbenchmark)
benchmark(f1(x), f2(x), f3(x), replications=10, order="relative",
columns=c("test","replications","elapsed","relative"))
# test replications elapsed relative
# 2 f2(x) 10 5.072 1.000000
# 1 f1(x) 10 6.343 1.250591
# 3 f3(x) 10 119.892 23.638013
Here's a solution using read.fwf().
x <- c("111", "   ", "221", "  1")
## "fwf" stands for "*f*ixed *w*idth *f*ormatted"
read.fwf(file = textConnection(x), widths = c(1,1,1))
# V1 V2 V3
# 1 1 1 1
# 2 NA NA NA
# 3 2 2 1
# 4 NA NA 1
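Note that read.fwf() returns a data frame. If you want a matrix, as the question also allows, a small sketch (assuming all columns come back numeric, as they do in the output above) is to wrap the result in as.matrix():

```r
# Same illustrative input as above: three-character strings, blanks are missing values.
x <- c("111", "   ", "221", "  1")

# read.fwf() gives a data frame with columns V1, V2, V3.
df <- read.fwf(file = textConnection(x), widths = c(1, 1, 1))

# Convert to a matrix; the NAs are preserved.
m <- as.matrix(df)
```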