
How to convert a vector of strings to a dataframe or matrix

Tags:

r

I have a vector of n-length strings of numerals that looks like this (in this case, n=3):

[1] "111" "111" "111" "111" "111" "111" "111" "111" "111" "111" "111" "111"
[13] "111" "111" "111" "111" "111" "111" "111" "111" "111" "111" "111" "111"
[25] "111" "111" "111" "111" "111" "111" "111" "1 1" "111" "  1" "111" "112"
[37] "121" "111" "111" "111" "11 " "111" "   " "111" "111" "221" "111" "111"
[49] "   " "111" "111"

I want to convert it into a matrix (or dataframe) that looks like this:

V1   V2   V3
1    1    1
1    1    1
1    1    1
...
1   NA    1
1    1    1
NA   NA   1

etc.

I know I can do it in a doubly-nested loop with substring() and as.numeric(), but there must be a more R-like way to accomplish this. Can anyone offer a clue?

TIA.

Stuart asked Oct 26 '12 00:10


2 Answers

You can use strsplit. For example (assuming your vector is an object named x):

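y <- strsplit(x, "")          # split each string into single characters
z <- lapply(y, as.numeric)    # convert to numbers; blanks become NA
a <- do.call(rbind, z)        # stack the vectors as rows of a matrix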

The sapply version below is faster than the do.call(rbind, ...) approach above, but less intuitive: sapply simplifies its result to a matrix with one column per input string, so you have to transpose it to get one row per string.

a <- t(sapply(y, as.numeric))
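To make the transposition step concrete, here is a minimal sketch on a four-element sample drawn from the question's data. The names m and df are only illustrative, and the final conversion to a data frame is an optional extra for readers who want the V1, V2, V3 layout shown in the question.

```r
# Small sample taken from the question's vector
x <- c("111", "1 1", "  1", "11 ")

y <- strsplit(x, "")          # list with one character vector per string
m <- sapply(y, as.numeric)    # matrix with one COLUMN per input string
a <- t(m)                     # transpose: one ROW per input string

dim(m)   # rows = characters per string, columns = number of strings
dim(a)   # rows = number of strings, columns = characters per string

# Optional: as.data.frame() names the columns V1, V2, V3 by default
df <- as.data.frame(a)
df
```

Blank characters are converted to NA by as.numeric, which is what produces the NA cells in the desired output.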

Here's a comparison of the timings of the different methods proposed in the answers (so far):

x <- sample(c("111","1 1","  1","112","121","11 ","   ","221"), 1e5, TRUE)
f1 <- function(x) do.call(rbind, lapply(strsplit(x,""), as.numeric))
f2 <- function(x) t(sapply(strsplit(x,""), as.numeric))
f3 <- function(x) read.fwf(file=textConnection(x), widths=c(1,1,1))
library(rbenchmark)
benchmark(f1(x), f2(x), f3(x), replications=10, order="relative",
  columns=c("test","replications","elapsed","relative"))
#    test replications elapsed  relative
# 2 f2(x)           10   5.072  1.000000
# 1 f1(x)           10   6.343  1.250591
# 3 f3(x)           10 119.892 23.638013
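Before reading too much into the timings, it may be worth confirming that the two strsplit-based functions return the same matrix. The sketch below does that on a smaller sample; the seed, the sample size of 100, and the name same are arbitrary choices for illustration. f3 is left out because read.fwf returns a data frame rather than a matrix.

```r
set.seed(1)  # arbitrary seed, only so the sample is reproducible
x <- sample(c("111", "1 1", "  1", "112", "121", "11 ", "   ", "221"), 100, TRUE)

f1 <- function(x) do.call(rbind, lapply(strsplit(x, ""), as.numeric))
f2 <- function(x) t(sapply(strsplit(x, ""), as.numeric))

# all.equal() compares values and attributes, treating matching NAs as equal
same <- isTRUE(all.equal(f1(x), f2(x)))
same
```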
Joshua Ulrich answered Nov 14 '22 22:11


Here's a solution using read.fwf().

x <- c("111", "   ", "221", "  1")

## "fwf" stands for "*f*ixed *w*idth *f*ormatted"
read.fwf(file = textConnection(x), widths = c(1,1,1))
#   V1 V2 V3
# 1  1  1  1
# 2 NA NA NA
# 3  2  2  1
# 4 NA NA  1
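For strings of some other length n, the widths argument can be built rather than typed out. The following is a sketch under the assumption that every string has the same number of characters as the first one; the names n, con, and df are illustrative. The text connection is stored and closed explicitly, since read.fwf does not close a connection that was already open when passed in.

```r
x <- c("111", "   ", "221", "  1")

n <- nchar(x[1])          # assumes all strings share this length

con <- textConnection(x)  # treat the vector as lines of a file
df <- read.fwf(file = con, widths = rep(1, n))
close(con)                # release the connection explicitly

df
```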
Josh O'Brien answered Nov 14 '22 23:11