I try to split a vector of strings into a data.frame object and for a fixed order this isn't a problem (e.g. like written here), but in my particular case the columns for the future data-frame are not complete in the string objects. This is how the output should look like for an toy input:
input <- c("an=1;bn=3;cn=45",
           "bn=3.5;cn=76",
           "an=2;dn=5")
res <- do.something(input)
> res
      an  bn  cn  dn
[1,]  1   3   45  NA
[2,]  NA  3.5 76  NA
[3,]  2   NA  NA  5
I am looking now for a function do.somethingthat can do that in a efficient way. My naive solution at the moment would be to loop over the input objects, strsplit those for ; then strsplit them again for = and then fill the data.frame result by result. 
Is there any way to do that more R-alike? I am afraid doing that element by element would take quite a long time for a long vector input.   
EDIT: Just for completeness, my naive solution looks like this:
  do.something <- function(x){
    temp <- strsplit(x,";")
    temp2 <- sapply(temp,strsplit,"=")
    ul.temp2 <- unlist(temp2)
    label <- sort(unique(ul.temp2[seq(1,length(ul.temp2),2)]))
    res <- data.frame(matrix(NA, nrow = length(x), ncol = length(label)))
    colnames(res) <- label
    for(i in 1:length(temp)){
      for(j in 1:length(label)){
        curInfo <- unlist(temp2[[i]])
        if(sum(is.element(curInfo,label[j]))>0){
          res[i,j] <- curInfo[which(curInfo==label[j])+1]
        }
      }
    }
    res
  }
EDIT2: Unfortunately my large input data looks like this (entries without '=' possible):
input <- c("an=1;bn=3;cn=45",
           "an;bn=3.5;cn=76",
           "an=2;dn=5")
so I cannot compare the given answers to my problem at hand. My naive solution for that is
do.something <- function(x){
    temp <- strsplit(x,";")
    tempNames <- sort(unique(sapply(strsplit(unlist(temp),"="),"[",1)))
    res <- data.frame(matrix(NA, nrow = length(x), ncol = length(tempNames)))
    colnames(res) <- tempNames
    for(i in 1:length(temp)){
      curSplit <- strsplit(unlist(temp[[i]]),"=")
      curNames <- sapply(curSplit,"[",1)
      curValues <- sapply(curSplit,"[",2)
      for(j in 1:length(tempNames)){
        if(is.element(colnames(res)[j],curNames)){
          res[i,j] <- curValues[curNames==colnames(res)[j]]
        }
      }
    }
    res
  }
                Here's another way which should work even given your edited data. Extract the column names and values from your input vector using regmatches, then run through each list element matching the values to the appropriate column names.
#  Get column names
tag <- regmatches( input , gregexpr( "[a-z]+" , input ) )
#  Get numbers including floating point, replace missing values with NA
val <- regmatches( input , gregexpr( "\\d+\\.?\\d?|(?<=[a-z]);" , input , perl = TRUE ) )
val <- lapply( val , gsub , pattern = ";" , replacement = NA )
#  Column names
nms <- unique( unlist(tag) )
#  Intermeidate matrices
ll <- mapply( cbind , val , tag )
#  Match to appropriate columns and coerce to data.frame
out <- data.frame( do.call( rbind , lapply( ll , function(x) x[ match( nms , x[,2] ) ]  ) ) )
names(out) <- nms
#    an   bn   cn   dn
#1    1    3   45 <NA>
#2 <NA>  3.5   76 <NA>
#3    2 <NA> <NA>    5
                        This is a kind of bad techniq but sometimes ept( eval parse text) is useful.
> library(plyr)
> rbind.fill(lapply(input, function(x) {l <- new.env(); eval(parse(text = x), envir=l); as.data.frame(as.list(l))}))
  an cn  bn dn
1  1 45 3.0 NA
2 NA 76 3.5 NA
3  2 NA  NA  5
Update
> z <- lapply(strsplit(input, ";"), 
+             function(x) {
+               e <- Filter(function(y) length(y)==2, strsplit(x, "="))
+               r <- data.frame(lapply(e, `[`, 2))
+               names(r) <- lapply(e, `[`, 1)
+               r
+             })
> rbind.fill(z)
    an   bn   cn   dn
1    1    3   45 <NA>
2 <NA>  3.5   76 <NA>
3    2 <NA> <NA>    5
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With