do.call(rbind, list) for uneven number of column

I have a list in which each element is a named character vector, and the vectors have differing lengths. I would like to bind them as rows so that the column names line up: where a vector has extra names, a new column should be created, and where a name is missing, the cell should be filled with NA.

Below is a mock example of the data I am working with

x <- list()
x[[1]] <- letters[seq(2,20,by=2)]
names(x[[1]]) <- LETTERS[c(1:length(x[[1]]))]
x[[2]] <- letters[seq(3,20, by=3)]
names(x[[2]]) <- LETTERS[seq(3,20, by=3)]
x[[3]] <- letters[seq(4,20, by=4)]
names(x[[3]]) <- LETTERS[seq(4,20, by=4)]

The line below is what I would normally use if I were sure that every element had the same format:

do.call(rbind,x)

I was hoping that someone had come up with a neat solution that matches up the column names, fills blanks with NAs, and adds new columns whenever the binding process encounters new names.

asked Jun 25 '13 22:06

h.l.m

2 Answers

rbind.fill is an awesome function that works really well on a list of data.frames. But IMHO, when the list contains only (named) vectors, as in this case, the job can be done much faster.

The `rbind.fill` way

require(plyr)
rbind.fill(lapply(x,function(y){as.data.frame(t(y),stringsAsFactors=FALSE)}))

A more straightforward way (and efficient for this scenario at least):

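rbind.named.fill <- function(x) {
    # lapply (not sapply) so the names always stay a list, even when
    # every vector happens to have the same length
    nam <- lapply(x, names)
    # union of all names, in order of first appearance: the output columns
    unam <- unique(unlist(nam))
    out <- vector("list", length(x))
    for (i in seq_along(x)) {
        # reorder each vector to the full column set; unmatched names give NA
        out[[i]] <- unname(x[[i]])[match(unam, nam[[i]])]
    }
    setNames(as.data.frame(do.call(rbind, out), stringsAsFactors=FALSE), unam)
}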

Basically, we first collect the unique names across all vectors; these form the columns of the final data.frame. Then we create a list as long as the input and, for each vector, place its values under the matching names and fill the remaining positions with NA. This is probably the "trickiest" part, as we have to match the names while filling in the NAs. Finally, we set the column names once at the end (this can also be done by reference using setnames from the data.table package, if need be).
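To make the matching step concrete, here is a minimal sketch on toy inputs. The vectors `unam` and `v` below are invented for illustration and are not part of the question's data; only base R's `match` and `unname` are used.

```r
# Toy inputs, invented for illustration
unam <- c("A", "B", "C", "D")        # full set of column names
v    <- c(C = "c", A = "a")          # one named vector, names out of order

# Position of each column name within names(v); NA where v lacks that name
idx <- match(unam, names(v))

# Indexing with NA yields NA, so missing columns are filled automatically
row <- unname(v)[idx]

print(idx)
print(row)
```

Each row built this way has the same length and column order, which is why a plain do.call(rbind, out) is safe afterwards.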

Now to some benchmarking:

Data:

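# generate some huge random data:
set.seed(45)
sample.fun <- function() {
    # draw a single length between 5 and 15, then that many distinct names
    nam <- sample(LETTERS, sample(5:15, 1))
    val <- sample(letters, length(nam))
    setNames(val, nam)
}
# simplify = FALSE guarantees a list of vectors, never a matrix
ll <- replicate(1e4, sample.fun(), simplify = FALSE)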

Functions:

# plyr's rbind.fill version:
rbind.fill.plyr <- function(x) {
    rbind.fill(lapply(x,function(y){as.data.frame(t(y),stringsAsFactors=FALSE)}))
}

rbind.named.fill <- function(x) {
    nam <- lapply(x, names)
    unam <- unique(unlist(nam))
    len <- sapply(x, length)
    out <- vector("list", length(len))
    for (i in seq_along(len)) {
        out[[i]] <- unname(x[[i]])[match(unam, nam[[i]])]
    }
    setNames(as.data.frame(do.call(rbind, out), stringsAsFactors=FALSE), unam)
}

Update (added GSee's function as well):

foo <- function (...) 
{
  dargs <- list(...)
  all.names <- unique(names(unlist(dargs)))
  out <- do.call(rbind, lapply(dargs, `[`, all.names))
  colnames(out) <- all.names
  as.data.frame(out, stringsAsFactors=FALSE)
}

Benchmarking:

require(microbenchmark)
microbenchmark(t1 <- rbind.named.fill(ll), 
               t2 <- rbind.fill.plyr(ll), 
               t3 <- do.call(foo, ll), times=10)
identical(t1, t2) # TRUE
identical(t1, t3) # TRUE

Unit: milliseconds
                       expr        min         lq     median         uq        max neval
 t1 <- rbind.named.fill(ll)   243.0754   258.4653   307.2575   359.4332   385.6287    10
  t2 <- rbind.fill.plyr(ll) 16808.3334 17139.3068 17648.1882 17890.9384 18220.2534    10
     t3 <- do.call(foo, ll)   188.5139   204.2514   229.0074   339.6309   359.4995    10

answered Sep 20 '22 11:09

Arun

If you want the result to be a matrix...

I recently wrote this function for a co-worker who wanted to rbind vectors into a matrix.

foo <- function (...) 
{
  dargs <- list(...)
  if (!all(vapply(dargs, is.vector, TRUE))) 
      stop("all inputs must be vectors")
  if (!all(vapply(dargs, function(x) !is.null(names(x)), TRUE))) 
      stop("all input vectors must be named.")
  all.names <- unique(names(unlist(dargs)))
  out <- do.call(rbind, lapply(dargs, `[`, all.names))
  colnames(out) <- all.names
  out
}

R > do.call(foo, x)
     A   B   C   D   E   F   G   H   I   J   L   O   R   P   T  
[1,] "b" "d" "f" "h" "j" "l" "n" "p" "r" "t" NA  NA  NA  NA  NA 
[2,] NA  NA  "c" NA  NA  "f" NA  NA  "i" NA  "l" "o" "r" NA  NA 
[3,] NA  NA  NA  "d" NA  NA  NA  "h" NA  NA  "l" NA  NA  "p" "t"

answered Sep 17 '22 11:09

GSee
