
losing dataframe when using do.call

Tags: dataframe, r

I am trying to merge a number of dataframes using rbind. If I call rbind directly there is no problem:

> test <- rbind(x)
> is.data.frame(test)
[1] TRUE

However, if I use do.call, I run into a problem where my character columns are collapsed and the dataframe is converted to a matrix.

> test <- do.call("rbind", x)
> is.data.frame(test)
[1] FALSE

As per the ?rbind documentation, I tried adding stringsAsFactors = FALSE, but the behavior did not change. My data tables look something like this:

ID  sequence    descriptor
1   aaacccttt   g12
2   actttgtgt   e34
3   tttgggctc   b12
4   ccgcgcgcg   c12
…   …       ...

The rbind output looks like the table above, but the do.call("rbind", x) output appears as follows, where the sequence column is no longer a character:

ID  363 426 91
Sequence 98 353 100
descriptor  g12 b12 c12 

I would like to use do.call because I am looping through a set of dataframes in order to consolidate them, using the script below. An alternative way to merge multiple dataframes while calling them in a loop would also be a helpful answer.

stringsAsFactors = FALSE
dfs <- as.list(ls(pattern="Data_"))
for (i in 1:length(dfs)) {
  x <- get(as.character(dfs[i]))
  AllData <- do.call("rbind", x) 
  }

dfs is the list of the names of the dataframes in my working environment, and I get each actual dataframe using get.

thank you.

zach asked Nov 01 '11 22:11


2 Answers

There are two different issues causing you difficulties.

  • stringsAsFactors

You're right to be looking at stringsAsFactors, but just haven't called it in quite the right place.

You have two options. You can either set it in your options, like this:

options(stringsAsFactors=FALSE)

Or set it in the code used to create your data frames:

a <- read.table(textConnection("ID  sequence    descriptor
1   aaacccttt   g12
2   actttgtgt   e34
3   tttgggctc   b12
4   ccgcgcgcg   c12"),
header=TRUE, stringsAsFactors=FALSE)

  • args= argument to do.call()

You're also on the right track in wanting to use do.call() for this. But, as @Sacha points out, dfs needs to be a list of data.frames, not a single data.frame (which is itself a list of vectors).

# Create list of two data.frames
b <- a
dfs <- list(a, b)

# Or, if you start with a list of their names
dfs <- list("a", "b")
dfs <- lapply(dfs, get)

# Check that this works
do.call("rbind", dfs)
#   ID  sequence descriptor
# 1  1 aaacccttt        g12
# 2  2 actttgtgt        e34
# 3  3 tttgggctc        b12
# 4  4 ccgcgcgcg        c12
# 5  1 aaacccttt        g12
# 6  2 actttgtgt        e34
# 7  3 tttgggctc        b12
# 8  4 ccgcgcgcg        c12

This should also work for you even if you have just a single data.frame, as long as it is wrapped in a (length-1) list, like this: dfs <- list(a)
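For the looping use case in the question, one possible alternative is to collect all matching objects with mget() and bind them in a single call, so no explicit loop is needed. This is only a sketch: the environment env and the two small Data_1 and Data_2 data frames are hypothetical stand-ins for the asker's real Data_* objects.

```r
# Hypothetical stand-ins for the asker's Data_* data frames,
# kept in their own environment so the example is self-contained
env <- new.env()
env$Data_1 <- data.frame(ID = 1:2,
                         sequence = c("aaacccttt", "actttgtgt"),
                         descriptor = c("g12", "e34"),
                         stringsAsFactors = FALSE)
env$Data_2 <- data.frame(ID = 3:4,
                         sequence = c("tttgggctc", "ccgcgcgcg"),
                         descriptor = c("b12", "c12"),
                         stringsAsFactors = FALSE)

# mget() returns a named list of the objects whose names match the pattern
dfs <- mget(ls(envir = env, pattern = "^Data_"), envir = env)

# Each list element is a whole data frame, so rbind() binds rows as intended
AllData <- do.call(rbind, dfs)

is.data.frame(AllData)
is.character(AllData$sequence)
```

If the objects live in the global environment, as in the question, the envir arguments can be dropped. Because the list is named, the row names of the result carry the object names as a prefix; row.names(AllData) <- NULL resets them if that is unwanted.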

Josh O'Brien answered Sep 25 '22 12:09


Using Josh's example code, I am pretty sure that what is happening is this:

Data:
    x <- read.table(textConnection("ID  sequence    descriptor
    1   aaacccttt   g12
    2   actttgtgt   e34
    3   tttgggctc   b12
    4   ccgcgcgcg   c12"),
    header=T, stringsAsFactors=FALSE)

First this:

rbind(x)

does nothing since there is only one argument. I.e. there is nothing to append to the data frame so it just returns the same dataframe. Then:

do.call("rbind", x)

What happens here is that rbind() is called with all the arguments in the list x. A data frame is a list with columns as elements. Therefore, this would be the same as:

rbind(x$ID,x$sequence,x$descriptor)

so you bind three vectors together by row. The result is therefore the transpose of what you had, and since a data.frame can hold different types only column-wise while a matrix holds a single type, the result must become a character matrix.

I think that if x is a list of dataframes it works fine. It just shouldn't be a data frame itself.
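The difference can be checked directly. The following is a minimal sketch that rebuilds the example data with data.frame() instead of read.table(); the variable names m, same and ok are only illustrative.

```r
# Same example data as above, built without read.table()
x <- data.frame(ID = 1:4,
                sequence = c("aaacccttt", "actttgtgt", "tttgggctc", "ccgcgcgcg"),
                descriptor = c("g12", "e34", "b12", "c12"),
                stringsAsFactors = FALSE)

# Passing the data frame itself: each column becomes one argument to rbind()
m <- do.call("rbind", x)
is.data.frame(m)   # no longer a data frame
is.matrix(m)       # a matrix with one row per original column
typeof(m)          # everything has been coerced to a single type

# Equivalent explicit call, with the column names as argument names
same <- identical(m, rbind(ID = x$ID,
                           sequence = x$sequence,
                           descriptor = x$descriptor))

# Wrapping the data frame in a list passes it as a single argument
ok <- do.call("rbind", list(x))
is.data.frame(ok)
```

Here m has 3 rows and 4 columns, the transposed shape of x, while ok keeps the original 4 rows and 3 columns.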

Sacha Epskamp answered Sep 25 '22 12:09