I am trying to merge a number of data frames using rbind. If I call rbind directly, there is no problem:
> test <- rbind(x)
> is.data.frame(test)
[1] TRUE
However, if I use do.call, I run into a problem where my character columns are collapsed and the data frame is converted to a matrix.
> test <- do.call("rbind", x)
> is.data.frame(test)
[1] FALSE
As per the ?rbind documentation, I tried adding stringsAsFactors = FALSE, but there was no change in behavior. My data tables look something like this:
ID sequence descriptor
1 aaacccttt g12
2 actttgtgt e34
3 tttgggctc b12
4 ccgcgcgcg c12
… … ...
The rbind output looks like the table above, but the do.call("rbind", x) output appears as follows, where the sequence column is no longer a character column:
ID 363 426 91
Sequence 98 353 100
descriptor g12 b12 c12
I would like to use do.call because I am looping through a set of data frames in order to consolidate them using the script below. A helpful answer might also offer an alternative way to merge multiple data frames while calling them in a loop.
stringsAsFactors = FALSE
dfs <- as.list(ls(pattern="Data_"))
for (i in 1:length(dfs)) {
x <- get(as.character(dfs[i]))
AllData <- do.call("rbind", x)
}
dfs is the list of data frames in my working environment, and I get the actual data frame using get.
Thank you.
There are two different issues causing you difficulties.
stringsAsFactors
You're right to be looking at stringsAsFactors, but just haven't called it in quite the right place.
You have two options. You can either set it in your options, like this:
options(stringsAsFactors=FALSE)
Or in the code used to create your data.frames:
a <- read.table(textConnection("ID sequence descriptor
1 aaacccttt g12
2 actttgtgt e34
3 tttgggctc b12
4 ccgcgcgcg c12"),
header=TRUE, stringsAsFactors=FALSE)
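To see what the setting changes, here is a small sketch that reads the same text twice, once with each value of stringsAsFactors. The names txt, with_factors and without_factors are made up for illustration. Note that since R 4.0.0 the default is already stringsAsFactors = FALSE, so the explicit argument mainly matters on older versions.

```r
# Two rows of the example data, passed via read.table's text= argument
txt <- "ID sequence descriptor
1 aaacccttt g12
2 actttgtgt e34"

# Same input, read once with each setting
with_factors    <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
without_factors <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

class(with_factors$sequence)     # "factor"
class(without_factors$sequence)  # "character"
```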
args= argument to do.call()
You're also on the right track in wanting to use do.call() for this. But, as @Sacha points out, dfs needs to be a list of data.frames, not a single data.frame (which is itself a list of vectors).
# Create list of two data.frames
b <- a
dfs <- list(a, b)
# Or, if you start with a list of their names
dfs <- list("a", "b")
dfs <- lapply(dfs, get)
# Check that this works
do.call("rbind", dfs)
# ID sequence descriptor
# 1 1 aaacccttt g12
# 2 2 actttgtgt e34
# 3 3 tttgggctc b12
# 4 4 ccgcgcgcg c12
# 5 1 aaacccttt g12
# 6 2 actttgtgt e34
# 7 3 tttgggctc b12
# 8 4 ccgcgcgcg c12
This should also work for you even if you have just a single data.frame, as long as it is wrapped in a (length-1) list, like this: dfs <- list(a)
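For the loop in the question, one possible alternative (a sketch, not the only way) is to drop the explicit for loop: collect the data frames by name pattern with mget() and bind them in a single do.call(). The helper name bind_matching and the scratch environment demo are hypothetical; in an interactive session you would typically leave envir at globalenv(). The sketch assumes every matching object is a data frame with the same columns.

```r
# Hypothetical helper: row-bind every object in `envir` whose name matches `pattern`
bind_matching <- function(pattern, envir = globalenv()) {
  nms <- ls(envir = envir, pattern = pattern)  # character vector of object names
  dfs <- mget(nms, envir = envir)              # named list of the objects themselves
  do.call("rbind", dfs)                        # one rbind call over the whole list
}

# Demo in a scratch environment so the example does not touch your workspace
demo <- new.env()
demo$Data_a <- data.frame(ID = 1:2, sequence = c("aaacccttt", "actttgtgt"),
                          descriptor = c("g12", "e34"), stringsAsFactors = FALSE)
demo$Data_b <- data.frame(ID = 3:4, sequence = c("tttgggctc", "ccgcgcgcg"),
                          descriptor = c("b12", "c12"), stringsAsFactors = FALSE)

AllData <- bind_matching("^Data_", envir = demo)
is.data.frame(AllData)  # TRUE
nrow(AllData)           # 4
```

Because the list returned by mget() is named, the row names of the result usually carry the object names as prefixes; wrapping the list in unname() before do.call() should avoid that if it is unwanted.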
Using Josh's example code, I am pretty sure that what is happening is this:
Data:
x <- read.table(textConnection("ID sequence descriptor
1 aaacccttt g12
2 actttgtgt e34
3 tttgggctc b12
4 ccgcgcgcg c12"),
header=TRUE, stringsAsFactors=FALSE)
First this:
rbind(x)
does nothing since there is only one argument. I.e. there is nothing to append to the data frame so it just returns the same dataframe. Then:
do.call("rbind", x)
What happens here is that rbind() is called with the elements of the list x as its arguments. A data frame is a list with columns as elements. Therefore, this would be the same as:
rbind(x$ID,x$sequence,x$descriptor)
so you bind three vectors together by row. The result is the transpose of what you had, and since a data.frame can only hold different types column-wise while a matrix holds a single type, everything is coerced and this must become a character matrix.
I think that if x is a list of data frames it works fine. It just shouldn't be a data frame itself.
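A quick way to check this explanation is to compare the two calls side by side. This is a minimal sketch using the same example data, built with data.frame() instead of read.table(); the names m and ok are made up for illustration.

```r
x <- data.frame(ID = 1:4,
                sequence = c("aaacccttt", "actttgtgt", "tttgggctc", "ccgcgcgcg"),
                descriptor = c("g12", "e34", "b12", "c12"),
                stringsAsFactors = FALSE)

# The columns of x become the arguments: rbind(ID = ..., sequence = ..., descriptor = ...)
m <- do.call("rbind", x)
is.data.frame(m)  # FALSE
dim(m)            # 3 4  (transposed: one row per original column)
typeof(m)         # "character"

# Wrapping x in a list makes the data frame itself the single argument
ok <- do.call("rbind", list(x))
is.data.frame(ok)  # TRUE
dim(ok)            # 4 3
```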