
rbind two data.frame preserving row order and row names

I have a list of data.frame objects that I would like to append to one another by row, i.e. something like merge(..., all=T). However, merge seems to drop the row names, which I need to keep intact. Any ideas? Example:

x = data.frame(a=1:2, b=2:3, c=3:4, d=4:5, row.names=c("row_1", "another_row1"))
y = data.frame(a=c(10,20), b=c(20,30), c=c(30,40), row.names=c("row_2", "another_row2"))
> merge(x, y, all=T, sort=F)
     a  b  c  d
  1  1  2  3  4
  2  2  3  4  5
  3 10 20 30 NA
  4 20 30 40 NA
Alex asked Feb 10 '13 15:02



2 Answers

Since you are not actually merging, just rbind-ing, something like this might work. It uses rbind.fill from the "plyr" package, which fills missing columns with NA. To use it, pass a list of the data.frames you want to rbind.

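RBIND <- function(datalist) {
  # rbind.fill() pads missing columns with NA but discards row names
  temp <- plyr::rbind.fill(datalist)
  # Restore the original row names in list order (they must be unique)
  rownames(temp) <- unlist(lapply(datalist, row.names))
  temp
}
RBIND(list(x, y))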
#               a  b  c  d
# row_1         1  2  3  4
# another_row1  2  3  4  5
# row_2        10 20 30 NA
# another_row2 20 30 40 NA
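
For comparison, the same idea can be sketched in base R without plyr. This is not part of the original answer: the helper name rbind_keep_names is hypothetical, and the sketch assumes the row names are unique across all data.frames (base rbind would otherwise alter duplicates to make them unique). Each data.frame is padded with NA columns up to the union of all column names, after which plain rbind keeps the row names.

```r
# Hypothetical helper, base R only: pad missing columns, then rbind.
rbind_keep_names <- function(datalist) {
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(datalist, names)))
  padded <- lapply(datalist, function(df) {
    # Add each absent column, filled with NA
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
    # Put the columns in one common order
    df[all_cols]
  })
  # unname() so that list names are not prefixed onto the row names
  do.call(rbind, unname(padded))
}

x <- data.frame(a=1:2, b=2:3, c=3:4, d=4:5, row.names=c("row_1", "another_row1"))
y <- data.frame(a=c(10,20), b=c(20,30), c=c(30,40), row.names=c("row_2", "another_row2"))
res <- rbind_keep_names(list(x, y))
rownames(res)
```

Unlike the plyr version, this does not need the row names to be reassigned afterwards, because rbind on data.frames carries non-automatic row names through.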
A5C1D2H2I1M1N2O1R2T1 answered Oct 12 '22 22:10


One way is to include "row.names" in the by argument of merge, so that the row names come back as an additional column.

> merge(x, y, by=c("row.names", "a","b","c"), all.x=T, all.y=T, sort=F)

#      Row.names  a  b  c  d
# 1        row_1  1  2  3  4
# 2 another_row1  2  3  4  5
# 3        row_2 10 20 30 NA
# 4 another_row2 20 30 40 NA
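
If real row names are needed rather than a Row.names column, the column can be promoted back afterwards. The following is a minimal sketch, not part of the original answer; it assumes the values in Row.names are unique in the merged result, since assigning duplicated row names is an error.

```r
x <- data.frame(a=1:2, b=2:3, c=3:4, d=4:5, row.names=c("row_1", "another_row1"))
y <- data.frame(a=c(10,20), b=c(20,30), c=c(30,40), row.names=c("row_2", "another_row2"))

# Same merge call as above, written with full TRUE/FALSE
m <- merge(x, y, by=c("row.names", "a", "b", "c"), all=TRUE, sort=FALSE)

# Promote the key column back to row names, then drop it
rownames(m) <- as.character(m$Row.names)
m$Row.names <- NULL
m
```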

Edit: Looking at the source of merge for data frames with getS3method('merge', 'data.frame'), the row names are explicitly set to NULL (the code is fairly long, so I won't paste it here).

# Commenting out
# lines 63 and 64
row.names(x) <- NULL
row.names(y) <- NULL

# and
# line 141 (thanks Ananda for pointing this out)
attr(res, "row.names") <- .set_row_names(nrow(res))

and saving the result as a new function, say MERGE, gives the output the OP wants for this example. This is just an experiment.

Arun answered Oct 12 '22 21:10