
Merge two data frames while keeping the original row order

I want to merge two data frames keeping the original row order of one of them (df.2 in the example below).

Here is some sample data (every value in the class column appears in both data frames):

df.1 <- data.frame(class = c(1, 2, 3), prob = c(0.5, 0.7, 0.3))
df.2 <- data.frame(object = c('A', 'B', 'D', 'F', 'C'), class = c(2, 1, 2, 3, 1))

If I do:

merge(df.2, df.1) 

Output is:

  class object prob
1     1      B  0.5
2     1      C  0.5
3     2      A  0.7
4     2      D  0.7
5     3      F  0.3

If I add sort = FALSE:

merge(df.2, df.1, sort = FALSE)

Result is:

  class object prob
1     2      A  0.7
2     2      D  0.7
3     1      B  0.5
4     1      C  0.5
5     3      F  0.3

But what I would like is:

  class object prob
1     2      A  0.7
2     1      B  0.5
3     2      D  0.7
4     3      F  0.3
5     1      C  0.5
DJack asked Jul 26 '13 09:07



2 Answers

You just need to add a variable to df.2 that records each row's number. Then, once you have merged the data, sort the result by this variable. Here is an example:

df.1 <- data.frame(class = c(1, 2, 3), prob = c(0.5, 0.7, 0.3))
df.2 <- data.frame(object = c('A', 'B', 'D', 'F', 'C'), class = c(2, 1, 2, 3, 1))
df.2$id <- 1:nrow(df.2)
out <- merge(df.2, df.1, by = "class")
out[order(out$id), ]
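As a variation that is not part of the original answer, the same idea can be wrapped in a small function so the helper column does not stay in the result. This is only a sketch: the name merge_keep_order and the column name .row_id are made up for illustration, and it assumes neither data frame already has a column called .row_id.

```r
# Hypothetical helper (not part of base R): merge x with y and
# return the rows in the original order of x.
merge_keep_order <- function(x, y, ...) {
  # Temporary column recording each row's original position in x.
  # Assumption: neither x nor y already has a column named ".row_id".
  x$.row_id <- seq_len(nrow(x))
  out <- merge(x, y, ...)
  # Restore the original order of x, then drop the helper column.
  out <- out[order(out$.row_id), ]
  out$.row_id <- NULL
  rownames(out) <- NULL
  out
}

df.1 <- data.frame(class = c(1, 2, 3), prob = c(0.5, 0.7, 0.3))
df.2 <- data.frame(object = c('A', 'B', 'D', 'F', 'C'), class = c(2, 1, 2, 3, 1))

res <- merge_keep_order(df.2, df.1, by = "class")
res
```

Extra arguments such as by or all.x are passed straight through to merge, so the join type is still chosen by the caller.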
PAC answered Oct 25 '22 04:10



Check out the join function in the plyr package. It's like merge, but it keeps the row order of the first data frame you pass to it (df.2 here). Overall, it's more flexible than merge.

Using your example data, we would use join like this:

> library(plyr)
> join(df.2, df.1)
Joining by: class
  object class prob
1      A     2  0.7
2      B     1  0.5
3      D     2  0.7
4      F     3  0.3
5      C     1  0.5
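If adding a dependency on plyr is not an option, a similar order-preserving result can be sketched in base R with match(). This is a different technique from join, not a reimplementation of it: it looks up one column at a time and assumes each class value appears at most once in df.1. The variable names idx and out are chosen here for illustration.

```r
df.1 <- data.frame(class = c(1, 2, 3), prob = c(0.5, 0.7, 0.3))
df.2 <- data.frame(object = c('A', 'B', 'D', 'F', 'C'), class = c(2, 1, 2, 3, 1))

# For each row of df.2, find the position of its class in df.1.
# Assumption: each class value appears at most once in df.1.
idx <- match(df.2$class, df.1$class)

# Copy df.2 (its row order is untouched) and attach the looked-up values.
out <- df.2
out$prob <- df.1$prob[idx]
out
```

A class in df.2 with no match in df.1 would get NA in prob, which resembles a left join rather than the inner join that merge performs by default.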

Here are a couple of links describing fixes to the merge function for keeping the row order:

http://www.r-statistics.com/2012/01/merging-two-data-frame-objects-while-preserving-the-rows-order/

http://r.789695.n4.nabble.com/patching-merge-to-allow-the-user-to-keep-the-order-of-one-of-the-two-data-frame-objects-merged-td4296561.html

user2635373 answered Oct 25 '22 03:10
