I want to merge two data frames keeping the original row order of one of them (df.2
in the example below).
Here are some sample data (all values in the class column are defined in both data frames):
df.1 <- data.frame(class = c(1, 2, 3), prob = c(0.5, 0.7, 0.3))
df.2 <- data.frame(object = c('A', 'B', 'D', 'F', 'C'), class = c(2, 1, 2, 3, 1))
If I do:
merge(df.2, df.1)
Output is:
  class object prob
1     1      B  0.5
2     1      C  0.5
3     2      A  0.7
4     2      D  0.7
5     3      F  0.3
If I add sort = FALSE:
merge(df.2, df.1, sort = F)
Result is:
  class object prob
1     2      A  0.7
2     2      D  0.7
3     1      B  0.5
4     1      C  0.5
5     3      F  0.3
But what I would like is:
  class object prob
1     2      A  0.7
2     1      B  0.5
3     2      D  0.7
4     3      F  0.3
5     1      C  0.5
The rows in the two data frames that match on the specified columns are extracted, and joined together. If there is more than one match, all possible matches contribute one row each.
You just need to create a variable that stores the row number in df.2. Then, once you have merged your data, sort the new data set by this variable. Here is an example:
df.1 <- data.frame(class = c(1, 2, 3), prob = c(0.5, 0.7, 0.3))
df.2 <- data.frame(object = c('A', 'B', 'D', 'F', 'C'), class = c(2, 1, 2, 3, 1))
df.2$id <- 1:nrow(df.2)
out <- merge(df.2, df.1, by = "class")
out[order(out$id), ]
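If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: merge_keep_order and the temporary column name .row_id are names made up for illustration, not part of base R. It also assumes an inner merge is acceptable, so rows of the first data frame without a match are dropped unless you pass all.x = TRUE.

```r
# Hypothetical helper (not part of base R): merge x and y, then restore
# the original row order of x using a temporary row-number column.
merge_keep_order <- function(x, y, ...) {
  id_col <- ".row_id"  # assumed not to clash with any existing column name
  stopifnot(!id_col %in% c(names(x), names(y)))
  x[[id_col]] <- seq_len(nrow(x))                   # remember the order of x
  out <- merge(x, y, ...)                           # merge() may reorder rows
  out <- out[order(out[[id_col]]), , drop = FALSE]  # put them back in order
  out[[id_col]] <- NULL                             # drop the helper column
  rownames(out) <- NULL                             # renumber rows 1..n
  out
}

df.1 <- data.frame(class = c(1, 2, 3), prob = c(0.5, 0.7, 0.3))
df.2 <- data.frame(object = c('A', 'B', 'D', 'F', 'C'), class = c(2, 1, 2, 3, 1))
res <- merge_keep_order(df.2, df.1, by = "class")
print(res)
```

The objects come back in the order A, B, D, F, C, as in df.2. Extra arguments such as by or all.x are passed straight through to merge.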
Check out the join function in the plyr package. It's like merge, but it allows you to keep the row order of one of the data sets. Overall, it's more flexible than merge.
Using your example data, we would use join like this:
> join(df.2,df.1)
Joining by: class
  object class prob
1      A     2  0.7
2      B     1  0.5
3      D     2  0.7
4      F     3  0.3
5      C     1  0.5
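If you would rather not depend on plyr, a plain lookup with match() from base R gives the same row order for this particular example. Note that this is a different technique from join: it is a minimal sketch that assumes each class value appears only once in df.1 and that you only need to bring over one column (prob). Any class missing from df.1 would get NA.

```r
df.1 <- data.frame(class = c(1, 2, 3), prob = c(0.5, 0.7, 0.3))
df.2 <- data.frame(object = c('A', 'B', 'D', 'F', 'C'), class = c(2, 1, 2, 3, 1))

# For each row of df.2, find the position of its class in df.1
# (assumes df.1$class has no duplicates).
idx <- match(df.2$class, df.1$class)

# Copy df.2 and attach the looked-up prob; the rows of df.2 are never reordered.
out <- df.2
out$prob <- df.1$prob[idx]
print(out)
```

Because nothing is merged or sorted, the result keeps the rows of df.2 exactly where they were, with columns object, class and prob.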
Here are a couple of links describing fixes to the merge function for keeping the row order:
http://www.r-statistics.com/2012/01/merging-two-data-frame-objects-while-preserving-the-rows-order/
http://r.789695.n4.nabble.com/patching-merge-to-allow-the-user-to-keep-the-order-of-one-of-the-two-data-frame-objects-merged-td4296561.html