Merging more than 2 dataframes in R by rownames

Tags: merge, dataframe, r, reduce, rowname

I gather data from four objects (two data frames and two matrices) and would like to merge them by row names. I am looking for an efficient way to do this. This is a simplified version of the data I have:

    df1 <- data.frame(N = sample(seq(9, 27, 0.5), 40, replace = TRUE),
                      P = sample(seq(0.3, 4, 0.1), 40, replace = TRUE),
                      C = sample(seq(400, 500, 1), 40, replace = TRUE))
    df2 <- data.frame(origin = sample(c("A", "B", "C", "D", "E"), 40, replace = TRUE),
                      foo1 = sample(c(TRUE, FALSE), 40, replace = TRUE),
                      X = sample(seq(145600, 148300, 100), 40, replace = TRUE),
                      Y = sample(seq(349800, 398600, 100), 40, replace = TRUE))
    df3 <- matrix(sample(seq(0, 1, 0.01), 40), 40, 100)
    df4 <- matrix(sample(seq(0, 1, 0.01), 40), 40, 100)
    rownames(df1) <- paste("P", sprintf("%02d", 1:40), sep = "")
    rownames(df2) <- rownames(df1)
    rownames(df3) <- rownames(df1)
    rownames(df4) <- rownames(df1)

This is what I would normally do:

    # merge df1 and df2
    dat <- merge(df1, df2, by = "row.names", all.x = FALSE, all.y = FALSE)  # merge
    rownames(dat) <- dat$Row.names  # reset rownames
    dat$Row.names <- NULL           # remove added rownames col

    # merge dat and df3
    dat <- merge(dat, df3, by = "row.names", all.x = FALSE, all.y = FALSE)  # merge
    rownames(dat) <- dat$Row.names  # reset rownames
    dat$Row.names <- NULL           # remove added rownames col

    # merge dat and df4
    dat <- merge(dat, df4, by = "row.names", all.x = FALSE, all.y = FALSE)  # merge
    rownames(dat) <- dat$Row.names  # reset rownames
    dat$Row.names <- NULL           # remove added rownames col

As you can see, this requires a lot of code. Can the same result be achieved more simply? I tried the following, at first without success. UPDATE: this works now!

    MyMerge <- function(x, y) {
      df <- merge(x, y, by = "row.names", all.x = FALSE, all.y = FALSE)
      rownames(df) <- df$Row.names
      df$Row.names <- NULL
      return(df)
    }
    dat <- Reduce(MyMerge, list(df1, df2, df3, df4))
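To see what the Reduce() approach does on something small enough to inspect by eye, here is a minimal sketch. The toy objects `a`, `b`, `m` and the helper name `merge_by_rownames` are made up for illustration; the helper follows the same idea as MyMerge above (inner join on row names, then restore them).

```r
# Toy inputs (hypothetical): two data frames and one matrix sharing row names.
ids <- c("P01", "P02", "P03")
a <- data.frame(N = c(10, 12, 14), row.names = ids)
b <- data.frame(origin = c("A", "B", "C"), row.names = ids,
                stringsAsFactors = FALSE)
m <- matrix(1:6, nrow = 3, dimnames = list(ids, c("m1", "m2")))

# Inner join on row names, then move the key column back into the row names.
merge_by_rownames <- function(x, y) {
  out <- merge(x, y, by = "row.names", all = FALSE)
  rownames(out) <- out$Row.names
  out$Row.names <- NULL
  out
}

# Reduce() applies the helper pairwise, left to right:
# merge(merge(a, b), m). The matrix is coerced to a data frame by merge().
res <- Reduce(merge_by_rownames, list(a, b, m))
print(res)
```

Because the helper restores the row names after every step, each intermediate result can be merged by "row.names" again, which is what makes the fold work.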

Thanks in advance for any suggestions.

378

asked May 21 '13 09:05

Hans Roelofsen

1 Answer

join_all from plyr will probably do what you want. All inputs must be data frames, though, and the row names have to be added as a column to join on:

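    library(plyr)

    # join_all() needs data frames, so convert the matrices first
    df3 <- data.frame(df3)
    df4 <- data.frame(df4)

    # copy the row names into a key column named rn
    df1$rn <- rownames(df1)
    df2$rn <- rownames(df2)
    df3$rn <- rownames(df3)
    df4$rn <- rownames(df4)

    # join all four data frames on the key column
    df <- join_all(list(df1, df2, df3, df4), by = 'rn', type = 'full')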

The type argument (here 'full') should help even if the row names vary and do not all match. If you do not want to keep the row-name column afterwards:

df$rn <- NULL
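For a feel of what a full join means when the row names only partly overlap, here is a small base R sketch that needs no plyr. It is an illustration under assumptions, not the answer's method: the toy data and the helper name `full_merge` are hypothetical, and `all = TRUE` in merge() plays the role that `type = 'full'` plays in join_all.

```r
# Toy inputs (hypothetical): only "P02" appears in both data frames.
x <- data.frame(N = c(1, 2), row.names = c("P01", "P02"))
y <- data.frame(P = c(0.5, 0.7), row.names = c("P02", "P03"))

# Full outer join on row names: unmatched rows are kept and padded with NA.
full_merge <- function(a, b) {
  out <- merge(a, b, by = "row.names", all = TRUE)
  rownames(out) <- out$Row.names
  out$Row.names <- NULL
  out
}

res_full <- Reduce(full_merge, list(x, y))
print(res_full)   # P01 has no P value and P03 has no N value, so both are NA

# For comparison, an inner join keeps only the shared row name.
res_inner <- merge(x, y, by = "row.names", all = FALSE)
print(res_inner)
```

Which variant fits depends on whether rows missing from one source should be dropped (inner) or kept with NA values (full).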

162

answered Sep 19 '22 12:09

Anto
