How does one merge dataframes by row name without adding a "Row.names" column?

Tags: merge, dataframe, r

I have two data frames:

df1 = data.frame(x=1:3,y=1:3,row.names=c('r1','r2','r3'))
df2 = data.frame(z=5:7,row.names=c('r5','r6','r7'))

R> df1
   x y
r1 1 1
r2 2 2
r3 3 3

R> df2
   z
r5 5
r6 6
r7 7

I'd like to merge them by row names, keeping every row (an outer join, i.e. all=T). This does it:

merged.df <- merge(df1,df2,all=T,by='row.names')
R> merged.df
  Row.names  x  y  z
1        r1  1  1 NA
2        r2  2  2 NA
3        r3  3  3 NA
4        r5 NA NA  5
5        r6 NA NA  6
6        r7 NA NA  7

but I want the input row names to be the row names in the output dataframe (merged.df).

I can do:

rownames(merged.df) <- merged.df[[1]]
merged.df <- merged.df[-1]

which works, but seems inelegant and hard to remember. Anyone know of a cleaner way?

user116293 asked Jun 29 '13 01:06



2 Answers

Not sure if it's any easier to remember, but you can do it all in one step using transform.

transform(merge(df1,df2,by=0,all=TRUE), row.names=Row.names, Row.names=NULL)
#    x  y  z
#r1  1  1 NA
#r2  2  2 NA
#r3  3  3 NA
#r5 NA NA  5
#r6 NA NA  6
#r7 NA NA  7
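
If the one-liner is still hard to remember, the same steps can be wrapped in a small function. The following is only a sketch: the name merge_by_rownames is hypothetical (it is not part of base R or of the answer above), and it assumes both inputs are data frames with meaningful row names.

```r
# Hypothetical helper (not a base R function): join two data frames on their
# row names and put those names back on the result.
merge_by_rownames <- function(a, b, all = TRUE) {
  m <- merge(a, b, by = "row.names", all = all)
  rownames(m) <- m$Row.names  # merge() stores the join keys in a Row.names column
  m$Row.names <- NULL         # drop that column once the names are restored
  m
}

df1 <- data.frame(x = 1:3, y = 1:3, row.names = c("r1", "r2", "r3"))
df2 <- data.frame(z = 5:7, row.names = c("r5", "r6", "r7"))

merged.df <- merge_by_rownames(df1, df2)
merged.df
```

Passing all = FALSE to the helper would give an inner join instead, which for these two inputs has no rows because no row name appears in both.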
thelatemail answered Oct 25 '22 00:10


From the help of merge:

If the matching involved row names, an extra character column called Row.names is added at the left, and in all cases the result has ‘automatic’ row names.

So you can't avoid the Row.names column when merge matches on row names. What you can do is remove that column by name rather than by index. For example:

dd <- merge(df1,df2,by=0,all=TRUE) ## by=0 is shorter to write than by='row.names',
                                   ## and TRUE is cleaner than T

Then I drop the Row.names column by name and restore the row names from it:

res <- subset(dd,select=-c(Row.names))
rownames(res) <- dd[,'Row.names']
R> res
    x  y  z
r1  1  1 NA
r2  2  2 NA
r3  3  3 NA
r5 NA NA  5
r6 NA NA  6
r7 NA NA  7
agstudy answered Oct 24 '22 23:10