If I have two data frames, such as:
df1 = data.frame(x=1:3,y=1:3,row.names=c('r1','r2','r3'))
df2 = data.frame(z=5:7,row.names=c('r5','r6','r7'))
(
R> df1
x y
r1 1 1
r2 2 2
r3 3 3
R> df2
z
r5 5
r6 6
r7 7
), I'd like to merge them by row names, keeping everything (so an outer join, or all=T). This does it:
merged.df <- merge(df1,df2,all=T,by='row.names')
R> merged.df
Row.names x y z
1 r1 1 1 NA
2 r2 2 2 NA
3 r3 3 3 NA
4 r5 NA NA 5
5 r6 NA NA 6
6 r7 NA NA 7
but I want the input row names to be the row names in the output dataframe (merged.df).
I can do:
rownames(merged.df) <- merged.df[[1]]
merged.df <- merged.df[-1]
which works, but seems inelegant and hard to remember. Anyone know of a cleaner way?
The merge() function in base R can be used to merge input dataframes by common columns or row names. The merge() function retains all the row names of the dataframes, behaving similarly to the inner join. The dataframes are combined in order of the appearance in the input function call.
Cbind: Combine objects by columns matching the rows on row names in mbojan/mbtools: Chaotic Collection of Functions and Datasets Possibly Useful Also To Others.
It is possible to join the different columns is using concat() method. DataFrame: It is dataframe name. axis: 0 refers to the row axis and1 refers the column axis. join: Type of join.
merge : The rows in the two data frames that match on the specified columns are extracted, and joined together. If there is more than one match, all possible matches contribute one row each.
Not sure if it's any easier to remember, but you can do it all in one step using transform
.
transform(merge(df1,df2,by=0,all=TRUE), row.names=Row.names, Row.names=NULL)
# x y z
#r1 1 1 NA
#r2 2 2 NA
#r3 3 3 NA
#r5 NA NA 5
#r6 NA NA 6
#r7 NA NA 7
From the help of merge
:
If the matching involved row names, an extra character column called Row.names is added at the left, and in all cases the result has ‘automatic’ row names.
So it is clear that you can't avoid the Row.names
column at least using merge
. But maybe to remove this column you can subset by name and not by index. For example:
dd <- merge(df1,df2,by=0,all=TRUE) ## by=0 easier to write than row.names ,
## TRUE is cleaner than T
Then I use row.names
to subset like this :
res <- subset(dd,select=-c(Row.names))
rownames(res) <- dd[,'Row.names']
x y z
1 1 1 NA
2 2 2 NA
3 3 3 NA
4 NA NA 5
5 NA NA 6
6 NA NA 7
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With