
Is there an R dplyr method for merge with all=TRUE?

Tags: dataframe, r, dplyr

I have two R data frames I want to merge. In base R you can do:

    cost <- data.frame(farm=c('farm A', 'office'), cost=c(10, 100))
    trees <- data.frame(farm=c('farm A', 'farm B'), trees=c(20,30))
    merge(cost, trees, all=TRUE)

which produces:

        farm cost trees
    1 farm A   10    20
    2 office  100    NA
    3 farm B   NA    30

I am using dplyr, and would prefer a solution such as:

    left_join(cost, trees)

which produces something close to what I want:

        farm cost trees
    1 farm A   10    20
    2 office  100    NA

In dplyr I can see left_join, inner_join, semi_join and anti_join, but none of these does what merge with all=TRUE does.

Also, is there a quick way to set the NAs to 0? My efforts so far using x$trees[is.na(x$trees)] <- 0 are laborious (I need a command per column) and don't always seem to work.

thanks

Racing Tadpole asked Feb 17 '14 23:02





1 Answer

As of dplyr 0.4.0 there is a full_join function, which I believe is what you want.

    cost <- data.frame(farm=c('farm A', 'office'), cost=c(10, 100))
    trees <- data.frame(farm=c('farm A', 'farm B'), trees=c(20,30))
    merge(cost, trees, all=TRUE)

Returns

    > merge(cost, trees, all=TRUE)
        farm cost trees
    1 farm A   10    20
    2 office  100    NA
    3 farm B   NA    30

And

    library(dplyr)
    full_join(cost, trees)

Returns

    > full_join(cost, trees)
    Joining by: "farm"
        farm cost trees
    1 farm A   10    20
    2 office  100    NA
    3 farm B   NA    30
    Warning message:
    joining factors with different levels, coercing to character vector
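
The question also asks about setting the NAs to 0 without one command per column. Here is a minimal sketch of one way to do that after the join, assuming only the numeric columns should be filled. The `by = "farm"` and `stringsAsFactors = FALSE` arguments and the `num_cols` name are additions for illustration, not part of the original code. They make the join key explicit and keep `farm` as character, which should avoid the factor-coercion warning.

```r
library(dplyr)

# Same example data as above; stringsAsFactors = FALSE is an added assumption
cost  <- data.frame(farm = c("farm A", "office"), cost = c(10, 100),
                    stringsAsFactors = FALSE)
trees <- data.frame(farm = c("farm A", "farm B"), trees = c(20, 30),
                    stringsAsFactors = FALSE)

# Full outer join: keeps every farm that appears in either data frame
x <- full_join(cost, trees, by = "farm")

# Find the numeric columns, then replace NA with 0 in each of them
num_cols <- sapply(x, is.numeric)
x[num_cols] <- lapply(x[num_cols], function(col) replace(col, is.na(col), 0))

print(x)
```

The replacement step uses only base R, so it does not depend on the dplyr version. Restricting it to numeric columns leaves a character column such as `farm` untouched.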
Avraham answered Oct 11 '22 17:10
