
Left join only selected columns in R with the merge() function

Tags:

merge

r

I am trying to left join two data frames, but I do not want to bring in all the variables from the second data set.

As an example, I have dataset 1 (DF1):

Client  Q  Sales  Date
A       2  30     01/01/2014
A       3  24     02/01/2014
A       1  10     03/01/2014
B       4  10     01/01/2014
B       1  20     02/01/2014
B       3  30     03/01/2014

And I would like to left join dataset 2 (DF2):

Client  LO  CON
A       12  CA
B       11  US
C       12  UK
D       10  CA
E       15  AUS
F       91  DD

I am able to left join with the following code:

merge(x = DF1, y = DF2, by = "Client", all.x = TRUE)

Client  Q  Sales  Date        LO  CON
A       2  30     01/01/2014  12  CA
A       3  24     02/01/2014  12  CA
A       1  10     03/01/2014  12  CA
B       4  10     01/01/2014  11  US
B       1  20     02/01/2014  11  US
B       3  30     03/01/2014  11  US

However, this brings in both the LO and CON columns. I would like to bring in only the LO column:

Client  Q  Sales  Date        LO
A       2  30     01/01/2014  12
A       3  24     02/01/2014  12
A       1  10     03/01/2014  12
B       4  10     01/01/2014  11
B       1  20     02/01/2014  11
B       3  30     03/01/2014  11

Samer Nachabé asked Jun 12 '14 18:06


People also ask

What is merge () in R?

The merge() function in R combines two data frames by matching rows on one or more common columns. The main requirement is that the columns used for matching hold the same type of values in both data frames. merge() is similar to a join in a relational database management system (RDBMS).

What is difference between join and merge in R?

The join functions from dplyr (left_join(), inner_join(), and so on) preserve the original row order of the first data frame, while merge() by default (sort = TRUE) sorts the result on the columns used to perform the join.
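
The sorting behaviour of merge() can be seen in a small sketch. The toy data frames `x` and `y` below are invented for illustration; only base R is used.

```r
# Toy data: the key column of x is deliberately out of alphabetical order.
x <- data.frame(Client = c("B", "A", "C"), Q = c(4, 2, 1),
                stringsAsFactors = FALSE)
y <- data.frame(Client = c("A", "B", "C"), LO = c(12, 11, 12),
                stringsAsFactors = FALSE)

# merge() sorts the result on the "by" column by default (sort = TRUE),
# so the rows no longer follow the order they had in x.
sorted <- merge(x, y, by = "Client", all.x = TRUE)
sorted$Client
```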

How do I grab certain columns in R?

To select a column in R you can use brackets, e.g., YourDataFrame['Column'] returns a one-column data frame containing the column named "Column". You can also use the select() function from dplyr to get columns by name or index. For instance, select(YourDataFrame, c('A', 'B')) takes the columns named "A" and "B" from the data frame.
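
As a rough base R illustration of bracket selection (the data frame `df` and the result names are made up for this sketch):

```r
# Small example data frame, invented for illustration.
df <- data.frame(Client = c("A", "B"), LO = c(12, 11), CON = c("CA", "US"),
                 stringsAsFactors = FALSE)

one_col   <- df["LO"]                 # single brackets: a one-column data frame
as_vector <- df[["LO"]]               # double brackets: the column as a vector
two_cols  <- df[, c("Client", "LO")]  # several columns by name
```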


2 Answers

You can do this by subsetting the data you pass into your merge:

merge(x = DF1, y = DF2[ , c("Client", "LO")], by = "Client", all.x=TRUE) 

Or you can simply delete the column after your current merge :)
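
A minimal, self-contained sketch of both options, with the question's sample data rebuilt by hand. The data.frame() construction and the names `merged_subset` and `merged_drop` are illustrative assumptions, not part of the original post.

```r
# Rebuild the question's sample data (values copied from the question).
DF1 <- data.frame(
  Client = c("A", "A", "A", "B", "B", "B"),
  Q      = c(2, 3, 1, 4, 1, 3),
  Sales  = c(30, 24, 10, 10, 20, 30),
  Date   = c("01/01/2014", "02/01/2014", "03/01/2014",
             "01/01/2014", "02/01/2014", "03/01/2014"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  Client = c("A", "B", "C", "D", "E", "F"),
  LO     = c(12, 11, 12, 10, 15, 91),
  CON    = c("CA", "US", "UK", "CA", "AUS", "DD"),
  stringsAsFactors = FALSE
)

# Option 1: pass only the key and the wanted column to merge().
merged_subset <- merge(x = DF1, y = DF2[, c("Client", "LO")],
                       by = "Client", all.x = TRUE)

# Option 2: merge everything, then drop the unwanted column.
merged_drop <- merge(x = DF1, y = DF2, by = "Client", all.x = TRUE)
merged_drop$CON <- NULL

names(merged_subset)
```

Option 1 avoids carrying the extra column through the merge at all, which may matter when DF2 is wide; option 2 is handy when the merge has already been done.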

stanekam answered Oct 03 '22 14:10


I think it's a little simpler to use the dplyr functions select() and left_join(); at least it's easier for me to understand. The join functions from dplyr are designed to mimic SQL joins.

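library(tidyverse)  # loads dplyr, which provides select() and left_join()

# Keep only the join key and the column to bring over.
# Column names are case-sensitive, so it must be Client, not client.
DF2 <- DF2 %>%
  select(Client, LO)

# Left join: every row of DF1 is kept.
joined_data <- left_join(DF1, DF2, by = "Client")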

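You don't actually need the "by" argument in this case: after the select(), Client is the only column the two data frames share, so left_join() joins on it automatically and prints a message naming the column it used.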
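
A possible variant, sketched here under the assumption that dplyr is installed, does the select() inline so that DF2 itself is not overwritten. The sample data is rebuilt by hand from the question; the inline form is an illustration, not part of the original answer.

```r
library(dplyr)

# Rebuild the question's sample data (values copied from the question).
DF1 <- data.frame(
  Client = c("A", "A", "A", "B", "B", "B"),
  Q      = c(2, 3, 1, 4, 1, 3),
  Sales  = c(30, 24, 10, 10, 20, 30),
  Date   = c("01/01/2014", "02/01/2014", "03/01/2014",
             "01/01/2014", "02/01/2014", "03/01/2014"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  Client = c("A", "B", "C", "D", "E", "F"),
  LO     = c(12, 11, 12, 10, 15, 91),
  CON    = c("CA", "US", "UK", "CA", "AUS", "DD"),
  stringsAsFactors = FALSE
)

# Select the key and the wanted column inline; DF2 keeps its CON column.
joined_data <- left_join(DF1, select(DF2, Client, LO), by = "Client")
```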

Ben G answered Oct 03 '22 13:10