I am trying to left join two data frames, but I do not want to join all the variables from the second data set.
As an example, I have dataset 1 (DF1):
Client  Q  Sales  Date
A       2  30     01/01/2014
A       3  24     02/01/2014
A       1  10     03/01/2014
B       4  10     01/01/2014
B       1  20     02/01/2014
B       3  30     03/01/2014
And I would like to left join dataset 2 (DF2):
Client  LO  CON
A       12  CA
B       11  US
C       12  UK
D       10  CA
E       15  AUS
F       91  DD
I am able to left join with the following code:
merge(x = DF1, y = DF2, by = "Client", all.x = TRUE)
which gives:
Client  Q  Sales  Date        LO  CON
A       2  30     01/01/2014  12  CA
A       3  24     02/01/2014  12  CA
A       1  10     03/01/2014  12  CA
B       4  10     01/01/2014  11  US
B       1  20     02/01/2014  11  US
B       3  30     03/01/2014  11  US
However, it merges both columns LO and CON. I would only like to merge the column LO, like this:
Client  Q  Sales  Date        LO
A       2  30     01/01/2014  12
A       3  24     02/01/2014  12
A       1  10     03/01/2014  12
B       4  10     01/01/2014  11
B       1  20     02/01/2014  11
B       3  30     03/01/2014  11
The merge() function in R combines two data frames on one or more key columns. The main requirement is that the key column holds comparable values of a compatible type in both data frames. merge() is similar to a join in a relational database management system (RDBMS).
The join functions from dplyr, such as left_join(), preserve the original row order of the first data frame, while merge() by default (sort = TRUE) sorts the result on the column you used to perform the join.
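To make the row-order difference concrete, here is a small sketch. The toy data frames are made up for illustration (they are not the asker's data), and it assumes the dplyr package is installed.

```r
library(dplyr)  # assumption: dplyr is installed

# Hypothetical toy data, deliberately not sorted by Client
DF1 <- data.frame(
  Client = c("B", "A", "B", "A"),
  Sales  = c(10, 30, 20, 24)
)
DF2 <- data.frame(Client = c("A", "B"), LO = c(12, 11))

# Base R: the result is sorted on the key column by default (sort = TRUE)
merged <- merge(x = DF1, y = DF2, by = "Client", all.x = TRUE)
as.character(merged$Client)  # "A" "A" "B" "B"

# dplyr: the rows keep the order they had in DF1
joined <- left_join(DF1, DF2, by = "Client")
as.character(joined$Client)  # "B" "A" "B" "A"
```

If the original order matters and you want to stay in base R, one option is to add a row-index column to DF1 before merging and reorder on it afterwards.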
To select a column in R you can use brackets, e.g., YourDataFrame['Column'] returns a data frame containing only the column named “Column”. We can also use dplyr and the select() function to get columns by name or index. For instance, select(YourDataFrame, c('A', 'B')) takes the columns named “A” and “B” from the data frame.
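A quick base R sketch of the selection options, using a made-up data frame named df (hypothetical, only for illustration):

```r
# Hypothetical example data
df <- data.frame(
  A = 1:3,
  B = c("x", "y", "z"),
  C = c(TRUE, FALSE, TRUE)
)

one_col   <- df["A"]             # single brackets: still a data frame
as_vector <- df[["A"]]           # double brackets (or df$A): a plain vector
two_cols  <- df[, c("A", "B")]   # several columns by name

is.data.frame(one_col)  # TRUE
names(two_cols)         # "A" "B"
```

With dplyr loaded, select(df, A, B) should give the same two columns as the last line.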
You can do this by subsetting the data you pass into your merge:
merge(x = DF1, y = DF2[ , c("Client", "LO")], by = "Client", all.x=TRUE)
Or you can simply delete the column after your current merge :)
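For the second option, a minimal sketch of dropping the unwanted column after the merge. The data below is a shortened, hand-typed version of the tables in the question, so treat it as illustrative.

```r
# Shortened versions of DF1 and DF2 from the question
DF1 <- data.frame(
  Client = c("A", "A", "B"),
  Q      = c(2, 3, 4),
  Sales  = c(30, 24, 10)
)
DF2 <- data.frame(
  Client = c("A", "B", "C"),
  LO     = c(12, 11, 12),
  CON    = c("CA", "US", "UK")
)

# Merge everything first, then remove the column you do not want
merged <- merge(x = DF1, y = DF2, by = "Client", all.x = TRUE)
merged$CON <- NULL

names(merged)  # "Client" "Q" "Sales" "LO"
```

Subsetting DF2 before the merge avoids carrying the extra column through the join at all, which may matter for wide data frames.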
I think it's a little simpler to use the dplyr functions select and left_join; at least it's easier for me to understand. The join functions from dplyr are made to mimic SQL joins.
library(tidyverse)
# keep only the key column and the column to bring over
DF2 <- DF2 %>% select(Client, LO)
# left join: every row of DF1 is kept
joined_data <- left_join(DF1, DF2, by = "Client")
You don't actually need to use the "by" argument in this case because the columns have the same name.
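As a rough sketch of what leaving out "by" looks like (assuming dplyr is installed, and using shortened, hand-typed versions of the question's tables):

```r
library(dplyr)  # assumption: dplyr is installed

# Shortened versions of DF1 and DF2 from the question
DF1 <- data.frame(
  Client = c("A", "A", "B"),
  Q      = c(2, 3, 4),
  Sales  = c(30, 24, 10)
)
DF2 <- data.frame(
  Client = c("A", "B", "C"),
  LO     = c(12, 11, 12),
  CON    = c("CA", "US", "UK")
)

# No "by": dplyr joins on every column name the two inputs share
# (here only Client) and prints a message naming the column it used.
# Selecting inside the call leaves DF2 itself unchanged.
joined_data <- left_join(DF1, select(DF2, Client, LO))

names(joined_data)  # "Client" "Q" "Sales" "LO"
```

One caveat: if the two data frames happen to share other column names, those are used as join keys too, so spelling out by = "Client" is the safer habit.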