Remove same columns from left_join

Tags:

r

dplyr

I'd like to merge two data frames by id, but they also share two other columns with the same names, so after merging I get new .x and .y columns. How can I merge these two data frames with left_join() and drop the duplicated columns (element.x, day.x, element.y, and day.y), keeping a single copy of each?

Code:

library(dplyr)

# Sample data
df1 <- data.frame(id = seq(1,5), value1 = rnorm(5), element = "TEST1", day = 15) 
df2 <- data.frame(id = seq(1,5), value2 = rnorm(5), element = "TEST1", day = 15) 

# Merge
df <- left_join(df1, df2, by = "id")

# Output
  id      value1 element.x day.x     value2 element.y day.y
1  1 -0.69700149     TEST1    15  1.4324220     TEST1    15
2  2 -0.25514949     TEST1    15  0.7281354     TEST1    15
3  3  0.09206902     TEST1    15  0.8148839     TEST1    15
4  4  2.51799237     TEST1    15  1.3919671     TEST1    15
5  5 -0.77049050     TEST1    15 -0.2707201     TEST1    15

asked Nov 24 '15 07:11

Vedda

1 Answer

Just drop everything you don't want from df2 before joining. In this case, keep only the id and value2 columns, so the duplicated element and day columns never reach the join:

left_join(df1, select(df2, c(id,value2)), by = "id")

#  id     value1 element day     value2
#1  1  1.2276303   TEST1  15 -0.1389861
#2  2 -0.8017795   TEST1  15 -0.5973131
#3  3 -1.0803926   TEST1  15 -2.1839668
#4  4 -0.1575344   TEST1  15  0.2408173
#5  5 -1.0717600   TEST1  15 -0.2593554
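
If you'd rather not type out the columns to keep, the duplicated ones can be worked out from the column names instead. This is only a sketch of that idea, not part of the original answer: `dup_cols` and `result` are illustrative names, `set.seed()` is added only so the sample data is reproducible, and it assumes the key is the single column id and that a dplyr version providing `all_of()` (1.0.0 or later) is installed.

```r
library(dplyr)

set.seed(1)  # assumption: only here to make the sample data reproducible
df1 <- data.frame(id = 1:5, value1 = rnorm(5), element = "TEST1", day = 15)
df2 <- data.frame(id = 1:5, value2 = rnorm(5), element = "TEST1", day = 15)

# Columns present in both data frames, excluding the join key
dup_cols <- setdiff(intersect(names(df1), names(df2)), "id")

# Drop those columns from df2, then join on the key only
result <- left_join(df1, select(df2, -all_of(dup_cols)), by = "id")

names(result)
```

This keeps df1's copy of element and day, so it is only appropriate when you are happy to discard whatever df2 holds in those columns.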

Beware that not all of these answers are equivalent, so ask yourself what you actually need as a result. For example, when the shared columns hold different values in the two data frames:

df1 <- data.frame(id=1:3,day=2:4,element=3:5,value1=100:102)
df2 <- data.frame(id=1:3,day=3:5,element=4:6,value2=200:202)
df1

#  id day element value1
#1  1   2       3    100
#2  2   3       4    101
#3  3   4       5    102

df2
#  id day element value2
#1  1   3       4    200
#2  2   4       5    201
#3  3   5       6    202

left_join(df1, df2)
#Joining by: c("id", "day", "element")
#  id day element value1 value2
#1  1   2       3    100     NA
#2  2   3       4    101     NA
#3  3   4       5    102     NA

left_join(df1, select(df2, c(id,value2)), by = "id")
#  id day element value1 value2
#1  1   2       3    100    200
#2  2   3       4    101    201
#3  3   4       5    102    202
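
The difference comes from which columns are used as the key. With no `by`, left_join() matches on every shared column, and since day and element disagree between df1 and df2 here, no row finds a partner and value2 is all NA. A quick way to check which situation you are in before choosing an approach might look like the sketch below; the names `shared`, `n_matched`, `joined_all` and `joined_id` are illustrative only.

```r
library(dplyr)

df1 <- data.frame(id = 1:3, day = 2:4, element = 3:5, value1 = 100:102)
df2 <- data.frame(id = 1:3, day = 3:5, element = 4:6, value2 = 200:202)

# Every column name the two data frames have in common
shared <- intersect(names(df1), names(df2))

# Number of df1 rows with a df2 row matching on ALL shared columns
n_matched <- nrow(semi_join(df1, df2, by = shared))

# Join on all shared columns: unmatched rows get NA in value2
joined_all <- left_join(df1, df2, by = shared)

# Join on id only, taking nothing but value2 from df2
joined_id <- left_join(df1, select(df2, id, value2), by = "id")

n_matched
joined_all$value2
joined_id$value2
```

If `n_matched` equals `nrow(df1)`, the duplicated columns really are identical and both joins give the same table. If it is smaller, as it is for this data, you have to decide whether a mismatch in day or element should count as "no match" or be ignored.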

answered Sep 29 '22 21:09

thelatemail
