
Remove same columns from left_join

Tags: r, dplyr

I'd like to merge two data frames by id, but they have two columns in common (element and day), so the merge produces new .x and .y columns. How can I merge these two data frames with left_join() so that the duplicated columns (element.x, day.x, element.y, and day.y) are removed and a single element column and a single day column are kept?

Code:

library(dplyr)

# Sample data
df1 <- data.frame(id = seq(1,5), value1 = rnorm(5), element = "TEST1", day = 15) 
df2 <- data.frame(id = seq(1,5), value2 = rnorm(5), element = "TEST1", day = 15) 

# Merge
df <- left_join(df1, df2, by = "id")

# Output
  id      value1 element.x day.x     value2 element.y day.y
1  1 -0.69700149     TEST1    15  1.4324220     TEST1    15
2  2 -0.25514949     TEST1    15  0.7281354     TEST1    15
3  3  0.09206902     TEST1    15  0.8148839     TEST1    15
4  4  2.51799237     TEST1    15  1.3919671     TEST1    15
5  5 -0.77049050     TEST1    15 -0.2707201     TEST1    15
Vedda asked Nov 24 '15 07:11



1 Answer

Just keep only the columns you need from df2 before joining, in this case id and value2, so that its element and day columns never enter the join:

left_join(df1, select(df2, c(id,value2)), by = "id")

#  id     value1 element day     value2
#1  1  1.2276303   TEST1  15 -0.1389861
#2  2 -0.8017795   TEST1  15 -0.5973131
#3  3 -1.0803926   TEST1  15 -2.1839668
#4  4 -0.1575344   TEST1  15  0.2408173
#5  5 -1.0717600   TEST1  15 -0.2593554
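
If the overlapping columns are not known ahead of time, they can be computed rather than typed out. The following is a minimal sketch, assuming dplyr 1.0 or later (for all_of()) and assuming that df1's copies of the shared columns are the ones to keep; the names key, shared, and res are illustrative and not from the question:

```r
library(dplyr)

df1 <- data.frame(id = 1:5, value1 = rnorm(5), element = "TEST1", day = 15)
df2 <- data.frame(id = 1:5, value2 = rnorm(5), element = "TEST1", day = 15)

key <- "id"

# Columns that exist in both data frames, excluding the join key
shared <- setdiff(intersect(names(df1), names(df2)), key)

# Drop the shared columns from df2, then join on the key only
res <- left_join(df1, select(df2, -all_of(shared)), by = key)

names(res)
```

This gives the same column layout as the explicit select(df2, c(id, value2)) call, but it keeps working if more shared columns appear later.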

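Beware that the possible approaches are not all equivalent, so decide what result you actually need. For example, when the shared columns hold different values in the two data frames, a left_join() without by joins on every common column and finds no matches, whereas dropping df2's copies and joining on id alone keeps df1's values: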

df1 <- data.frame(id=1:3,day=2:4,element=3:5,value1=100:102)
df2 <- data.frame(id=1:3,day=3:5,element=4:6,value2=200:202)
df1

#  id day element value1
#1  1   2       3    100
#2  2   3       4    101
#3  3   4       5    102

df2
#  id day element value2
#1  1   3       4    200
#2  2   4       5    201
#3  3   5       6    202

left_join(df1, df2)
#Joining by: c("id", "day", "element")
#  id day element value1 value2
#1  1   2       3    100     NA
#2  2   3       4    101     NA
#3  3   4       5    102     NA

left_join(df1, select(df2, c(id,value2)), by = "id")
#  id day element value1 value2
#1  1   2       3    100    200
#2  2   3       4    101    201
#3  3   4       5    102    202
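
Before discarding df2's copies, it may be worth checking whether they really agree with df1's. A rough sketch of one way to do that, reusing the data from the second example above; the names joined and agree are hypothetical, and the suffixes are passed explicitly only to make the column names obvious:

```r
library(dplyr)

df1 <- data.frame(id = 1:3, day = 2:4, element = 3:5, value1 = 100:102)
df2 <- data.frame(id = 1:3, day = 3:5, element = 4:6, value2 = 200:202)

# Join on id only, keeping both copies of the shared columns
joined <- left_join(df1, df2, by = "id", suffix = c(".x", ".y"))

# TRUE for rows where both copies of day and element match
agree <- joined$day.x == joined$day.y & joined$element.x == joined$element.y

all(agree)
# [1] FALSE
```

Here the copies disagree in every row, so dropping df2's day and element silently discards information; whether that is acceptable depends on which data frame is the source of truth.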
thelatemail answered Sep 29 '22 21:09