Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

join data frames and replace one column with another

Tags:

r

dplyr

I have two data frames, one with all my data, and another with a corrected ID number for some of the data.

When I attempt to join these values with either a left, inner or full join, I end up with two ID columns (ID.x and ID.y). Is there anyway to do this so that the ID in my corrected dataset simply replaces the ID in the full dataset?

E.g.,

df.full <- left_join(df.full, correctID, 
                     by = c("value"))

I've tried both copy = TRUE and FALSE, but this doesn't seem to help.

like image 508
tnt Avatar asked Feb 05 '19 15:02

tnt


1 Answers

There are a few cases:

If you always want the value from correctID, just drop the ID column from df.full first:

df.full %>%
  select(-ID) %>%
  left_join(correctID, by = "value")

If correctID isn't complete, and you only want to use it when present:

df.full %>%
  left_join(correctID, by = "value") %>%
  mutate(ID = coalesce(ID.y, ID.x)) %>%
  select(-ID.y, -ID.x)

You can, of course, reverse that in the opposite case (only want to use correctID when df.full$ID is missing).

like image 76
Gregor Thomas Avatar answered Oct 01 '22 09:10

Gregor Thomas