I have two data frames::
as1 <- data.frame(ID = c(1,2,3,4,5,6),
pID = c(21,22,23,24,25,26),
Values = c(435,33,45,NA, NA,12))
as2 <- data.frame(ID = c(4,5),
pid = c(24,25),
Values = c(544, 676))
I need to replace the NA values in as1 with those in as2 by matching ID and pID
I need to get the result data frame as:
resultdf
ID pID Values
1 1 21 435
2 2 22 33
3 3 23 45
4 4 24 544
5 5 25 676
6 6 26 12
I tried doing subset and then na.omit()
and then rbind
ing... but I'm losing the index.
Here are two base R solutions.
First, using match
in "ID" to select the elements of "Value" in as1 to fill in:
as1$Values[match(as2$ID, as1$ID)] <- as2$Values
as1
ID pID Values
1 1 21 435
2 2 22 33
3 3 23 45
4 4 24 544
5 5 25 676
6 6 26 12
This only works if ID is the true ID for both data sets (that is, pid is "irrelevant"). Second, in the case that pid is also needed, you could use merge
and then "collapse" the two values columns as follows:
df <- merge(as1, as2, by.x=c("ID", "pID"), by.y=c("ID", "pid"), all=TRUE)
This produces a four column data frame with two values columns. Collapse these into a single column with ifelse
:
df <- cbind(df[c(1,2)], "Values"=with(df, ifelse(is.na(Values.y), Values.x, Values.y)))
df
ID pID Values
1 1 21 435
2 2 22 33
3 3 23 45
4 4 24 544
5 5 25 676
6 6 26 12
Or another option is data.table
library(data.table)
setDT(as1)[as2, Values := i.Values , on =.(ID, pid)]
as1
# ID pid Values
#1: 1 21 435
#2: 2 22 33
#3: 3 23 45
#4: 4 24 544
#5: 5 25 676
#6: 6 26 12
Try with dplyr
library. Firstly join both tables and then create new column with values instead NA
's:
library("dplyr")
as1 <- data.frame(ID = c(1,2,3,4,5,6), pid = c(21,22,23,24,25,26),Values = c(435,33,45,NA, NA,12))
as2 <- data.frame(ID = c(4,5),pid = c(24,25), Values = c(544, 676))
left_join(as1, as2, by = c("ID", "pid")) %>%
mutate(Values = ifelse(is.na(Values.x), Values.y, Values.x)) %>%
select(ID, pid, Values)
# ID pid Values
# 1 1 21 435
# 2 2 22 33
# 3 3 23 45
# 4 4 24 544
# 5 5 25 676
# 6 6 26 12
Following Marta's dplyr
solution, we can use coalesce
instead to combine the merged Value.x
and Value.y
:
library(dplyr)
res <- as1 %>% left_join(as2, by = c("ID", "pID"="pid")) %>%
mutate(Values=coalesce(Values.x,Values.y)) %>%
select(-Values.x,-Values.y)
## ID pID Values
##1 1 21 435
##2 2 22 33
##3 3 23 45
##4 4 24 544
##5 5 25 676
##6 6 26 12
Note also the syntax used for the by
parameter in left_join
. Here, we join by ID
and by pID
for as1
and pid
for as2
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With