I have two data frames::
as1 <- data.frame(ID = c(1,2,3,4,5,6),
pID = c(21,22,23,24,25,26),
Values = c(435,33,45,NA, NA,12))
as2 <- data.frame(ID = c(4,5),
pid = c(24,25),
Values = c(544, 676))
I need to replace the NA values in as1 with those in as2 by matching ID and pID
I need to get the result data frame as:
resultdf
ID pID Values
1 1 21 435
2 2 22 33
3 3 23 45
4 4 24 544
5 5 25 676
6 6 26 12
I tried doing subset and then na.omit() and then rbinding... but I'm losing the index.
Here are two base R solutions.
First, using match in "ID" to select the elements of "Value" in as1 to fill in:
as1$Values[match(as2$ID, as1$ID)] <- as2$Values
as1
ID pID Values
1 1 21 435
2 2 22 33
3 3 23 45
4 4 24 544
5 5 25 676
6 6 26 12
This only works if ID is the true ID for both data sets (that is, pid is "irrelevant"). Second, in the case that pid is also needed, you could use merge and then "collapse" the two values columns as follows:
df <- merge(as1, as2, by.x=c("ID", "pID"), by.y=c("ID", "pid"), all=TRUE)
This produces a four column data frame with two values columns. Collapse these into a single column with ifelse:
df <- cbind(df[c(1,2)], "Values"=with(df, ifelse(is.na(Values.y), Values.x, Values.y)))
df
ID pID Values
1 1 21 435
2 2 22 33
3 3 23 45
4 4 24 544
5 5 25 676
6 6 26 12
Or another option is data.table
library(data.table)
setDT(as1)[as2, Values := i.Values , on =.(ID, pid)]
as1
# ID pid Values
#1: 1 21 435
#2: 2 22 33
#3: 3 23 45
#4: 4 24 544
#5: 5 25 676
#6: 6 26 12
Try with dplyr library. Firstly join both tables and then create new column with values instead NA's:
library("dplyr")
as1 <- data.frame(ID = c(1,2,3,4,5,6), pid = c(21,22,23,24,25,26),Values = c(435,33,45,NA, NA,12))
as2 <- data.frame(ID = c(4,5),pid = c(24,25), Values = c(544, 676))
left_join(as1, as2, by = c("ID", "pid")) %>%
mutate(Values = ifelse(is.na(Values.x), Values.y, Values.x)) %>%
select(ID, pid, Values)
# ID pid Values
# 1 1 21 435
# 2 2 22 33
# 3 3 23 45
# 4 4 24 544
# 5 5 25 676
# 6 6 26 12
Following Marta's dplyr solution, we can use coalesce instead to combine the merged Value.x and Value.y:
library(dplyr)
res <- as1 %>% left_join(as2, by = c("ID", "pID"="pid")) %>%
mutate(Values=coalesce(Values.x,Values.y)) %>%
select(-Values.x,-Values.y)
## ID pID Values
##1 1 21 435
##2 2 22 33
##3 3 23 45
##4 4 24 544
##5 5 25 676
##6 6 26 12
Note also the syntax used for the by parameter in left_join. Here, we join by ID and by pID for as1 and pid for as2.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With