Conditionally replace a value in a data frame with a value from a second data frame

Tags: dataframe, r

Say I have a data frame, d1, that looks like this:

  site code trait
1    1    A   1.0
2    2    B   1.3
3    3    A    NA
4    4    B   2.9
5    5    A    NA

Here is the dput to generate d1:

structure(list(site = 1:5, code = structure(c(1L, 2L, 1L, 2L, 
1L), .Label = c("A", "B"), class = "factor"), trait = c(1, 1.3, 
NA, 2.9, NA)), .Names = c("site", "code", "trait"), row.names = c(NA, 
-5L), class = "data.frame")

I have a second data frame, d2, that looks like this:

  code trait
1    A   1.5
2    B   2.5

Here is the dput to generate d2:

structure(list(code = structure(1:2, .Label = c("A", "B"), class = "factor"), 
    trait = c(1.5, 2.5)), .Names = c("code", "trait"), row.names = c(NA, 
-2L), class = "data.frame")

I would like a piece of code that replaces the NA values of trait with the trait value from d2 that matches the code character for a particular row in d1. The final output of d1 would look like this:

  site code trait
1    1    A   1.0
2    2    B   1.3
3    3    A   1.5
4    4    B   2.9
5    5    A   1.5

Things I've tried:

d1$trait<- ifelse(is.na(d1$trait),d2$trait[d2$code == d1$code],d1$trait)

When using this code I'm getting a warning:

Warning messages:
1: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
2: In ==.default(d2$code, d1$code) :
  longer object length is not a multiple of shorter object length

colin asked Apr 28 '26 09:04


2 Answers

Your ifelse syntax is close, but the problematic bit is:

d2$trait[d2$code == d1$code]

Here, you are trying to look up the d2$trait value corresponding to each code value in d1, but d2$code == d1$code only compares the two vectors element by element, recycling the shorter d2$code along the longer d1$code (which is what triggers the length warning). The lookup can instead be done with match:

d1$trait<- ifelse(is.na(d1$trait),d2$trait[match(d1$code, d2$code)], d1$trait)
d1
#   site code trait
# 1    1    A   1.0
# 2    2    B   1.3
# 3    3    A   1.5
# 4    4    B   2.9
# 5    5    A   1.5
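
To make the difference concrete, here is a minimal sketch that puts the element-wise comparison and the match lookup side by side. It assumes the data from the question, rebuilt with character codes instead of factors for brevity; the names elementwise, idx and lookup are only illustrative.

```r
# Rebuild the question's data (assumption: character codes rather than factors)
d1 <- data.frame(site = 1:5,
                 code = c("A", "B", "A", "B", "A"),
                 trait = c(1, 1.3, NA, 2.9, NA),
                 stringsAsFactors = FALSE)
d2 <- data.frame(code = c("A", "B"),
                 trait = c(1.5, 2.5),
                 stringsAsFactors = FALSE)

# `==` compares position by position, recycling the 2-element d2$code
# along the 5-element d1$code; R warns because 5 is not a multiple of 2.
elementwise <- suppressWarnings(d2$code == d1$code)

# match() returns, for every element of d1$code, the position of that
# code in d2$code, so the result always has one entry per row of d1.
idx <- match(d1$code, d2$code)

# Indexing d2$trait with those positions gives a lookup value per d1 row.
lookup <- d2$trait[idx]
```

The key point is that idx has the same length as d1$code, so d2$trait[idx] lines up row for row with d1$trait, which is what ifelse needs.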

An alternative would be to just replace the missing values, again using match to grab the relevant elements from d2$trait:

d1$trait[is.na(d1$trait)] <- d2$trait[match(d1$code[is.na(d1$trait)], d2$code)]
d1
#   site code trait
# 1    1    A   1.0
# 2    2    B   1.3
# 3    3    A   1.5
# 4    4    B   2.9
# 5    5    A   1.5

While match and merge are internally doing very similar things, I find the match syntax to be a bit easier to use because you don't need to create an intermediate object via merge and then grab the relevant information from that intermediate object.
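
If this kind of fill is needed in more than one place, the match-based replacement can be wrapped in a small function. The sketch below is one possible way to do that; fill_na_from_lookup and its argument names are hypothetical and not part of base R, and the data is the question's, rebuilt with character codes.

```r
# Hypothetical helper: fill NA entries of `x` with values looked up by key.
#   x            vector containing NAs to fill
#   key          key for each element of x (same length as x)
#   lookup_key   keys of the lookup table
#   lookup_value values of the lookup table (same length as lookup_key)
fill_na_from_lookup <- function(x, key, lookup_key, lookup_value) {
  missing <- is.na(x)
  # Only the missing positions are looked up and overwritten.
  x[missing] <- lookup_value[match(key[missing], lookup_key)]
  x
}

# Data from the question (assumption: character codes rather than factors)
d1 <- data.frame(site = 1:5,
                 code = c("A", "B", "A", "B", "A"),
                 trait = c(1, 1.3, NA, 2.9, NA),
                 stringsAsFactors = FALSE)
d2 <- data.frame(code = c("A", "B"),
                 trait = c(1.5, 2.5),
                 stringsAsFactors = FALSE)

d1$trait <- fill_na_from_lookup(d1$trait, d1$code, d2$code, d2$trait)

# A code that is absent from the lookup table simply stays NA.
unmatched <- fill_na_from_lookup(c(NA, 2), c("C", "A"), d2$code, d2$trait)
```

Because match returns NA for keys it cannot find, rows whose code has no entry in the lookup table are left as NA rather than raising an error.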

josliber answered May 01 '26 02:05


It is a simple task for merge:

df12 <- merge(d1, d2, by="code", all.x=TRUE)
df12$trait <- ifelse(is.na(df12$trait.x), df12$trait.y, df12$trait.x)
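
One thing to watch with the merge approach: merge sorts the result by the key column by default and keeps both trait columns under the suffixes .x and .y, so the merged frame does not have the same row order or layout as d1. A minimal sketch of getting back to the shape shown in the question, assuming the question's data with character codes and using site to restore the order (the names d12 and result are illustrative):

```r
# Data from the question (assumption: character codes rather than factors)
d1 <- data.frame(site = 1:5,
                 code = c("A", "B", "A", "B", "A"),
                 trait = c(1, 1.3, NA, 2.9, NA),
                 stringsAsFactors = FALSE)
d2 <- data.frame(code = c("A", "B"),
                 trait = c(1.5, 2.5),
                 stringsAsFactors = FALSE)

# Left join on code; the two trait columns become trait.x (d1) and trait.y (d2).
d12 <- merge(d1, d2, by = "code", all.x = TRUE)

# Prefer the original value, fall back to the lookup value when it is NA.
d12$trait <- ifelse(is.na(d12$trait.x), d12$trait.y, d12$trait.x)

# Restore d1's row order and keep only the original columns.
result <- d12[order(d12$site), c("site", "code", "trait")]
rownames(result) <- NULL
```

Ordering by site works here only because site happens to record the original row order; with other data, an explicit row-index column added before the merge would serve the same purpose.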

jogo answered May 01 '26 00:05


