I wrote a working for loop, but it's slow over thousands of rows, and I'm looking for a more efficient alternative. Thanks in advance!
The task:
If column a matches column b, column d becomes NA. If a does not match b but b matches c, column e becomes NA.
The for loop:
for (i in seq_len(nrow(data))) {
  if (data$a[i] == data$b[i]) {data$d[i] <- NA}
  if (data$a[i] != data$b[i] && data$b[i] == data$c[i]) {data$e[i] <- NA}
}
An example:
a b c d e
F G G 1 10
F G F 5 10
F F F 2 8
Would become:
a b c d e
F G G 1 NA
F G F 5 10
F F F NA 8
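To make the example reproducible, the sketch below builds the sample input as a plain data.frame and runs the question's loop on it. How `data` was originally created isn't stated, so the `data.frame()` construction with `stringsAsFactors = FALSE` is an assumption; it keeps a, b and c as character vectors so that `==` compares the letters directly.

```r
# Sample input from the question (construction is an assumption)
data <- data.frame(
  a = c("F", "F", "F"),
  b = c("G", "G", "F"),
  c = c("G", "F", "F"),
  d = c(1L, 5L, 2L),
  e = c(10L, 10L, 8L),
  stringsAsFactors = FALSE
)

# Row-by-row loop: blank out d when a == b,
# blank out e when a != b but b == c
for (i in seq_len(nrow(data))) {
  if (data$a[i] == data$b[i]) data$d[i] <- NA
  if (data$a[i] != data$b[i] && data$b[i] == data$c[i]) data$e[i] <- NA
}

print(data)
```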
If you're concerned about speed and efficiency, I'd recommend data.table, though simply vectorizing the operation on a normal data.frame, as @parfait suggested, would probably speed things up more than enough.
library(data.table)
DT <- fread("a b c d e
F G G 1 10
F G F 5 10
F F F 2 8")
print(DT)
# a b c d e
# 1: F G G 1 10
# 2: F G F 5 10
# 3: F F F 2 8
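# rows where a equals b: set d to NA (by reference)
DT[a == b, d := NA]
# rows where a differs from b but b equals c: set e to NA
DT[!a == b & b == c, e := NA]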
print(DT)
# a b c d e
# 1: F G G 1 NA
# 2: F G F 5 10
# 3: F F F NA 8
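The condition `!a == b & b == c` in the data.table call relies on R's operator precedence: `!` binds more loosely than `==` but more tightly than `&`, so it is read as `(!(a == b)) & (b == c)`, i.e. the same as `a != b & b == c`. A small base-R check, using the example's a, b and c columns rebuilt by hand (an assumption, not part of the answer):

```r
# Example columns, rebuilt as character vectors
df <- data.frame(
  a = c("F", "F", "F"),
  b = c("G", "G", "F"),
  c = c("G", "F", "F"),
  stringsAsFactors = FALSE
)

lhs <- with(df, !a == b & b == c)        # form used in the data.table call
rhs <- with(df, (a != b) & (b == c))     # explicit equivalent

print(lhs)  # TRUE FALSE FALSE
```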
Suppose df is your data. Then:
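ab <- with(df, a == b)   # TRUE where a matches b
bc <- with(df, b == c)   # TRUE where b matches c
df$d[ab] <- NA           # a matches b: blank out d
df$e[!ab & bc] <- NA     # a differs from b but b matches c: blank out e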
which would result in
# a b c d e
# 1 F G G 1 NA
# 2 F G F 5 10
# 3 F F F NA 8
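To check that the vectorized version gives exactly the same result as the loop, and to get a rough sense of the speed difference, here is a sketch on randomly generated data. The 10,000-row size, the F/G letters, and the helper names `loop_version` and `vector_version` are hypothetical; timings vary by machine, so none are shown.

```r
set.seed(1)
n <- 10000
df <- data.frame(
  a = sample(c("F", "G"), n, replace = TRUE),
  b = sample(c("F", "G"), n, replace = TRUE),
  c = sample(c("F", "G"), n, replace = TRUE),
  d = sample(1:10, n, replace = TRUE),
  e = sample(1:10, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the row-by-row loop from the question
loop_version <- function(data) {
  for (i in seq_len(nrow(data))) {
    if (data$a[i] == data$b[i]) data$d[i] <- NA
    if (data$a[i] != data$b[i] && data$b[i] == data$c[i]) data$e[i] <- NA
  }
  data
}

# Hypothetical helper: the logical-index approach from this answer
vector_version <- function(df) {
  ab <- with(df, a == b)
  bc <- with(df, b == c)
  df$d[ab] <- NA
  df$e[!ab & bc] <- NA
  df
}

print(system.time(res_loop <- loop_version(df)))
print(system.time(res_vec <- vector_version(df)))
```

Both conditions only read a, b and c, which are never modified, so the order of the two assignments doesn't matter. That is why the loop can be replaced by two independent vectorized assignments.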