Given this data.frame:
# x y
# 1 a b,c,d
# 2 c b,c,d
# 3 c b,c,d
# 4 a e,f,g
# 5 a b,c,d
# 6 c a,b,c
# 7 b b,c,d
# 8 c <NA>
# 9 c e,f,g
# 10 a <NA>
My desired output is:
# x y pos contains
# 1 a b,c,d NA FALSE
# 2 c b,c,d 2 TRUE
# 3 c b,c,d 2 TRUE
# 4 a e,f,g NA FALSE
# 5 a b,c,d NA FALSE
# 6 c a,b,c 3 TRUE
# 7 b b,c,d 1 TRUE
# 8 c <NA> NA NA
# 9 c e,f,g NA FALSE
# 10 a <NA> NA NA
That is, check (by row) if df$x is in df$y and give its position. I started down the strsplit(df$y, ",") path but things got complicated quickly and I know there's a simple solution.
set.seed(5)
seq_letters <- c("a,b,c", "b,c,d", "e,f,g", NA)
df <- data.frame(x = sample(letters[1:3], 10, TRUE),
y = sample(seq_letters, 10, TRUE),
stringsAsFactors = FALSE)
Here's a possibility using match() with mapply() to compute the pos column, after splitting the y column into pieces. Then we can build the contains column based on pos, setting it to NA wherever y is missing.
df$pos <- mapply(match, df$x, strsplit(df$y, ",", fixed = TRUE), USE.NAMES = FALSE)
df$contains <- replace(!is.na(df$pos), is.na(df$y), NA)
which gives
x y pos contains
1 a b,c,d NA FALSE
2 c b,c,d 2 TRUE
3 c b,c,d 2 TRUE
4 a e,f,g NA FALSE
5 a b,c,d NA FALSE
6 c a,b,c 3 TRUE
7 b b,c,d 1 TRUE
8 c <NA> NA NA
9 c e,f,g NA FALSE
10 a <NA> NA NA
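For anyone who wants to see what the mapply() call is doing, here is a rough step-by-step sketch of the same idea. It assumes the data frame is typed in literally from the table in the question (so it does not depend on what sample() returns in a given R version), and the intermediate names pieces, pos and contains are only illustrative. It swaps mapply() for an explicit vapply() loop over the rows; that is a stylistic choice, not a correction.

```r
# Literal copy of the example data shown in the question.
df <- data.frame(
  x = c("a", "c", "c", "a", "a", "c", "b", "c", "c", "a"),
  y = c("b,c,d", "b,c,d", "b,c,d", "e,f,g", "b,c,d",
        "a,b,c", "b,c,d", NA, "e,f,g", NA),
  stringsAsFactors = FALSE
)

# Step 1: split each y into its pieces; a missing y stays a single NA.
pieces <- strsplit(df$y, ",", fixed = TRUE)

# Step 2: for each row, look up x among that row's pieces.
# match() gives the 1-based position of the first hit, or NA if there is none.
pos <- vapply(
  seq_along(pieces),
  function(i) match(df$x[i], pieces[[i]]),
  integer(1)
)

# Step 3: a non-NA position means "found"; rows with a missing y become NA.
contains <- !is.na(pos)
contains[is.na(df$y)] <- NA

df$pos <- pos
df$contains <- contains
df
```

The vapply() version is a little more verbose, but it declares up front that every row yields exactly one integer, which can make a malformed row easier to spot.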