
Check row-wise if element exists in comma-separated column with position

Tags:

r

Given this data.frame:

#    x     y
# 1  a b,c,d
# 2  c b,c,d
# 3  c b,c,d
# 4  a e,f,g
# 5  a b,c,d
# 6  c a,b,c
# 7  b b,c,d
# 8  c  <NA>
# 9  c e,f,g
# 10 a  <NA>

My desired output is:

#    x     y pos contains
# 1  a b,c,d  NA    FALSE
# 2  c b,c,d   2     TRUE
# 3  c b,c,d   2     TRUE
# 4  a e,f,g  NA    FALSE
# 5  a b,c,d  NA    FALSE
# 6  c a,b,c   3     TRUE
# 7  b b,c,d   1     TRUE
# 8  c  <NA>  NA       NA
# 9  c e,f,g  NA    FALSE
# 10 a  <NA>  NA       NA

That is, check row by row whether df$x appears among the comma-separated values in df$y and, if it does, give its position. I started down the strsplit(df$y, ",") path, but things got complicated quickly, and I suspect there's a simple solution.


Code to reproduce:
set.seed(5)
seq_letters <- c("a,b,c", "b,c,d", "e,f,g", NA)
df <- data.frame(x = sample(letters[1:3], 10, TRUE),
                 y = sample(seq_letters, 10, TRUE),
                 stringsAsFactors = FALSE)
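
A note on reproducibility: the default algorithm behind sample() changed in R 3.6.0, so set.seed(5) may not reproduce the exact table above on a current R installation. As a fallback, here is a hand-typed copy of the data frame shown in the question. The values are copied from the printed table, not regenerated.

```r
# Hand-typed copy of the printed data frame, so the rows do not depend on
# which random number generator settings the R session uses.
df <- data.frame(
  x = c("a", "c", "c", "a", "a", "c", "b", "c", "c", "a"),
  y = c("b,c,d", "b,c,d", "b,c,d", "e,f,g", "b,c,d",
        "a,b,c", "b,c,d", NA, "e,f,g", NA),
  stringsAsFactors = FALSE
)
```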

JasonAizkalns asked Feb 09 '23 11:02



1 Answer

Here's a possibility using match() with mapply() to compute the first new column, pos, after splitting the y column on commas. The second column, contains, can then be built from pos.

df$pos <- mapply(match, df$x, strsplit(df$y, ",", fixed = TRUE), USE.NAMES = FALSE)
df$contains <- replace(!is.na(df$pos), is.na(df$y), NA)

which gives

   x     y pos contains
1  a b,c,d  NA    FALSE
2  c b,c,d   2     TRUE
3  c b,c,d   2     TRUE
4  a e,f,g  NA    FALSE
5  a b,c,d  NA    FALSE
6  c a,b,c   3     TRUE
7  b b,c,d   1     TRUE
8  c  <NA>  NA       NA
9  c e,f,g  NA    FALSE
10 a  <NA>  NA       NA
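
To see why the two lines work, it may help to run the steps separately on a tiny input. The sketch below uses three made-up rows (the vectors x and y and the intermediate name pieces are illustrative, not part of the original answer) and covers the three cases: a match, no match, and a missing y.

```r
# Three illustrative rows: a match, no match, and a missing y.
x <- c("c", "a", "c")
y <- c("a,b,c", "e,f,g", NA)

# Step 1: split each string on literal commas. An NA input is not split;
# its list element is a single NA_character_.
pieces <- strsplit(y, ",", fixed = TRUE)

# Step 2: mapply() pairs x[i] with pieces[[i]] and calls match() on each pair.
# match() returns the index of the first hit, or NA when there is none.
pos <- mapply(match, x, pieces, USE.NAMES = FALSE)

# Step 3: !is.na(pos) is TRUE where a position was found. replace() then
# overwrites the rows where y itself was NA, so "unknown" is not reported
# as FALSE.
contains <- replace(!is.na(pos), is.na(y), NA)

print(data.frame(x, y, pos, contains))
```

USE.NAMES = FALSE matters here because mapply() would otherwise use the character values of x as names for the result.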
Rich Scriven answered Feb 10 '23 23:02
