Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to search for a string in one column in other columns of a data frame

I have a table, call it df, with 3 columns, the 1st is the title of a product, the 2nd is the description of a product, and the third is a one word string. What I need to do is run an operation on the entire table, creating 2 new columns (call them 'exists_in_title' and 'exists_in_description') that have either a 1 or 0 indicating if the 3rd column exists in either the 1st or 2nd column. I need it to simply be a 1:1 operation, so for example, calling row 1 'A', I need to check if the cell A3, exists in A1, and use that data to create column exists_in_title, and then check if A3 exists in A2, and use that data to create the column exists_in_description. Then move on to row B and go through the same operation. I have thousands of rows of data so it's not realistic to do these in a 1 at a time fashion, writing individual functions for each row, definitely need a function or method that will run through every row in the table in one shot.

I've played around with grepl, pmatch, str_count but none seem to really do what I need. I think grepl is probably the closest to what I need, here's an example of 2 lines of code I wrote that logically do what I would want them to, but didn't seem to work:

df$exists_in_title <- grepl(df$A3, df$A1)

df$exists_in_description <- grepl(df$A3, df$A2)

However when I run those I get the following message, which leads me to believe it did not work properly: "argument 'pattern' has length > 1 and only the first element will be used"

Any help on how to do this would be greatly appreciated. Thanks!

like image 254
dnoss Avatar asked Jun 11 '15 13:06

dnoss


1 Answers

grepl will work with mapply:

Sample data frame:

title <- c('eggs and bacon','sausage biscuit','pancakes')
description <- c('scrambled eggs and thickcut bacon','homemade biscuit with breakfast pattie', 'stack of sourdough pancakes')
keyword <- c('bacon','sausage','sourdough')
df <- data.frame(title, description, keyword, stringsAsFactors=FALSE)

Searching for matches using grepl:

df$exists_in_title <- mapply(grepl, pattern=df$keyword, x=df$title)
df$exists_in_description <- mapply(grepl, pattern=df$keyword, x=df$description)

And the results:

            title                            description   keyword exists_in_title exists_in_description
1  eggs and bacon      scrambled eggs and thickcut bacon     bacon            TRUE                  TRUE
2 sausage biscuit homemade biscuit with breakfast pattie   sausage            TRUE                 FALSE
3        pancakes            stack of sourdough pancakes sourdough           FALSE                  TRUE

Update I

You could also do this with dplyr and stringr:

library(dplyr)
df %>% 
  rowwise() %>% 
  mutate(exists_in_title = grepl(keyword, title),
         exists_in_description = grepl(keyword, description))

library(stringr)
df %>% 
  rowwise() %>% 
  mutate(exists_in_title = str_detect(title, keyword),
         exists_in_description = str_detect(description, keyword))   

Update II

Mapis also an option, or using more from tidyverse another option could be purrr with stringr:

library(tidyverse)
df %>%
  mutate(exists_in_title = unlist(Map(function(x, y) grepl(x, y), keyword, title))) %>% 
  mutate(exists_in_description = map2_lgl(description, keyword,  str_detect))
like image 155
sbha Avatar answered Oct 04 '22 23:10

sbha