I have a table, call it df, with 3 columns, the 1st is the title of a product, the 2nd is the description of a product, and the third is a one word string. What I need to do is run an operation on the entire table, creating 2 new columns (call them 'exists_in_title' and 'exists_in_description') that have either a 1 or 0 indicating if the 3rd column exists in either the 1st or 2nd column. I need it to simply be a 1:1 operation, so for example, calling row 1 'A', I need to check if the cell A3, exists in A1, and use that data to create column exists_in_title, and then check if A3 exists in A2, and use that data to create the column exists_in_description. Then move on to row B and go through the same operation. I have thousands of rows of data so it's not realistic to do these in a 1 at a time fashion, writing individual functions for each row, definitely need a function or method that will run through every row in the table in one shot.
I've played around with grepl, pmatch, str_count but none seem to really do what I need. I think grepl is probably the closest to what I need, here's an example of 2 lines of code I wrote that logically do what I would want them to, but didn't seem to work:
df$exists_in_title <- grepl(df$A3, df$A1)
df$exists_in_description <- grepl(df$A3, df$A2)
However when I run those I get the following message, which leads me to believe it did not work properly: "argument 'pattern' has length > 1 and only the first element will be used"
Any help on how to do this would be greatly appreciated. Thanks!
grepl
will work with mapply
:
Sample data frame:
title <- c('eggs and bacon','sausage biscuit','pancakes')
description <- c('scrambled eggs and thickcut bacon','homemade biscuit with breakfast pattie', 'stack of sourdough pancakes')
keyword <- c('bacon','sausage','sourdough')
df <- data.frame(title, description, keyword, stringsAsFactors=FALSE)
Searching for matches using grepl
:
df$exists_in_title <- mapply(grepl, pattern=df$keyword, x=df$title)
df$exists_in_description <- mapply(grepl, pattern=df$keyword, x=df$description)
And the results:
title description keyword exists_in_title exists_in_description
1 eggs and bacon scrambled eggs and thickcut bacon bacon TRUE TRUE
2 sausage biscuit homemade biscuit with breakfast pattie sausage TRUE FALSE
3 pancakes stack of sourdough pancakes sourdough FALSE TRUE
You could also do this with dplyr
and stringr
:
library(dplyr)
df %>%
rowwise() %>%
mutate(exists_in_title = grepl(keyword, title),
exists_in_description = grepl(keyword, description))
library(stringr)
df %>%
rowwise() %>%
mutate(exists_in_title = str_detect(title, keyword),
exists_in_description = str_detect(description, keyword))
Map
is also an option, or using more from tidyverse
another option could be purrr
with stringr
:
library(tidyverse)
df %>%
mutate(exists_in_title = unlist(Map(function(x, y) grepl(x, y), keyword, title))) %>%
mutate(exists_in_description = map2_lgl(description, keyword, str_detect))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With