Keep rows that contain two strings in the same row

Tags:

r

In a dataframe like this:

df <- data.frame(id = c(1,2,3), text = c("hi my name is E","hi what's your name","name here"))

I would like to keep the rows that contain both the words hi and name. Example of expected output:

df <- data.frame(id = c(1,2), text = c("hi my name is E","hi what's your name"))

I tried this, but it doesn't work properly:

library(tidyverse)
df %>%
    filter(str_detect(text, 'name&hi'))

Nathalie asked Dec 14 '22 08:12


2 Answers

One simple answer, plus two more general ones that you should only need if you have more than two words to check:

library(tidyverse)

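# Simple case: both patterns must be detected in the same row
df %>% 
  filter(str_detect(text, 'hi') & str_detect(text, 'name'))

# General case: outer() builds a rows-by-words logical matrix;
# keep the rows where every word (here 2) was detected
df %>% 
  filter(rowSums(outer(text, c('hi', 'name'), str_detect)) == 2)

# General case: reduce() ANDs the per-word results together, starting from TRUE
df %>% 
  filter(reduce(c('hi', 'name'), ~ .x & str_detect(text, .y), .init = TRUE))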
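
For context, the attempt in the question fails because & has no special meaning in a regex, so 'name&hi' searches for that literal text. If you would rather not depend on the tidyverse, the same "all words must match" idea can be sketched in base R. This is only an illustration: has_all_words is a hypothetical helper name, and it assumes the words are plain alphanumeric strings with no regex metacharacters. Unlike the plain 'hi' pattern above, it also requires whole-word matches, so "this" does not count as "hi".

```r
# Hypothetical helper (not part of the answer above): returns TRUE for each
# element of `text` that contains every entry of `words` as a whole word.
# Assumes `words` contains no regex metacharacters.
has_all_words <- function(text, words) {
  # One logical vector per word; \\b restricts the match to word boundaries.
  hits <- lapply(words, function(w) {
    grepl(paste0("\\b", w, "\\b"), text, perl = TRUE)
  })
  # Combine the per-word vectors with element-wise AND, starting from TRUE.
  Reduce(`&`, hits, rep(TRUE, length(text)))
}

df <- data.frame(
  id = c(1, 2, 3),
  text = c("hi my name is E", "hi what's your name", "name here")
)

# Keep only the rows whose text contains both words.
result <- df[has_all_words(df$text, c("hi", "name")), ]
print(result)
```

Adding a third word only requires extending the vector passed as words; the filtering line itself stays the same.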

IceCreamToucan answered Jan 12 '23 09:01


We can also use a single regex that matches either 'hi' followed by 'name' or (|) 'name' followed by 'hi':

library(dplyr)
library(stringr)
df %>% 
     filter(str_detect(text, 'hi\\b.*\\bname|name\\b.*\\bhi'))
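
An order-independent variant of the same idea uses lookaheads instead of listing both orderings, which may scale better when there are more than two words. The sketch below is an assumption-laden illustration rather than part of the answer: it uses base R's grepl with perl = TRUE so that it runs without extra packages, and the variable names pattern and keep are made up for the example. The same pattern should also be usable with str_detect, since the regex engine behind stringr supports lookaheads.

```r
df <- data.frame(
  id = c(1, 2, 3),
  text = c("hi my name is E", "hi what's your name", "name here")
)

# Each (?=...) is a zero-width lookahead evaluated from the start of the
# string, so both words must be present but their order does not matter.
pattern <- "^(?=.*\\bhi\\b)(?=.*\\bname\\b)"

# Lookaheads require the PCRE engine in base R, hence perl = TRUE.
keep <- grepl(pattern, df$text, perl = TRUE)

result <- df[keep, ]
print(result)
```

To require another word, append one more lookahead of the same shape to pattern.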

akrun answered Jan 12 '23 09:01