R - Find all vector elements that contain all strings / patterns - str_detect grep

Sample data

files.in.path = c("a.4.0. name 2015 - NY.RDS", 
                  "b.4.0. name 2016 - CA.RDS", 
                  "c.4.0. name 2015 - PA.RDS")
strings.to.find = c("4.0", "PA")

I want the logical vector that shows all elements that contain all strings.to.find. The result wanted:

FALSE FALSE TRUE

This code finds elements that contain any one of the strings.to.find, i.e., it uses an OR operator:

str_detect(files.in.path, str_c(strings.to.find, collapse="|")) # OR operator
 TRUE TRUE TRUE

This code attempts to use an AND operator, but it does not work, since & is not a regex operator and is matched as a literal character:

str_detect(files.in.path, str_c(strings.to.find, collapse="&")) # AND operator
FALSE FALSE FALSE

This works in several lines, and I could write a for loop that generates the individual lines for cases with a larger number of strings.to.find:

det.1 = str_detect(files.in.path, "4.0")
det.2 = str_detect(files.in.path, "PA")
det.all = det.1 & det.2
 FALSE FALSE  TRUE

But is there a better way, one that does not rely on a regex that depends on the position or order of the strings.to.find?

asked Feb 06 '23 04:02 by LWRMS


2 Answers

This is not suited to heavy lifting, but str_detect is vectorized over both string and pattern, so you can combine it with the outer function to test every string against every pattern:

library(stringr)
outer(files.in.path, strings.to.find, str_detect)

#     [,1]  [,2]
#[1,] TRUE FALSE
#[2,] TRUE FALSE
#[3,] TRUE  TRUE

To check whether all patterns occur in a string, apply all to each row of the resulting matrix:

apply(outer(files.in.path, strings.to.find, str_detect), 1, all)

#[1] FALSE FALSE  TRUE

Or, as @Jota commented, stri_detect_fixed is safer here if the patterns should be matched literally rather than as regular expressions:

library(stringi)
apply(outer(files.in.path, strings.to.find, stri_detect_fixed), 1, all)
# [1] FALSE FALSE  TRUE
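
The same literal, order-independent check can also be sketched in base R without stringr or stringi. This is only an illustration, not part of the original answer: the names hits and contains.all are made up here, and fixed = TRUE is used so that each search string is matched literally.

```r
files.in.path <- c("a.4.0. name 2015 - NY.RDS",
                   "b.4.0. name 2016 - CA.RDS",
                   "c.4.0. name 2015 - PA.RDS")
strings.to.find <- c("4.0", "PA")

# One logical vector per search string; fixed = TRUE disables regex
# interpretation, so "." is matched as a plain period.
hits <- lapply(strings.to.find,
               function(s) grepl(s, files.in.path, fixed = TRUE))

# Combine the per-string vectors element-wise with AND.
contains.all <- Reduce(`&`, hits)
print(contains.all)
```

Because the vectors are combined with &, the order of strings.to.find does not matter, and the approach extends to any number of search strings.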
answered Feb 09 '23 01:02 by Psidom


A web search for 'r regex "and operator"' or 'regex "and operator"' leads to the questions "R grep: is there an AND operator?" and "Regular Expressions: Is there an AND operator?", respectively.

So, to match both patterns, wrap each string in a lookahead and concatenate the pieces into a single pattern:

# Wrap each string in a lookahead so that all must match, in any order
pattern <- paste0("(?=.*", strings.to.find, ")", collapse = "")
grepl(pattern, files.in.path, perl = TRUE)
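
To make the lookahead idea more concrete, the sketch below prints the pattern that gets built and checks that reversing the search strings gives the same result. The variable names pattern, pattern.rev, res and res.rev are only illustrative, and the sample data is repeated so the snippet runs on its own.

```r
files.in.path <- c("a.4.0. name 2015 - NY.RDS",
                   "b.4.0. name 2016 - CA.RDS",
                   "c.4.0. name 2015 - PA.RDS")
strings.to.find <- c("4.0", "PA")

# Build one zero-width lookahead per search string and join them.
pattern <- paste0("(?=.*", strings.to.find, ")", collapse = "")
print(pattern)

# Reversing the search strings yields a different pattern...
pattern.rev <- paste0("(?=.*", rev(strings.to.find), ")", collapse = "")

# ...but the same matches, since each lookahead scans the whole string.
res <- grepl(pattern, files.in.path, perl = TRUE)
res.rev <- grepl(pattern.rev, files.in.path, perl = TRUE)
print(res)
print(identical(res, res.rev))
```

Each (?=.*x) asserts that x occurs somewhere ahead without consuming any characters, which is why the order of the pieces does not affect the outcome.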

As Jota mentioned in a comment, matching "4.0" this way will also match other strings, because the period is a regex metacharacter that matches any character. One fix is to escape the period in your pattern string, i.e., strings.to.find = c("PA", "4\\.0")
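
The metacharacter issue can be shown with a made-up decoy file name, and one way to avoid escaping by hand is PCRE's \Q...\E literal quoting, which is available with perl = TRUE. This is a hedged sketch rather than the answerer's method: files.test, unescaped and quoted are illustrative names, and the quoting would break if a search string itself contained \E.

```r
# The second file name is a hypothetical decoy: "4x0" instead of "4.0".
files.test <- c("c.4.0. name 2015 - PA.RDS",
                "d.4x0. name 2015 - PA.RDS")
strings.to.find <- c("4.0", "PA")

# Unescaped: "." matches any character, so the decoy also matches.
unescaped <- paste0("(?=.*", strings.to.find, ")", collapse = "")
print(grepl(unescaped, files.test, perl = TRUE))

# Quoted: everything between \Q and \E is taken literally by PCRE.
quoted <- paste0("(?=.*\\Q", strings.to.find, "\\E)", collapse = "")
print(grepl(quoted, files.test, perl = TRUE))
```

With the quoted pattern only the first file name matches, since the decoy contains no literal "4.0".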

answered Feb 09 '23 01:02 by user2957945