I have the following list of file names:
files.list <- c("Fasted DWeib NoCmaxW.xlsx", "Fed DWeib NoCmaxW.xlsx", "Fasted SWeib NoCmaxW.xlsx", "Fed SWeib NoCmaxW.xlsx", "Fasted DWeib Cmax10.xlsx", "Fed DWeib Cmax10.xlsx", "Fasted SWeib Cmax10.xlsx", "Fed SWeib Cmax10.xlsx")
I want to identify which files have the following sub-strings:
toMatch <- c("Fasted", "DWeib NoCmaxW")
The examples I have found often quote the following usage:
grep(paste(toMatch, collapse = "|"), files.list, value=TRUE)
However, this returns five matches:
[1] "Fasted DWeib NoCmaxW.xlsx" "Fed DWeib NoCmaxW.xlsx" "Fasted SWeib NoCmaxW.xlsx"
[4] "Fasted DWeib Cmax10.xlsx" "Fasted SWeib Cmax10.xlsx"
I want the filename which contains both elements of toMatch (i.e. "Fasted" and "DWeib NoCmaxW"). There is only one file which satisfies that requirement (files.list[1]). I assumed the "|" in the paste command might be a logical OR, and so I tried "&", but that didn't address my problem.
Can someone please help?
Thank you.
The str_detect() function from the stringr package checks whether a pattern occurs in each element of a character vector. It returns TRUE for each element where a match is found and FALSE otherwise.
If we need to find which elements match a pattern, we can use grep(), which returns their positions (or the matching elements themselves with value = TRUE). If we only need to know whether each element matches, we can use grepl(), which returns a logical vector of TRUE/FALSE values, one per element.
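A small illustration of the difference on the question's own files.list. This is only a sketch; "Fasted" is used as an example pattern:

```r
files.list <- c("Fasted DWeib NoCmaxW.xlsx", "Fed DWeib NoCmaxW.xlsx",
                "Fasted SWeib NoCmaxW.xlsx", "Fed SWeib NoCmaxW.xlsx",
                "Fasted DWeib Cmax10.xlsx", "Fed DWeib Cmax10.xlsx",
                "Fasted SWeib Cmax10.xlsx", "Fed SWeib Cmax10.xlsx")

# grep() returns the positions of the matching elements
idx <- grep("Fasted", files.list)
idx
#[1] 1 3 5 7

# value = TRUE returns the matching elements instead of their positions
vals <- grep("Fasted", files.list, value = TRUE)

# grepl() returns one TRUE/FALSE per element, aligned with files.list,
# which is what allows several results to be combined with & or |
hits <- grepl("Fasted", files.list)
hits
#[1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
```

Because grepl() always returns a vector the same length as the input, its results can be combined element by element, which grep() indices cannot do directly.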
The %in% operator checks for the presence of elements inside a vector: x %in% table returns TRUE for each element of x that exactly equals some element of table, and FALSE otherwise. Because it compares whole elements, it does not detect substrings.
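A quick sketch showing why %in% does not help with substring matching here. The extra name "Fed.xlsx" is hypothetical and only used to produce a FALSE result:

```r
files.list <- c("Fasted DWeib NoCmaxW.xlsx", "Fed DWeib NoCmaxW.xlsx",
                "Fasted SWeib NoCmaxW.xlsx", "Fed SWeib NoCmaxW.xlsx",
                "Fasted DWeib Cmax10.xlsx", "Fed DWeib Cmax10.xlsx",
                "Fasted SWeib Cmax10.xlsx", "Fed SWeib Cmax10.xlsx")

# %in% tests exact equality, so a bare substring is not found
substr_hit <- "Fasted" %in% files.list
substr_hit
#[1] FALSE

# A complete file name is found
full_hit <- "Fasted DWeib NoCmaxW.xlsx" %in% files.list
full_hit
#[1] TRUE

# With a vector on the left, one TRUE/FALSE is returned per left-hand element
# ("Fed.xlsx" is a hypothetical name that is not in files.list)
multi <- c("Fed SWeib Cmax10.xlsx", "Fed.xlsx") %in% files.list
multi
#[1]  TRUE FALSE
```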
We can use & to combine the results of two grepl() calls:
i1 <- grepl(toMatch[1], files.list) & grepl(toMatch[2], files.list)
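This is also why putting "&" into the pattern does not work: inside a regular expression "&" is just a literal character, whereas here & is R's element-wise logical AND applied to two logical vectors. A sketch contrasting the two:

```r
files.list <- c("Fasted DWeib NoCmaxW.xlsx", "Fed DWeib NoCmaxW.xlsx",
                "Fasted SWeib NoCmaxW.xlsx", "Fed SWeib NoCmaxW.xlsx",
                "Fasted DWeib Cmax10.xlsx", "Fed DWeib Cmax10.xlsx",
                "Fasted SWeib Cmax10.xlsx", "Fed SWeib Cmax10.xlsx")
toMatch <- c("Fasted", "DWeib NoCmaxW")

# The pattern becomes the literal text "Fasted&DWeib NoCmaxW",
# which does not occur in any file name
bad <- grep(paste(toMatch, collapse = "&"), files.list, value = TRUE)
bad
#character(0)

# R's & keeps only the elements where both grepl() calls returned TRUE
i1 <- grepl(toMatch[1], files.list) & grepl(toMatch[2], files.list)
files.list[i1]
#[1] "Fasted DWeib NoCmaxW.xlsx"
```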
If there are multiple elements in 'toMatch', loop through them with lapply and Reduce the resulting list to a single logical vector with &:
i1 <- Reduce(`&`, lapply(toMatch, grepl, x = files.list))
files.list[i1]
#[1] "Fasted DWeib NoCmaxW.xlsx"
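The Reduce/lapply idiom can be wrapped in a small helper. The name match_all is hypothetical, and the fixed = TRUE default is an assumption: it treats each element of 'toMatch' as literal text, so characters such as "." are not interpreted as regex metacharacters. Set fixed = FALSE if the patterns are meant to be regular expressions.

```r
files.list <- c("Fasted DWeib NoCmaxW.xlsx", "Fed DWeib NoCmaxW.xlsx",
                "Fasted SWeib NoCmaxW.xlsx", "Fed SWeib NoCmaxW.xlsx",
                "Fasted DWeib Cmax10.xlsx", "Fed DWeib Cmax10.xlsx",
                "Fasted SWeib Cmax10.xlsx", "Fed SWeib Cmax10.xlsx")
toMatch <- c("Fasted", "DWeib NoCmaxW")

# Hypothetical helper: TRUE for elements of x that contain every pattern.
# The initial value TRUE means that an empty 'patterns' selects everything.
match_all <- function(x, patterns, fixed = TRUE) {
  Reduce(`&`, lapply(patterns, grepl, x = x, fixed = fixed), TRUE)
}

files.list[match_all(files.list, toMatch)]
#[1] "Fasted DWeib NoCmaxW.xlsx"

files.list[match_all(files.list, c("Fed", "Cmax10"))]
#[1] "Fed DWeib Cmax10.xlsx" "Fed SWeib Cmax10.xlsx"
```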
It is also possible to collapse the elements with \\b.*\\b, i.e. to match the first element of 'toMatch' followed by a word boundary (\\b), then any characters (.*), and another word boundary (\\b) before the second element of 'toMatch'. In this example it works. It may be better to add word boundaries at the start and end as well (which is not needed for this example).
pat1 <- paste(toMatch, collapse= "\\b.*\\b")
grep(pat1, files.list, value = TRUE)
#[1] "Fasted DWeib NoCmaxW.xlsx"
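A sketch of the variant with word boundaries at the start and end as well. The extra file name "NotFasted DWeib NoCmaxW.xlsx" is hypothetical and only there to show a case where the outer boundaries make a difference:

```r
files.list <- c("Fasted DWeib NoCmaxW.xlsx", "Fed DWeib NoCmaxW.xlsx",
                "Fasted SWeib NoCmaxW.xlsx", "Fed SWeib NoCmaxW.xlsx",
                "Fasted DWeib Cmax10.xlsx", "Fed DWeib Cmax10.xlsx",
                "Fasted SWeib Cmax10.xlsx", "Fed SWeib Cmax10.xlsx")
toMatch <- c("Fasted", "DWeib NoCmaxW")

pat1 <- paste(toMatch, collapse = "\\b.*\\b")
# Same pattern, with a word boundary added before the first and after the last word
pat_b <- paste0("\\b", pat1, "\\b")
pat_b
#[1] "\\bFasted\\b.*\\bDWeib NoCmaxW\\b"

# Hypothetical extra file whose name contains "Fasted" inside a longer word
extra <- c(files.list, "NotFasted DWeib NoCmaxW.xlsx")
grep(pat1, extra, value = TRUE)   # also matches "NotFasted ..."
grep(pat_b, extra, value = TRUE)  # only "Fasted DWeib NoCmaxW.xlsx"
```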
But this only matches the words in the same order as in 'toMatch'. If the substrings can also appear in reverse order and you want to match those as well, create the pattern in reverse order too and combine the two patterns with |:
pat2 <- paste(rev(toMatch), collapse="\\b.*\\b")
pat <- paste(pat1, pat2, sep="|")
grep(pat, files.list, value = TRUE)
#[1] "Fasted DWeib NoCmaxW.xlsx"
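Listing every ordering stops scaling once 'toMatch' has more than two elements. A different technique, not used in the answer above, is one Perl-style lookahead per element, which requires perl = TRUE. Each (?=.*term) asserts that the term occurs somewhere after the start of the string, so the order does not matter. This is a sketch: the elements are treated as regular expressions, so any metacharacters in them would need escaping, and the reversed file name below is hypothetical.

```r
files.list <- c("Fasted DWeib NoCmaxW.xlsx", "Fed DWeib NoCmaxW.xlsx",
                "Fasted SWeib NoCmaxW.xlsx", "Fed SWeib NoCmaxW.xlsx",
                "Fasted DWeib Cmax10.xlsx", "Fed DWeib Cmax10.xlsx",
                "Fasted SWeib Cmax10.xlsx", "Fed SWeib Cmax10.xlsx")
toMatch <- c("Fasted", "DWeib NoCmaxW")

# Build "^(?=.*Fasted)(?=.*DWeib NoCmaxW)": one lookahead per element
pat_la <- paste0("^", paste0("(?=.*", toMatch, ")", collapse = ""))
pat_la
#[1] "^(?=.*Fasted)(?=.*DWeib NoCmaxW)"

grep(pat_la, files.list, value = TRUE, perl = TRUE)
#[1] "Fasted DWeib NoCmaxW.xlsx"

# Hypothetical file with the words in reverse order is matched as well
extra <- c(files.list, "DWeib NoCmaxW Fasted.xlsx")
res_extra <- grep(pat_la, extra, value = TRUE, perl = TRUE)
res_extra
#[1] "Fasted DWeib NoCmaxW.xlsx" "DWeib NoCmaxW Fasted.xlsx"
```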