Using R's grepl() to check that multiple strings all exist [duplicate]

Tags: r, grepl

grepl("instance|percentage", labelTest$Text)

returns TRUE if either instance or percentage is present.

How can I get TRUE only when both terms are present?


asked May 24 '17 08:05

toofrellik

2 Answers

Text <- c("instance", "percentage", "n", 
          "instance percentage", "percentage instance")

grepl("instance|percentage", Text)
# TRUE  TRUE FALSE  TRUE  TRUE

grepl("instance.*percentage|percentage.*instance", Text)
# FALSE FALSE FALSE TRUE  TRUE

The latter one works by looking for:

('instance')(any character sequence)('percentage')  
OR  
('percentage')(any character sequence)('instance')

Naturally, if you need to match more than two words in any order, this gets complicated quickly, because every possible ordering needs its own alternative. In that case the solution mentioned in the comments (separate grepl() calls combined with &) is easier to implement and read.

Another alternative that might be relevant when matching many words is positive look-ahead, which can be thought of as a 'non-consuming' match. This requires Perl-compatible regular expressions, which you enable by passing perl=TRUE.

# create a vector of word combinations
set.seed(1)
words <- c("instance", "percentage", "element",
           "character", "n", "o", "p")
Text2 <- replicate(10, paste(sample(words, 5), collapse=" "))

# grepl with multiple positive look-ahead
longperl <- grepl("(?=.*instance)(?=.*percentage)(?=.*element)(?=.*character)",
  Text2, perl=TRUE)

# this is equivalent to the solution proposed in the comments
longstrd <- grepl("instance", Text2) & 
          grepl("percentage", Text2) & 
             grepl("element", Text2) & 
           grepl("character", Text2)

# they produce identical results
identical(longperl, longstrd)
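
To see why the look-aheads can simply be stacked one after another, it may help to inspect a single one with regexpr(). This is only an illustrative sketch; the input string and the variable m are made up for the example.

```r
# A look-ahead asserts that something follows, but matches zero characters.
m <- regexpr("(?=.*instance)", "an instance", perl = TRUE)
as.integer(m)            # 1: the assertion succeeds at the start of the string
attr(m, "match.length")  # 0: nothing was consumed

# Because nothing is consumed, each further look-ahead is tested from the
# same position, so the order in which the words are listed does not matter.
grepl("(?=.*percentage)(?=.*instance)", "instance percentage", perl = TRUE)
# TRUE
```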

Furthermore, if you have the patterns stored in a vector, you can condense both expressions significantly:

pat <- c("instance", "percentage", "element", "character")

longperl <- grepl(paste0("(?=.*", pat, ")", collapse=""), Text2, perl=TRUE)
longstrd <- rowSums(sapply(pat, grepl, Text2) - 1L) == 0L
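
If this comes up often, the vector version could be wrapped in a small helper. The following is a sketch rather than a definitive implementation: grepl_all is a hypothetical name, not a base R function. It uses Reduce() instead of rowSums(sapply(...)), so it also works when the input is a single string, where sapply() would return a plain vector instead of a matrix.

```r
# Hypothetical helper: TRUE where every pattern in `patterns` matches.
grepl_all <- function(patterns, x, ...) {
  # one logical vector per pattern, each the same length as x
  hits <- lapply(patterns, grepl, x = x, ...)
  # element-wise AND across all patterns
  Reduce(`&`, hits)
}

pat <- c("instance", "percentage")
Text <- c("instance", "percentage", "n",
          "instance percentage", "percentage instance")

grepl_all(pat, Text)
# FALSE FALSE FALSE  TRUE  TRUE
grepl_all(pat, "percentage instance")
# TRUE
```

Extra arguments such as fixed = TRUE or ignore.case = TRUE are passed through to grepl() via the dots.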

As asked for in the comments: if you want to match whole words only, i.e. not match on substrings, you can specify word boundaries using \\b. For example:

tx <- c("cent element", "percentage element", "element cent", "element centimetre")

grepl("(?=.*\\bcent\\b)(?=.*element)", tx, perl=TRUE)
# TRUE FALSE  TRUE FALSE
grepl("element", tx) & grepl("\\bcent\\b", tx)
# TRUE FALSE  TRUE FALSE
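
The word-boundary look-aheads can also be built from a vector, in the same way as the condensed pattern earlier. A minimal sketch, assuming the words contain no regex metacharacters (they are pasted into the pattern unescaped); words_tx and pattern are illustrative names.

```r
tx <- c("cent element", "percentage element", "element cent", "element centimetre")
words_tx <- c("cent", "element")

# wrap each word in \\b ... \\b inside its own look-ahead
pattern <- paste0("(?=.*\\b", words_tx, "\\b)", collapse = "")
pattern
# "(?=.*\\bcent\\b)(?=.*\\belement\\b)"

grepl(pattern, tx, perl = TRUE)
# TRUE FALSE  TRUE FALSE
```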


answered Oct 08 '22 19:10

AkselA

This returns TRUE only for those items of the vector labelTest$Text in which both terms occur. I think it answers the question exactly, and it is much shorter than the other solutions.

grepl("instance",labelTest$Text) & grepl("percentage",labelTest$Text)
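
As a rough illustration of how the result might be used, here is a sketch with a made-up labelTest data frame (the contents are an assumption; the question does not show the real data). Note that grepl() returns FALSE for NA entries, so missing text is simply treated as a non-match.

```r
# Hypothetical stand-in for the asker's data
labelTest <- data.frame(
  Text = c("instance", "instance and percentage", NA, "percentage of instance"),
  stringsAsFactors = FALSE
)

both <- grepl("instance", labelTest$Text) & grepl("percentage", labelTest$Text)
both
# FALSE  TRUE FALSE  TRUE

# keep only the rows where both terms occur
labelTest[both, , drop = FALSE]
```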

answered Oct 08 '22 19:10

Sebastian Geschonke
