
Is there a limit to the string length that can be passed to grep() in R?

Tags: r

I have a string with 2735 characters, and I would like to search for that string in a character vector. When I run grep() on the first 2560 characters:

grep(pattern=substr(string,1,2560), x=myvector)

I get the error:

Error in grep(pattern = substr(string, 1, 2560), x = myvector) : 
  invalid regular expression 'all the characters of my string...'

If I try one character fewer:

grep(pattern=substr(string,1,2559), x=myvector)

I do not get the error.

QUESTION: Is there a limit to string length when passed to grep()? If so, how should I get around it?

irritable_phd_syndrome asked Mar 13 '17 17:03


1 Answer

Hm, looks like you've stumbled upon an undocumented "feature". A workaround is to set perl=TRUE, which uses the PCRE library instead of the default regular expression engine:

pat <- paste(rep("a", 2560), collapse="")
x <- paste0(pat, pat)

grep(pat, x)
#Error in grep(pat, x) : 
#  invalid regular expression 'aaa....'

grep(pat, x, perl=TRUE)
#[1] 1

I guessed that this would work based on the comment in ?grep:

If you are doing a lot of regular expression matching, including on very long strings, you will want to consider the options used. Generally PCRE will be faster than the default regular expression engine, and fixed = TRUE faster still (especially when each pattern is matched only a few times).
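The same passage mentions fixed = TRUE, which may fit the original question even better, since the goal there appears to be finding a literal string rather than a regular expression. Below is a minimal sketch under that assumption; myvector here is a made-up stand-in for the asker's data, and the pattern length of 3000 is arbitrary. With fixed = TRUE the pattern is treated as plain text and not compiled as a regular expression, so the length problem above should not arise, and any metacharacters in the string are matched literally.

```r
# Build a literal pattern longer than the length that failed above.
pat <- paste(rep("a", 3000), collapse = "")

# Hypothetical character vector: only the second element contains pat.
myvector <- c("short string", paste0("prefix ", pat, " suffix"), "another one")

# fixed = TRUE matches pat as a plain string instead of a regular expression.
hits <- grep(pat, myvector, fixed = TRUE)
print(hits)
#[1] 2

# Metacharacters lose their special meaning: "." only matches a literal dot.
literal <- grepl("a.b", c("a.b", "axb"), fixed = TRUE)
print(literal)
#[1]  TRUE FALSE
```

If the string really is meant as a regular expression, fixed = TRUE changes what it matches, and perl = TRUE remains the workaround to use.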

Hong Ooi answered Nov 02 '22 06:11