I have a string with 2735 characters. I would like to search for that string in a character vector. When I run grep() on its first 2560 characters:
grep(pattern=substr(string,1,2560), x=myvector)
I get the error:
Error in grep(pattern = substr(string, 1, 2560), x = myvector) :
invalid regular expression 'all the characters of my string...'
If I try
grep(pattern=substr(string,1,2559), x=myvector)
I do not get the error.
QUESTION: Is there a limit to string length when passed to grep()? If so, how should I get around it?
Hm, looks like you've stumbled upon an undocumented "feature". A workaround is to set perl=TRUE to use the PCRE library:
pat <- paste(rep("a", 2560), collapse="")
x <- paste0(pat, pat)
grep(pat, x)
#Error in grep(pat, x) :
# invalid regular expression 'aaa....'
grep(pat, x, perl=TRUE)
#[1] 1
I guessed that this would work based on the comment in ?grep:
If you are doing a lot of regular expression matching, including on very long strings, you will want to consider the options used. Generally PCRE will be faster than the default regular expression engine, and fixed = TRUE faster still (especially when each pattern is matched only a few times).
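Since the goal is to find a literal string rather than a regular expression, the fixed = TRUE option mentioned in that passage may be an even better fit: it skips regex compilation entirely, so the pattern-length problem should not come up, and any regex metacharacters in the string are taken literally. A minimal sketch, assuming a made-up vector x (not from the question) and the same 2560-character pattern as above:

```r
# Build a long literal pattern, as in the example above.
pat <- paste(rep("a", 2560), collapse = "")

# Hypothetical vector to search (an assumption for illustration only).
x <- c("short", paste0(pat, pat), pat)

# Substring search with the pattern treated as a literal string, not a regex.
hits_fixed <- grep(pat, x, fixed = TRUE)

# If whole elements must equal the string, no pattern matching is needed.
hits_exact <- which(x == pat)
```

Here hits_fixed holds the indices of elements that contain the string, while hits_exact holds only the elements identical to it. Which of the two is wanted depends on whether "search for that string" means a substring match or an exact element match.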