I just did some benchmarking while trying to optimise some code and observed that strsplit with perl=TRUE is faster than strsplit with perl=FALSE. For example:
set.seed(1)
ff <- function() paste(sample(10), collapse = " ")
xx <- replicate(1e5, ff())
system.time(t1 <- strsplit(xx, "[ ]"))
# user system elapsed
# 1.246 0.002 1.268
system.time(t2 <- strsplit(xx, "[ ]", perl=TRUE))
# user system elapsed
# 0.389 0.001 0.392
identical(t1, t2)
# [1] TRUE
So my question (or rather a variation of the question in the title) is: under what circumstances would we absolutely need perl=FALSE (leaving aside the fixed and useBytes arguments)? In other words, what can't we do using perl=TRUE that can be done by setting perl=FALSE?
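For what it's worth, the asymmetry mostly seems to run the other way: PCRE accepts constructs such as lookarounds that the default (TRE) engine rejects. A small sketch, with the outputs I would expect shown as comments (not re-verified here):

strsplit("a1b2c3", "(?<=[0-9])", perl=TRUE)   # zero-width split after each digit
# [[1]]
# [1] "a1" "b2" "c3"
strsplit("a1b2c3", "(?<=[0-9])")              # lookbehind is not valid ERE
# Error: invalid regular expression '(?<=[0-9])'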
From the documentation ;)
Performance considerations
If you are doing a lot of regular expression matching, including on very long strings, you will want to consider the options used. Generally PCRE will be faster than the default regular expression engine, and fixed = TRUE faster still (especially when each pattern is matched only a few times).
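The benchmark from the question can illustrate the fixed = TRUE point as well. Since the pattern "[ ]" matches nothing but a single literal space, splitting on the literal string should give identical results (a sketch reusing xx and t1 from the question; I have not timed it here, but per the documentation it should be faster still):

system.time(t3 <- strsplit(xx, " ", fixed=TRUE))  # literal split, no regex engine
identical(t1, t3)
# [1] TRUE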
Of course, this does not answer the question of "are there any dangers to always using perl=TRUE?"