Until now I have worked only with bash regular expressions, grep, sed, awk, etc. After trying Perl 6 regexes, I have the impression that they run slower than I would expect, but the reason is probably that I am using them incorrectly.
I made a simple test to compare similar operations in Perl 6 and in bash. Here is the Perl 6 code:
my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5
my @search = <abcde cdeff fabcd>;
my token search {
@search
}
my @new_array = @array.grep({/ <search> /});
say @new_array;
Then I printed @array into a file named array (with 7776 lines), made a file named search with 3 lines (abcde, cdeff, fabcd), and ran a simple grep search.
$ grep -f search array
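For anyone who wants to reproduce the grep side without running Perl 6 first, the two input files can be rebuilt roughly as follows. This is only a sketch: it generates the 7776 strings with awk instead of printing @array from Perl 6, and the `workdir` and `matches` variable names are introduced here for illustration.

```shell
# Work in a throwaway directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)

# All 6^5 = 7776 five-letter strings over a..f, one per line, in lexicographic order.
# Each index i is written as a 5-digit base-6 number, with digits mapped to letters.
awk 'BEGIN {
    letters = "abcdef"
    for (i = 0; i < 7776; i++) {
        n = i; word = ""
        for (k = 0; k < 5; k++) {
            word = substr(letters, n % 6 + 1, 1) word
            n = int(n / 6)
        }
        print word
    }
}' > "$workdir/array"

# The three search patterns, one per line.
printf '%s\n' abcde cdeff fabcd > "$workdir/search"

# Same search as in the question: every pattern in the file is tried on each line.
matches=$(grep -f "$workdir/search" "$workdir/array")
printf '%s\n' "$matches"
```

Because every pattern is as long as the lines themselves, each one can only match the line that is identical to it, so three lines come back.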
After both programs produced the same result, as expected, I measured how long each one took.
$ time perl6 search.p6
real 0m6,683s
user 0m6,724s
sys 0m0,044s
$ time grep -f search array
real 0m0,009s
user 0m0,008s
sys 0m0,000s
So, what am I doing wrong in my Perl 6 code?
UPD: If I loop through the @search array and pass each search token to grep separately, the program runs much faster:
my @array = "aaaaa" .. "fffff";
say +@array;
my @search = <abcde cdeff fabcd>;
for @search -> $token {
say @array.grep({/$token/});
}
$ time perl6 search.p6
real 0m1,378s
user 0m1,400s
sys 0m0,052s
And if I define each search pattern manually, it works even faster:
my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5
say @array.grep({/abcde/});
say @array.grep({/cdeff/});
say @array.grep({/fabcd/});
$ time perl6 search.p6
real 0m0,587s
user 0m0,632s
sys 0m0,036s
The grep command is much simpler than Perl 6's regular expressions, and it has had many more years to get optimized. Regexes are also one of the areas that haven't seen as much optimization in Rakudo, partly because they are seen as difficult to work on.
For a more performant example, you could pre-compile the regex:
my $search = "/@search.join('|')/".EVAL;
# $search = /abcde|cdeff|fabcd/;
say @array.grep($search);
That change causes it to run in about half a second.
If there is any chance of malicious data in @search, and you have to do this, it may be safer to use:
"/@search».Str».perl.join('|')/".EVAL
The compiler can't quite generate that optimized code for /@search/, because @search could change after the regex gets compiled. What could happen is that the regex gets re-compiled into the better form the first time it is used, and is then cached for as long as @search doesn't get modified. (I think Perl 5 does something similar.)
One important fact you have to keep in mind is that a regex in Perl 6 is just a method that is written in a domain specific sub-language.