Alternation in regexes seems to be terribly slow in big files

Tags: regex, raku

I am trying to use this regex:

my @vulnerabilities = ($g ~~ m:g/\s+("Low"||"Medium"||"High")\s+/);

I run it on chunks of files such as this one, where each chunk goes from one "sorted" to the next. Each chunk is a few hundred kilobytes, and processing all of them takes 1 to 3 seconds in total (divided by 32 per iteration).

How can this be sped up?

asked Apr 10 '20 at 12:04 by jjmerelo



1 Answer

Inspecting the example file shows that the strings only ever occur as whole lines, each starting with a tab and a space. From your responses I also gathered that you are really only interested in the counts. If that is the case, I would suggest something like this solution:

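# Map each exact line (tab, space, severity) to the severity name.
my %targets = "\t Low", "Low", "\t Medium", "Medium", "\t High", "High";
# Count the matching lines; any other line yields Empty and is skipped.
my %vulnerabilities is Bag = $g.lines.map: {
    %targets{$_} // Empty
}
dd %vulnerabilities;  # ("Low"=>2877,"Medium"=>54).Bag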

This runs in about 0.25 seconds on my machine.
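Since the example file itself is not included here, the following is a minimal self-contained sketch for trying the lookup approach. The sample text in `$g` is hypothetical and only imitates the line layout described above. The block also shows a hypothetical line-anchored regex variant (`%via-regex` is an illustrative name) that keeps the question's alternation but matches each line as a whole; it should give the same counts, but it has not been timed against either version above.

```raku
# Hypothetical sample chunk: severities sit alone on their own lines,
# prefixed by a tab and a space, imitating the file described above.
my $g = "sorted\n\t Low\nsome other text\n\t High\n\t Low\nLow inside a sentence\n";

# Exact line => severity name; any other line has no entry.
my %targets = "\t Low", "Low", "\t Medium", "Medium", "\t High", "High";

# Hash lookup per line; lines without an entry become Empty and are dropped.
my %vulnerabilities is Bag = $g.lines.map: { %targets{$_} // Empty };

# Hypothetical alternative: the same alternation, anchored to a whole line.
# A non-matching line makes the `if` statement modifier return Empty.
my %via-regex is Bag = $g.lines.map: { ~$0 if / ^ \t ' ' (Low || Medium || High) $ / };

# A Bag returns 0 for keys it does not contain, so Medium reads as 0.
say %vulnerabilities<Low>, ' ', %vulnerabilities<High>, ' ', %vulnerabilities<Medium>;  # 2 1 0
say %via-regex<Low>, ' ', %via-regex<High>, ' ', %via-regex<Medium>;                    # 2 1 0
```

The line "Low inside a sentence" is counted by neither version, because both only accept a line that consists of exactly a tab, a space and the severity name.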

It always pays to look at the problem domain thoroughly!

answered Oct 24 '22 at 00:10 by Elizabeth Mattijsen
