Let's say I want to find, in a large string (300,000 letters), the word "dogs" with exactly 40,000 letters between each pair of consecutive letters. So I do:
$mystring =~ m/d.{40000}o.{40000}g.{40000}s/;
This works quite well in other (slower) languages, but in Perl it fails with "Quantifier in {,} bigger than 32766 in regex".
So:
If you really need this to be fast, I would look at a custom search based on the ideas behind Boyer-Moore string search. A regular expression is parsed into a finite state machine, and even a clever, compact representation of such an FSM is not going to be a very effective way to execute a search like the one you describe.
If you really want to continue along the lines you are on now, you can just concatenate two expressions, like .{30000}.{10000}, which in practice is the same as .{40000}.
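To make the splitting trick concrete, here is a minimal sketch. The helper name gap_pattern and the chunk size of 30,000 are my own choices for illustration, not anything built into Perl; the /s flag is an assumption that newlines should count as letters too.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern fragment that matches exactly $n
# characters, split into pieces of at most $chunk so that no single
# quantifier exceeds the limit from the error message.
sub gap_pattern {
    my ($n, $chunk) = @_;
    $chunk //= 30_000;
    my $pat = '';
    while ($n > 0) {
        my $take = $n < $chunk ? $n : $chunk;
        $pat .= ".{$take}";
        $n -= $take;
    }
    return $pat;
}

my $gap = gap_pattern(40_000);            # ".{30000}.{10000}"
my $re  = qr/d${gap}o${gap}g${gap}s/s;    # /s lets . match newlines as well
```

You would then test with $mystring =~ $re as before.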
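I think index might be better suited for this task. Something along the lines of this untested sketch, which takes a reference to the string:
sub has_dogs {
    my $str = shift;      # reference to the string, so it is not copied
    my $start = 0;
    # 40,000 letters in between means consecutive letters are 40_001 apart
    while (-1 < (my $pos = index $$str, 'd', $start)) {
        # stop once "dogs" no longer fits; this keeps substr inside the string
        last if $pos + 120_003 >= length $$str;
        if ( ('o' eq substr($$str, $pos +  40_001, 1)) and
             ('g' eq substr($$str, $pos +  80_002, 1)) and
             ('s' eq substr($$str, $pos + 120_003, 1)) ) {
            return 1;
        }
        $start = $pos + 1;    # resume the search after this 'd'
    }
    return;
}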
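The same idea can be generalised to any word and gap. The following is only a sketch under my own assumptions: find_spaced_word is a hypothetical name, it takes a reference to the string, and it returns the offset of the first match or -1 if there is none.

```perl
use strict;
use warnings;

# Hypothetical generalisation: find $word whose letters are separated by
# exactly $gap other characters. Returns the offset of the first letter,
# or -1 if there is no such occurrence.
sub find_spaced_word {
    my ($str_ref, $word, $gap) = @_;
    my @letters = split //, $word;
    return -1 unless @letters;
    my $step = $gap + 1;    # distance between consecutive letters
    # last starting offset at which the whole spaced word still fits
    my $max_start = length($$str_ref) - $step * $#letters - 1;
    my $start = 0;
    while ((my $pos = index $$str_ref, $letters[0], $start) >= 0) {
        last if $pos > $max_start;
        my $ok = 1;
        for my $i (1 .. $#letters) {
            if (substr($$str_ref, $pos + $i * $step, 1) ne $letters[$i]) {
                $ok = 0;
                last;       # leaves the for loop only
            }
        }
        return $pos if $ok;
        $start = $pos + 1;  # try the next occurrence of the first letter
    }
    return -1;
}
```

For the case in the question this would be called as find_spaced_word(\$mystring, 'dogs', 40_000). Only positions where the first letter actually occurs are examined, and each candidate costs at most three single-character substr lookups.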