
Perl: "Quantifier in {,} bigger than 32766 in regex"

Tags:

regex

perl

Let's say I want to find, in a large string (300,000 letters), the word "dogs" with exactly 40,000 letters between each pair of consecutive letters. So I do:

$mystring =~ m/d.{40000}o.{40000}g.{40000}s/;

This works quite well in other (slower) languages, but Perl fails with "Quantifier in {,} bigger than 32766 in regex".

So:

  1. Can we use a bigger number as the quantifier somehow?
  2. If not, is there another good way to find what I want? Note that "dogs" is only an example; I want to do this for any word and any jump size (and fast).

Gadi A asked May 16 '12 19:05


2 Answers

If you really need to do this fast, I would look at a custom search based on the ideas of Boyer-Moore string search. A regular expression is compiled into a finite state machine, and even a clever, compact representation of such an FSM is not going to be a very efficient way to execute a search like the one you describe.

If you want to continue along your current lines, you can just concatenate two expressions, such as .{30000}.{10000}, which is the same as .{40000} in practice.
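To avoid writing the chunks out by hand for every jump size, the pattern can be built programmatically. The following is only a sketch: the helper names gap_pattern and spaced_regex are made up for illustration, and the default chunk size of 30,000 is simply an assumed value below the 32766 limit from the error message.

```perl
use strict;
use warnings;

# Hypothetical helper: return a pattern fragment that matches exactly $n
# characters, chained from chunks no larger than $max each.
sub gap_pattern {
    my ($n, $max) = @_;
    $max //= 30_000;    # assumed chunk size, below the 32766 limit
    my $pat = '';
    while ($n > 0) {
        my $chunk = $n < $max ? $n : $max;
        $pat .= ".{$chunk}";
        $n   -= $chunk;
    }
    return $pat;
}

# Hypothetical helper: compile a regex that matches the letters of $word
# with exactly $jump characters between consecutive letters.
# Like the original pattern it has no /s flag, so "." will not match a
# newline; add /s if the text may contain newlines.
sub spaced_regex {
    my ($word, $jump) = @_;
    my $gap = gap_pattern($jump);
    my $pat = join $gap, map { quotemeta($_) } split //, $word;
    return qr/$pat/;
}
```

With these helpers, $mystring =~ spaced_regex('dogs', 40_000) uses the pattern d.{30000}.{10000}o.{30000}.{10000}g.{30000}.{10000}s, which is equivalent to the one in the question.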

Ben Jackson answered Sep 21 '22 15:09


I think index might be better suited for this task. Something along the lines of the completely untested:

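sub has_dogs {
    my $str   = shift;    # reference to the string to search
    my $start = 0;

    while (-1 < (my $pos = index $$str, 'd', $start)) {
        # Past the end of the string, substr returns undef (and warns).
        no warnings qw(uninitialized substr);
        # 40,000 characters between letters puts each one 40,001 further on.
        if ( ('o' eq substr($$str, $pos +  40_001, 1)) and
             ('g' eq substr($$str, $pos +  80_002, 1)) and
             ('s' eq substr($$str, $pos + 120_003, 1)) ) {
            return 1;
        }
        $start = $pos + 1;    # resume the search after this 'd'
    }
    return;
}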
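
Since the question asks for any word and any jump size, the same index/substr idea could be generalized roughly as follows. This is a sketch under assumptions: find_spaced is a hypothetical name, it takes a reference to the string as above, and it returns the offset of the first match or -1 rather than a boolean.

```perl
use strict;
use warnings;

# Hypothetical generalization: return the offset of the first occurrence of
# $word in $$str_ref whose consecutive letters are separated by exactly
# $jump characters, or -1 if there is no such occurrence.
sub find_spaced {
    my ($str_ref, $word, $jump) = @_;
    my @letters = split //, $word;
    return -1 unless @letters;

    my $step  = $jump + 1;                    # offset from one letter to the next
    my $span  = $step * $#letters + 1;        # total length a match covers
    my $last  = length($$str_ref) - $span;    # last offset where a match can start
    my $start = 0;

    while ((my $pos = index $$str_ref, $letters[0], $start) > -1) {
        last if $pos > $last;                 # not enough room left for a match
        my $ok = 1;
        for my $i (1 .. $#letters) {
            if (substr($$str_ref, $pos + $i * $step, 1) ne $letters[$i]) {
                $ok = 0;
                last;
            }
        }
        return $pos if $ok;
        $start = $pos + 1;                    # try the next candidate first letter
    }
    return -1;
}
```

For the case in the question this would be called as find_spaced(\$mystring, 'dogs', 40_000). The explicit bounds check means substr is never asked to read past the end of the string, so no warnings need to be disabled.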

Sinan Ünür answered Sep 19 '22 15:09