How can I find the first occurrence of a pattern in a string from some starting position?

I have a string of arbitrary length, and starting at position p0, I need to find the first occurrence of one of three 3-letter patterns.

Assume the string contains only letters. I need to count the triplets, starting at position p0 and stepping forward three characters at a time, until the first occurrence of 'aaa', 'bbb', or 'ccc'.

Is this even possible using just a regex?

slashmais asked Sep 23 '08 09:09


3 Answers

Moritz says this might be faster than a regex. Even if it's a little slower, it's easier to understand at 5 am. :)

             #0123456789.123456789.123456789.
my $string = "alsdhfaaasccclaaaagalkfgblkgbklfs";
my $pos    = 9;    # starting position p0 (0-based)
my $length = 3;    # step size: one triplet
my $regex  = qr/^(aaa|bbb|ccc)/;

# Step through the string one triplet at a time, starting at $pos
while( $pos < length $string )
    {
    print "Checking $pos\n";

    if( substr( $string, $pos, $length ) =~ /$regex/ )
        {
        print "Found $1 at $pos\n";
        last;
        }

    $pos += $length;
    }
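
The loop above reports the position of the match, while the question asks for the number of triplets stepped over. A minimal sketch of that variant follows; the helper name count_triplets_loop and its undef-on-failure convention are assumptions for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number of triplets skipped before the
# first 'aaa', 'bbb' or 'ccc' on a triplet boundary, or undef if none is found.
sub count_triplets_loop {
    my ($string, $pos) = @_;
    my $count = 0;

    # Only look at complete triplets; a short tail can never match.
    while ($pos + 3 <= length $string) {
        my $triplet = substr $string, $pos, 3;
        return $count if $triplet =~ /\A(?:aaa|bbb|ccc)\z/;
        $pos += 3;
        $count++;
    }
    return;    # no match
}

my $n = count_triplets_loop("alsdhfaaasccclaaaagalkfgblkgbklfs", 9);
print "$n\n";    # prints 2: 'scc' and 'cla' are skipped, then 'aaa' at 15
```

Returning undef rather than -1 lets the caller tell "no match" apart from a match in the very first triplet, which yields 0.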
brian d foy answered Sep 21 '22 21:09


$string=~/^   # from the start of the string
            (?:.{$p0}) # skip (don't capture) "$p0" occurrences of any character
            (?:...)*?  # skip 3 characters at a time,
                       # as few times as possible (non-greedy)
            (aaa|bbb|ccc) # capture aaa or bbb or ccc as $1
         /x;

(Assuming p0 is 0-based).

Of course, it's probably more efficient to use substr on the string to skip forward:

substr($string, $p0)=~/^(?:...)*?(aaa|bbb|ccc)/;
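
To get the triplet count rather than just the matched pattern, one option is to read the start offset of the capture group from Perl's built-in @- array after a successful match. This is only a sketch: the helper name count_triplets_regex is made up for illustration, and it assumes the string contains no newlines (so that `.` matches every character).

```perl
use strict;
use warnings;

# Hypothetical helper: same regex as above, plus the @- offset array
# to turn the match position into a triplet count.
sub count_triplets_regex {
    my ($string, $p0) = @_;
    return if $p0 > length $string;    # substr would warn past the end

    if (substr($string, $p0) =~ /^(?:...)*?(aaa|bbb|ccc)/) {
        # $-[1] is the offset where group 1 starts, relative to $p0
        return $-[1] / 3;
    }
    return;    # no match on a triplet boundary
}

my $n = count_triplets_regex("alsdhfaaasccclaaaagalkfgblkgbklfs", 9);
print "$n\n";    # prints 2
```

Because the skipped part is always a whole number of triplets, the offset is guaranteed to be divisible by 3.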
Mike G. answered Sep 24 '22 21:09


You can't really count with regexes, but you can do something like this:

pos($string) = $start_from;
$string =~ m/\G         # anchor to previous pos()
            ((?:...)*?) # capture everything up to the match
            (aaa|bbb|ccc)
            /xs  or die "No match";
my $result = length($1) / 3;

But I think it's a bit faster to use substr() and unpack() to split the string into triples and walk them in a for loop.
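
A rough sketch of the substr()/unpack() idea, assuming a plain single-byte letter string as the question states. The helper name count_triplets_unpack is hypothetical, and no timing was done here, so the speed claim remains untested.

```perl
use strict;
use warnings;

# Hypothetical helper: split the tail of the string into 3-character chunks
# and return the index of the first chunk that is 'aaa', 'bbb' or 'ccc'.
sub count_triplets_unpack {
    my ($string, $start) = @_;
    return if $start > length $string;    # nothing to search

    # "(a3)*" repeats the 3-character template until the data runs out;
    # a final chunk shorter than 3 characters is possible and never matches.
    my @triples = unpack '(a3)*', substr($string, $start);

    for my $i (0 .. $#triples) {
        return $i if $triples[$i] =~ /\A(?:aaa|bbb|ccc)\z/;
    }
    return;    # no match
}

my $n = count_triplets_unpack("alsdhfaaasccclaaaagalkfgblkgbklfs", 9);
print "$n\n";    # prints 2
```

The index of the matching chunk is already the triplet count, so no division by 3 is needed.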

(edit: it's length(), not lenght() ;-)

moritz answered Sep 24 '22 21:09
