I have to find all the positions of matching strings within a larger string using a while loop, and as a second method using a foreach loop. I have figured out the while loop method, but I am stuck on a foreach method. Here is the 'while' method:
....
my $sequence =
'AACAAATTGAAACAATAAACAGAAACAAAAATGGATGCGATCAAGAAAAAGATGC'.
'AGGCGATGAAAATCGAGAAGGATAACGCTCTCGATCGAGCCGATGCCGCGGAAGA'.
'AAAAGTACGTCAAATGACGGAAAAGTTGGAACGAATCGAGGAAGAACTACGTGAT'.
'ACCCAGAAAAAGATGATGCNAACTGAAAATGATTTAGATAAAGCACAGGAAGATT'.
'TATCTGTTGCAAATACCAACTTGGAAGATAAGGAAAAGAAAGTTCAAGAGGCGGA'.
'GGCTGAGGTAGCANCCCTGAATCGTCGTATGACACTTCTGGAAGAGGAATTGGAA'.
'CGAGCTGAGGAACGTTTGAAGATTGCAACGGATAAATTGGAAGAAGCAACACATA'.
'CAGCTGATGAATCTGAACGTGTTCGCNAGGTTATGGAAA';
my $string = <STDIN>;
chomp $string;
while ($sequence =~ /$string/gi)
{
    printf "Sequence found at position: %d\n", pos($sequence) - length($string);
}
Here is my foreach method:
foreach ($sequence =~ /$string/gi)
{
    printf "Sequence found at position: %d\n", pos($sequence) - length($string);
}
Could someone please give me a clue on why it doesn't work the same way? Thanks!
My Output if I input "aaca":
Part 1 using a while loop
Sequence found at position: 0
Sequence found at position: 10
Sequence found at position: 17
Sequence found at position: 23
Sequence found at position: 377
Part 2 using a foreach loop
Sequence found at position: -4
Sequence found at position: -4
Sequence found at position: -4
Sequence found at position: -4
Sequence found at position: -4
Your problem here is context. In the while loop, the condition is in scalar context. In scalar context, the match operator with the g modifier finds one match per evaluation, resuming where the previous match ended. Thus checking pos within the loop does what you want.
In the foreach loop, the condition is in list context. In list context, the match operator with the g modifier returns a list of all matches, and it computes all of them before the loop body is ever entered. foreach then loads the matches one by one into $_ for you, but you never use that variable. pos in the body of the loop is not useful, because by then the matching has already finished: the final failed attempt resets pos to undef, which counts as 0 in the subtraction, so every line prints 0 - length($string), which is -4 for "aaca".
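A minimal sketch of the two contexts side by side. The short input string and the variable names are made up for illustration and are not taken from the question:

```perl
use strict;
use warnings;

my $text   = 'AACAAATTGAAACAAT';   # hypothetical short input, not the question's sequence
my $needle = 'aaca';

# List context: every match is collected before anything else runs.
my @matches = $text =~ /\Q$needle\E/gi;
print "matches: @matches\n";

# The final, failing attempt resets pos(), so it is undefined afterwards.
my $pos_after = pos($text);
print "pos after list-context match: ",
    (defined($pos_after) ? $pos_after : 'undef'), "\n";

# Scalar context: one match per evaluation, and pos() is valid after each one.
my @ends;
while ($text =~ /\Q$needle\E/gi) {
    push @ends, pos($text);
}
print "end offsets in scalar context: @ends\n";
```

After the list-context match, pos is undefined, while the scalar-context loop sees a fresh end offset on every pass.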
The takeaway here is that if you want pos to work and you are using the g modifier, you should use the while loop, which imposes scalar context and makes the regex iterate across the matches in the string.
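One way to package the while loop, sketched under a few assumptions: match_positions is a hypothetical helper name, \Q...\E is added so the input is matched literally rather than as a pattern, and $-[0] (the start offset of the last successful match) replaces the pos-minus-length arithmetic.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the start offset of every match of $needle.
sub match_positions {
    my ($haystack, $needle) = @_;
    my @positions;
    # Scalar context: the regex advances one match per iteration.
    while ($haystack =~ /\Q$needle\E/gi) {
        push @positions, $-[0];    # $-[0] is where the last match started
    }
    return @positions;
}

my @found = match_positions('AACAAATTGAAACAAT', 'aaca');
print "Sequence found at position: $_\n" foreach @found;
```

For this short made-up input the loop reports positions 0 and 10. Once the positions are in a plain list, iterating over them with foreach is unproblematic, because the matching itself has already been done in scalar context.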
Sinan inspired me to write a few foreach examples.
This one is fairly succinct, using split in separator retention mode:
my $pos = 0;
foreach (split /(\Q$string\E)/i, $sequence) {
    print "Sequence found at position: $pos\n" if lc($_) eq lc($string);
    $pos += length;
}
A regex equivalent of the split solution:
my $pos = 0;
foreach ($sequence =~ /(\Q$string\E|(?:(?!\Q$string\E).)+)/gi) {
    print "Sequence found at position: $pos\n" if lc($_) eq lc($string);
    $pos += length;
}
But this is clearly the best solution for your problem:
{
    package Dumb::Homework;

    sub TIEARRAY {
        bless {
            haystack => $_[1],
            needle   => $_[2],
            size     => 2**31 - 1,    # placeholder until the matches run out
            pos      => [],           # cache of positions found so far
        }
    }

    sub FETCH {
        my ($self, $index) = @_;
        my ($pos, $needle) = @$self{qw(pos needle)};
        return $$pos[$index] if $index < @$pos;
        # search forward until one match past the requested index is cached
        while ($index + 1 >= @$pos) {
            unless ($$self{haystack} =~ /\Q$needle/gi) {
                $$self{size} = @$pos;    # no more matches: real size is known
                last
            }
            push @$pos, pos($$self{haystack}) - length $needle;
        }
        $$pos[$index]
    }

    sub FETCHSIZE { $_[0]{size} }
}

tie my @pos, 'Dumb::Homework' => $sequence, $string;
print "Sequence found at position: $_\n" foreach @pos;    # look how clean it is
The reason it's the best is that the other two solutions have to process the entire global match first, before you ever see a result. For large inputs (like DNA) that could be a problem. The Dumb::Homework package implements an array that lazily finds the next position each time the foreach iterator asks for it. It even stores the positions so you can get to them again without reprocessing. (In truth it looks one match past the requested match, which allows it to end properly in the foreach, but that is still much better than processing the whole list.)
Actually, the best solution is still to not use foreach, as it is not the correct tool for the job.
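If the lazy lookup is the attractive part, a closure gives the same one-match-at-a-time behaviour without tie or foreach. This is only a sketch: position_iterator is a hypothetical name, and the short input string is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that yields one start offset per
# call, and undef once the matches are exhausted.
sub position_iterator {
    my ($haystack, $needle) = @_;
    my $done = 0;
    return sub {
        return if $done;
        # Scalar context: continues from where the previous call stopped.
        if ($haystack =~ /\Q$needle\E/gi) {
            return $-[0];
        }
        $done = 1;    # a failed match resets pos(), so never search again
        return;
    };
}

my $next = position_iterator('AACAAATTGAAACAAT', 'aaca');
my @seen;
while (defined(my $p = $next->())) {
    push @seen, $p;
    print "Sequence found at position: $p\n";
}
```

Each call does only as much matching as is needed for the next result, and the $done flag keeps the iterator from starting over at offset 0 after the last match.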