I am using the following to count the number of occurrences of a pattern in a file:
my @lines = grep /$text/, <$fp>;
print ($#lines + 1);
But sometimes it prints one more than the actual value. I checked and it is because the last element of @lines
is null, and that is also counted.
How can the last element of the grep result be empty sometimes? Also, how can this issue be resolved?
It depends a lot on your pattern, but one thing you could do is combine two matches, with the first one disqualifying any line that contains only whitespace (or nothing). This example rejects any line that is empty, newline only, or whitespace only.
my @lines = grep { not /^\s*$/ and /$text/ } <$fp>;
Keep in mind that if the contents of $text include regexp metacharacters, they must either be intended as metacharacters or be escaped with quotemeta().
My theories are that you have a line consisting only of \n which is somehow matching your $text regexp, or that $text contains metacharacters that are affecting the match without you being aware of it. Either way, the snippet I provided will at least force rejection of "blank lines", where blank could mean completely empty (unlikely), newline terminated but otherwise empty (probable), or whitespace-only (possible) lines that appear blank when printed.
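As a rough sketch of both suggestions together, the snippet below filters out blank lines and escapes the search text with \Q...\E, which applies quotemeta() to the interpolated variable. The sample data in @input and the value of $text are made up for illustration; in the original code the lines would come from <$fp>.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the lines read from $fp.
my @input = ("a.b\n", "\n", "axb\n", "   \n", "a.b again\n");

# Assumed search text; the dot is meant as a literal dot here.
my $text = 'a.b';

# Reject blank/whitespace-only lines, then match the text literally.
# \Q...\E quotes metacharacters, so "." no longer matches any character.
my @lines = grep { !/^\s*$/ && /\Q$text\E/ } @input;

# scalar(@lines) is the element count, the same value as $#lines + 1.
print scalar(@lines), "\n";
```

Without \Q...\E the unescaped dot would also match "axb", so the count would be one higher than intended.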
A regular expression that matches the empty string will match undef. Perl will warn about doing so, but casts undef to '' before trying to match against it, at which point grep will quite happily promote the undef to its results. If you don't want to pick up the empty string (or anything that will be matched as though it were the empty string), you need to rewrite your regular expression to not match it.
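To illustrate that behaviour, here is a small sketch comparing a pattern that can match zero characters with one that cannot. The list in @input and the patterns /x*/ and /x+/ are invented for the demonstration and are not taken from the question.

```perl
use strict;
use warnings;

# Hypothetical list containing an undef and an empty string.
my @input = ("xx\n", undef, "", "abc\n");

my @loose;
{
    # Silence the "uninitialized value" warning for this demonstration only.
    no warnings 'uninitialized';
    # /x*/ can match zero characters, so every element passes, undef included.
    @loose = grep { /x*/ } @input;
}

# /x+/ requires at least one "x", so undef and the empty string are rejected.
my @strict = grep { defined($_) && /x+/ } @input;

print scalar(@loose), " ", scalar(@strict), "\n";
```

The first count includes the undef and the empty string; the second keeps only the element that actually contains an "x".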
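To see exactly what is in @lines, including newlines and other invisible characters (Useqq makes Data::Dumper print strings in double quotes with escapes such as \n), do: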
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper \@lines;