How can I quickly find the first line of a file that matches a regex?

Tags: regex, grep, sed, perl

I want to search for a line in a file, using regex, inside a Perl script.

Assuming the script runs on a system with grep installed, is it better to:

  • call the external grep through an open() command
  • open() the file directly and use a while loop and an if ($line =~ m/regex/)?

lamcro asked Dec 13 '08 14:12


3 Answers

In a modern Perl implementation, the regex engine should be about as fast as grep's, but if you're concerned about performance, why not simply measure it? From a code cleanliness and robustness standpoint, calling an external command-line tool is definitely the worse choice.
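A rough way to try it out is sketched below. This is only an illustration, not a rigorous benchmark: the sub names, the test file contents, and the run count are all made up for the example, and it assumes a grep that supports -m (GNU and BSD grep do).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Throwaway test file: filler lines followed by a single matching line.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "filler line $_\n" for 1 .. 100_000;
print {$tmp_fh} "needle here\n";
close $tmp_fh or die "Cannot close $path: $!";

# Approach 1: scan the file in Perl and stop at the first match.
sub first_match_perl {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        return $line if $line =~ /needle/;
    }
    return;
}

# Approach 2: read the first line of output from an external grep.
# List-form open avoids the shell; -m 1 makes grep stop after one match.
sub first_match_grep {
    my ($file) = @_;
    open my $fh, '-|', 'grep', '-m', '1', '-e', 'needle', $file
        or die "Cannot run grep: $!";
    my $line = <$fh>;
    close $fh;
    return $line;
}

# Wall-clock time for a few runs of each approach.
my $runs = 10;
for my $case (['perl', \&first_match_perl], ['grep', \&first_match_grep]) {
    my ($name, $code) = @$case;
    my $start = time();
    $code->($path) for 1 .. $runs;
    printf "%-4s %.3f s for %d runs\n", $name, time() - $start, $runs;
}
```

The numbers will depend heavily on file size, where the match sits, disk caching, and the locale grep runs under, so measure with your real data before drawing conclusions.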

Michael Borgwardt answered Sep 17 '22 19:09


You don't need to open the file explicitly.

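my $regex = qr/blah/;
while (<>) {        # read each line of the files named in @ARGV (or STDIN) into $_
  if (/$regex/) {
    print;          # print the first matching line
    exit;           # and stop reading
  }
}
print "Not found\n";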

Since you seem concerned about performance, I let the match and print use the default $_ provided by not assigning <> to anything, which is marginally faster. In normal production code,

while (my $line = <>) {
  if ($line =~ /$regex/) {
    print $line;
    exit;
  }
}

would be preferred.

Edit: This assumes that the file to check is given on the command line, which I just noticed you did not say is the case.
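If the file name comes from somewhere else in the script, the same loop works on an explicitly opened handle. A minimal sketch, assuming a hypothetical helper named first_matching_line and a small temporary file created only for the demonstration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns (line number, line) for the first line of
# $path that matches $regex, or an empty list if nothing matches.
sub first_matching_line {
    my ($path, $regex) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        if ($line =~ $regex) {
            my $line_no = $.;    # capture before close, which resets $.
            close $fh;
            return ($line_no, $line);
        }
    }
    close $fh;
    return;
}

# Demonstration file; the contents are made up for this example.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "alpha\n", "blah one\n", "blah two\n";
close $tmp_fh or die "Cannot close $path: $!";

my ($line_no, $line) = first_matching_line($path, qr/blah/);
if (defined $line_no) {
    print "line $line_no: $line";    # prints "line 2: blah one"
}
else {
    print "Not found\n";
}
```

Returning as soon as a line matches means the rest of the file is never read, which is the main saving when the match is near the top.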

Dave Sherohman answered Sep 21 '22 19:09


One thing to be careful of with grep: in recent Linux distributions, if your LANG environment variable selects a UTF-8 locale (e.g. mine is LANG=en_GB.UTF-8), then grep, sed, sort and probably a bunch of other text-processing utilities run about 10 times slower. So watch out for that if you are doing performance comparisons. I now alias my grep command to:

LANG= LANGUAGE= /bin/grep

Edit: Actually, it's more like 100 times slower.
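If you do end up calling grep from the Perl script, the same trick can be applied from inside Perl by setting the locale only for the child process. A minimal sketch, assuming a hypothetical helper named first_match_via_grep and a grep that supports -m (GNU and BSD grep do). It sets LC_ALL rather than LANG because LC_ALL takes precedence over LANG and the other LC_* variables. Note that in the C locale grep treats the input as plain bytes, so patterns that rely on multibyte characters may behave differently.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: ask an external grep for the first matching line,
# running it in the C locale. Returns undef if nothing matched.
sub first_match_via_grep {
    my ($pattern, $path) = @_;
    # local restores the previous value when the sub returns, so the
    # rest of the script keeps its original locale settings.
    local $ENV{LC_ALL} = 'C';
    # List-form open runs grep directly, without going through a shell.
    open my $fh, '-|', 'grep', '-m', '1', '-e', $pattern, '--', $path
        or die "Cannot run grep: $!";
    my $line = <$fh>;
    close $fh;    # grep exits non-zero on no match, so this is not checked
    return $line;
}

# Demonstration file; the contents are made up for this example.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "alpha\n", "blah one\n", "blah two\n";
close $tmp_fh or die "Cannot close $path: $!";

my $line = first_match_via_grep('blah', $path);
print defined($line) ? $line : "Not found\n";    # prints "blah one"
```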

Adrian Pronk answered Sep 18 '22 19:09