
How can I grab multiple lines after a matching line in Perl?

Tags:

perl

I'm parsing a large file in Perl line-by-line (terminated by \n), but when I reach a certain keyword, say "TARGET", I need to grab all the lines between TARGET and the next completely empty line.

So, given a segment of a file:

Line 1
Line 2
Line 3
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line
\n

It should become:
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line

The reason I'm having trouble is I'm already going through the file line-by-line; how do I change what I delimit by midway through the parsing process?

Dirk asked Jun 24 '09 20:06



6 Answers

You want something like this:

my @grabbed;
while (<FILE>) {
    if (/TARGET/) {
        push @grabbed, $_;          # keep the matching line itself
        # Keep reading from the same handle until the next empty line.
        while (<FILE>) {
            last if /^$/;
            push @grabbed, $_;
        }
    }
}
dave4420 answered Oct 21 '22 07:10


The range operator is ideal for this sort of task:

$ cat try
#! /usr/bin/perl

while (<DATA>) {
  print if /\btarget\b/i .. /^\s*$/
}

__DATA__
Line 1
Line 2
Line 3
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line

Nope
Line 7 Target
Line 8 Yep

Nope again

$ ./try
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line

Line 7 Target
Line 8 Yep
Greg Bacon answered Oct 21 '22 08:10


The short answer: Perl's input record separator is $/, so when you hit TARGET you can set $/ to "\n\n", read the next "line" (which now runs through the next blank line), then set it back to "\n"... et voilà!

Now for the longer one: if you use the English module (which gives sensible names to all of Perl's magic variables), then $/ is called $RS or $INPUT_RECORD_SEPARATOR. If you use IO::Handle, then IO::Handle->input_record_separator("\n\n") will work, although the setting is still global rather than per-filehandle.

And if you're doing this as part of a bigger piece of code, don't forget to either localize the change (using local $/ = "\n\n"; in the appropriate scope) or to set $/ back to its original value of "\n".
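For illustration, here is a minimal sketch of switching $/ midway through a line-by-line loop. The sample text, the in-memory filehandle and the names $text, $fh and @blocks are assumptions made up for this example, not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical sample input, read through an in-memory filehandle.
my $text = "Line 1\nLine 2\nLine 3\nLine 4 Target\n"
         . "Line 5 Grab this line\nLine 6 Grab this line\n\nNext Line\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @blocks;
while (my $line = <$fh>) {
    next unless $line =~ /Target/;

    # Read the rest of the block as one record that ends at the blank line.
    # local restores the previous value of $/ when the do block exits.
    my $rest = do {
        local $/ = "\n\n";
        <$fh>;
    };
    $rest = '' unless defined $rest;   # target was the last line of the file
    $rest =~ s/\n+\z/\n/;              # drop the terminating blank line

    push @blocks, $line . $rest;
}

print @blocks;
```

Because the change to $/ is localized inside the do block, the outer loop keeps reading one line at a time afterwards. One caveat with this sketch: if the target line is immediately followed by a blank line, the "\n\n" read will run on into the following paragraph, so that case would need its own check.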

mirod answered Oct 21 '22 07:10


From perlfaq6's answer to How can I pull out lines between two patterns that are themselves on different lines?


You can use Perl's somewhat exotic .. operator (documented in perlop):

perl -ne 'print if /START/ .. /END/' file1 file2 ...

If you wanted text and not lines, you would use

perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...

But if you want nested occurrences of START through END, you'll run up against the problem described in the question in this section on matching balanced text.

Here's another example of using ..:

while (<>) {
    $in_header =   1  .. /^$/;
    $in_body   = /^$/ .. eof;
# now choose between them
} continue {
    $. = 0 if eof;  # fix $.
}
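As a further sketch, the value returned by .. in scalar context can be used to leave out the terminating blank line. perlop documents that the operator returns a sequence number while the range is active, with "E0" appended on the line that ends the range. The sample text and the names $text, $fh, $seq and @grabbed below are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample input, read through an in-memory filehandle.
my $text = "Line 1\nLine 4 Target\nLine 5 Grab this line\n"
         . "Line 6 Grab this line\n\nNope\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @grabbed;
while (my $line = <$fh>) {
    # Scalar-context flip-flop: empty string outside the range,
    # 1, 2, 3, ... inside it, and "NE0" on the line that closes it.
    my $seq = ($line =~ /Target/) .. ($line =~ /^$/);

    next unless $seq;          # outside the range
    next if $seq =~ /E0\z/;    # skip the blank line that ends the range
    push @grabbed, $line;
}

print @grabbed;
```

This keeps the single-pass, line-by-line structure and only changes which lines are kept, so it should slot into an existing loop without touching $/.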
brian d foy answered Oct 21 '22 08:10


my $buffer = '';
while(<FILE>)
{
    if (/target/i)
    {
        $buffer .= $_;
        while(<FILE>)
        {
            $buffer .= $_;          # note: the terminating blank line is appended too
            last if /^\n$/;
        }
    }
}
user105033 answered Oct 21 '22 08:10


use strict;
use warnings;

my $inside = 0;
my $data = '';
while (<DATA>) {
    $inside = 1 if /Target/;
    last if /^$/ and $inside;
    $data .= $_ if $inside;
}

print '[' . $data . ']';

__DATA__
Line 1
Line 2
Line 3
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line

Next Line

Edit to fix the exit condition as per the note below.

telesphore4 answered Oct 21 '22 09:10