I'm parsing a large file in Perl line-by-line (terminated by \n), but when I reach a certain keyword, say "TARGET", I need to grab all the lines between TARGET and the next completely empty line.
So, given a segment of a file:
Line 1
Line 2
Line 3
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line
\n
It should become:
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line
The reason I'm having trouble is I'm already going through the file line-by-line; how do I change what I delimit by midway through the parsing process?
Solution. Use /m, /s, or both as pattern modifiers. /s lets . match newline (normally it doesn't). If the string had more than one line in it, then /foo.
I think this will work, using the /s modifier, which mnemonically means to "treat string as a single line". This changes the behaviour of "." to match newline characters as well. In order to match the beginning of this comment to the end, we add the /s modifier like this: $str =~ s/<!
To skip over blank lines in a Perl script, you have several choices. You could use "next if /^$/" (skip if empty) or "next if /^\s*$/" (skip if empty or only whitespace).
Depending on how it is used, \b can have a special meaning within a Perl command: \b is the backspace character only inside a character class. Outside a character class, \b alone is a word-character/non-word-character boundary.
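To make the two meanings of \b concrete, here is a small sketch. The helper names has_target_word and has_backspace are made up for illustration and are not from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains "target" as a whole word.
# Outside a character class, \b is a word boundary.
sub has_target_word {
    my ($text) = @_;
    return $text =~ /\btarget\b/i ? 1 : 0;
}

# Hypothetical helper: true if $text contains a literal backspace (0x08).
# Inside a character class, \b is the backspace character.
sub has_backspace {
    my ($text) = @_;
    return $text =~ /[\b]/ ? 1 : 0;
}

print has_target_word("Line 4 Target"), "\n";   # "Target" stands alone as a word
print has_target_word("retargeted"), "\n";      # "target" is embedded in a longer word
print has_backspace("a\x08c"), "\n";            # string contains a backspace
print has_backspace("a b c"), "\n";             # only spaces and letters
```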
You want something like this:
    my @grabbed;
    while (<FILE>) {
        if (/TARGET/) {
            push @grabbed, $_;
            while (<FILE>) {
                last if /^$/;
                push @grabbed, $_;
            }
        }
    }
The range operator is ideal for this sort of task:
    $ cat try
    #! /usr/bin/perl
    while (<DATA>) {
        print if /\btarget\b/i .. /^\s*$/
    }
    __DATA__
    Line 1
    Line 2
    Line 3
    Line 4 Target
    Line 5 Grab this line
    Line 6 Grab this line

    Nope
    Line 7 Target
    Line 8 Yep

    Nope again
    $ ./try
    Line 4 Target
    Line 5 Grab this line
    Line 6 Grab this line

    Line 7 Target
    Line 8 Yep

Note that the blank line that closes each range is printed as well.
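If the grabbed blocks need to be kept apart instead of printed, the same flip-flop can be read in scalar context: perlop documents that the final sequence number of a range has "E0" appended. What follows is only a sketch under some assumptions: grab_blocks is a hypothetical helper name, the terminating blank line is dropped, and end-of-file is treated as closing an open range.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one string per block, where a block runs
# from a line containing the word "target" up to (not including) the
# next blank line, or up to end-of-file.
sub grab_blocks {
    my ($fh) = @_;
    my @blocks;
    my $current = '';
    while (my $line = <$fh>) {
        my $blank = $line =~ /^\s*$/;
        # Flip-flop: switches on at a target line, off at a blank line or EOF.
        my $seq = ($line =~ /\btarget\b/i) .. ($blank || eof($fh));
        next unless $seq;                # not inside a range
        $current .= $line unless $blank; # keep content, drop the terminator
        if ($seq =~ /E0\z/) {            # last line of this range
            push @blocks, $current;
            $current = '';
        }
    }
    return @blocks;
}

# Demo on an in-memory file so the sketch is self-contained.
my $sample = "Line 3\nLine 4 Target\nLine 5 Grab this line\n\nNope\n"
           . "Line 7 Target\nLine 8 Yep\n\nNope again\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "[$_]\n" for grab_blocks($fh);
close $fh;
```

Closing the range at end-of-file also resets the flip-flop, so a later call on another handle does not start out "inside" a block.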
The short answer: the line delimiter in Perl is $/, so when you hit TARGET, you can set $/ to "\n\n", read the next "line", then set it back to "\n"... et voilà!
Now for the longer one: if you use the English module (which gives sensible names to all of Perl's magic variables), then $/ is called $RS or $INPUT_RECORD_SEPARATOR. If you use IO::Handle, then IO::Handle->input_record_separator("\n\n") will work.
And if you're doing this as part of a bigger piece of code, don't forget to either localize the change (using local $/ = "\n\n"; in the appropriate scope) or set $/ back to its original value of "\n".
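A minimal sketch of that approach, assuming a lexical filehandle and a hypothetical helper named grab_after_target. It also assumes at least one non-empty line follows the TARGET line: if an empty line comes immediately after it, the "\n\n" separator will not be seen until the next paragraph ends.

```perl
use strict;
use warnings;

# Hypothetical helper: for every line matching TARGET, returns that line
# plus everything up to the next empty line, as one string per match.
sub grab_after_target {
    my ($fh) = @_;
    my @blocks;
    while (my $line = <$fh>) {
        next unless $line =~ /TARGET/i;
        my $rest;
        {
            # Switch the record separator for one read only; "local"
            # restores the previous value when this block exits.
            local $/ = "\n\n";
            $rest = <$fh>;
        }
        $rest = '' unless defined $rest;   # TARGET was the last line
        $rest =~ s/\n+\z/\n/;              # drop the empty-line terminator
        push @blocks, $line . $rest;
    }
    return @blocks;
}

# Demo on an in-memory file so the sketch is self-contained.
my $sample = "Line 1\nLine 4 Target\nLine 5 Grab this line\n"
           . "Line 6 Grab this line\n\nLine 7\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print grab_after_target($fh);
close $fh;
```

Using local inside a bare block keeps the change from leaking into the outer loop, which still reads one "\n"-terminated line at a time.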
From perlfaq6's answer to "How can I pull out lines between two patterns that are themselves on different lines?":
You can use Perl's somewhat exotic .. operator (documented in perlop):
    perl -ne 'print if /START/ .. /END/' file1 file2 ...
If you wanted text and not lines, you would use
    perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...
But if you want nested occurrences of START through END, you'll run up against the problem described in the question in this section on matching balanced text.
Here's another example of using ..:
    while (<>) {
        $in_header = 1 .. /^$/;
        $in_body = /^$/ .. eof;
        # now choose between them
    } continue {
        $. = 0 if eof; # fix $.
    }
    while(<FILE>)
    {
        if (/target/i)
        {
            $buffer .= $_;
            while(<FILE>)
            {
                $buffer .= $_;
                last if /^\n$/;
            }
        }
    }
    use strict;
    use warnings;
    my $inside = 0;
    my $data = '';
    while (<DATA>) {
        $inside = 1 if /Target/;
        last if /^$/ and $inside;
        $data .= $_ if $inside;
    }
    print '[' . $data . ']';
    __DATA__
    Line 1
    Line 2
    Line 3
    Line 4 Target
    Line 5 Grab this line
    Line 6 Grab this line

    Next Line
Edit to fix the exit condition as per the note below.