Problem with perl multiline matching

I'm trying to use a perl one-liner to update some code that spans multiple lines and am seeing some strange behavior. Here's a simple text file that shows the problem I'm seeing:

ABCD    START
         STOP    EFGH

I expected the following to work but it doesn't end up replacing anything:

perl -pi -e 's/START\s+STOP/REPLACE/s' input.txt 

After doing some experimenting I found that the \s+ in the original regex will match the newline but not any of the whitespace on the 2nd line, and adding a second \s+ doesn't work either. So for now I'm doing the following workaround, which is to add an intermediate regex that only removes the newline:

perl -pi -e 's/START\s+/START/s' input.txt 

This creates the following intermediate file:

ABCD    START            STOP    EFGH 

Then I can run the original regex (although the /s is no longer needed):

perl -pi -e 's/START\s+STOP/REPLACE/s' input.txt 

This creates the final, desired file:

ABCD    REPLACE    EFGH 

It seems like the intermediate step should not be necessary. Am I missing something?

faman asked May 02 '11 21:05

2 Answers

perl -p processes the file one line at a time. The regex you have is correct, but it is never matched against the multi-line string.

A simple strategy, assuming the file will fit in memory, is to read the whole thing (do this without -p):

$/ = undef;
$file = <>;
$file =~ s/START\s+STOP/REPLACE/sg;
print $file;

Note, I have added the /g modifier to specify global replacement.

As a shortcut for all that extra boilerplate, you can keep your existing one-liner and add the -0777 option, which puts perl in slurp mode so that -p sees the whole file as a single record:

perl -0777 -pi -e 's/START\s+STOP/REPLACE/sg' input.txt

The /g is still needed if there may be more than one replacement to make in the file.

A hiccup that you might run into, although not with this regex: if the regex were START.+STOP, and a file contains multiple START/STOP pairs, greedy matching of .+ will eat everything from the first START to the last STOP. You can use non-greedy matching (match as little as possible) with .+?.

If you want to use the ^ and $ anchors for line boundaries anywhere in the string, then you also need the /m regex modifier.

Andy answered Oct 22 '22 04:10

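You were close. You need either -00 (paragraph mode, where records are separated by blank lines) or -0777 (slurp mode, where the whole file is one record):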

 perl -0777 -pi -e 's/START\s+/START/' input.txt 
tchrist answered Oct 22 '22 02:10