Problem with Perl multiline matching

Q: What is multiline in regex?

The multiline option (the m flag, written /m in Perl) changes how the ^ and $ anchors are interpreted: they match at the beginning and end of each line, instead of only at the beginning and end of the input string. It does not affect whether the rest of the pattern can match across line breaks.

Q: What is multiline flag in regex?

The m flag indicates that a multiline input string should be treated as multiple lines. When m is used, ^ and $ change from matching only at the start or end of the entire string to matching at the start or end of any line within the string.

I'm trying to use a perl one-liner to update some code that spans multiple lines and am seeing some strange behavior. Here's a simple text file that shows the problem I'm seeing:

ABCD    START
            STOP    EFGH

I expected the following to work but it doesn't end up replacing anything:

perl -pi -e 's/START\s+STOP/REPLACE/s' input.txt

After doing some experimenting I found that the \s+ in the original regex will match the newline but not any of the whitespace on the 2nd line, and adding a second \s+ doesn't work either. So for now I'm doing the following workaround, which is to add an intermediate regex that only removes the newline:

perl -pi -e 's/START\s+/START/s' input.txt

This creates the following intermediate file:

ABCD    START            STOP    EFGH

Then I can run the original regex (although the /s is no longer needed):

perl -pi -e 's/START\s+STOP/REPLACE/s' input.txt

This creates the final, desired file:

ABCD    REPLACE    EFGH

It seems like the intermediate step should not be necessary. Am I missing something?

asked May 02 '11 21:05

faman

2 Answers

perl -p processes the file one line at a time, so the substitution is only ever applied to a single line. The regex you have is correct, but it never sees a string that contains both START and STOP.

A simple strategy, assuming the file will fit in memory, is to read the whole thing (do this without -p):

$/ = undef; $file = <>; $file =~ s/START\s+STOP/REPLACE/sg; print $file;

Note, I have added the /g modifier to specify global replacement.

As a shortcut for all that extra boilerplate, you can use your existing one-liner with the -0777 option, which makes perl read the whole file as a single record: perl -0777pi -e 's/START\s+STOP/REPLACE/sg' input.txt. The /g modifier is still needed if there may be more than one replacement to make in the file.

A hiccup that you might run into, although not with this regex: if the regex were START.+STOP, and a file contains multiple START/STOP pairs, greedy matching of .+ will eat everything from the first START to the last STOP. You can use non-greedy matching (match as little as possible) with .+?.

If you want to use the ^ and $ anchors for line boundaries anywhere in the string, then you also need the /m regex modifier.

answered Oct 22 '22 04:10

Andy

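You were close. You need either -00 (paragraph mode, which reads the file in chunks separated by blank lines) or -0777 (slurp mode, which reads the whole file at once):

 perl -0777 -pi -e 's/START\s+STOP/REPLACE/' input.txt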

answered Oct 22 '22 02:10

tchrist
