How to make a perl one-liner "line-endings agnostic"

Question

I scratched my head for an hour over a Perl one-liner that failed because the file had CRLF line endings. Its regex has a capture group at the end of the line, and the CR got included in the capture, which broke the replacement that uses the backreference.

I ended up specifying the CRLF manually in the regex, but is there a way to get Perl to handle line endings automatically, whatever they are?

Original command is

perl -pe  's/foo bar(.*)$/foo $1 bar/g' file.txt

"Correct" command is

perl -pe  's/foo bar(.*)\r\n/foo $1 bar\r\n/g' file.txt

I know I can also convert line endings before processing; I'm interested in how to get Perl to handle this case gracefully.

Example file (save with CRLF line endings!)

[19:06:57.033] foo barmy
[19:06:57.033] foo baryour

Expected output

[19:06:57.033] foo my bar
[19:06:57.033] foo your bar

Output with original command (bar goes at line beginning because it's matched together with carriage return):

bar:06:57.033] foo my
bar:06:57.033] foo your

ikegami · Accepted Answer

is there a way to get perl handle automatically platform-specific line-ending?

Yes. It's actually the default.

The issue is that you're trying to handle Windows line endings on a unix platform.

This will definitely do it:

perl -pe'
    BEGIN {
       binmode STDIN,  ":crlf";
       binmode STDOUT, ":crlf";
    }
    s/foo bar(.*)$/foo $1 bar/g;
' <file.txt

Might I suggest you keep doing it manually?

Alternatively, you could convert the file to a native (Unix) text file, process it, and convert the result back.

<file.orig dos2unix | perl -pe'...' | unix2dos >file.new

Ether · Answer

In newer perls, you can use \R in your regex to strip off all end-of-line characters (it includes both \n and \r). See perldoc perlre.

mklement0 · Answer

The \R escape sequence (Perl v5.10+; see perldoc perlrebackslash or the documentation online), which matches "generic newlines" platform-agnostically, can be made to work here (the example uses Bash's printf to create the multi-line input string):

$ printf 'foo barmy\r\nfoo baryour\r\n' | perl -pe 's/foo bar(.*?)\R/foo $1 bar\n/gm'
foo my bar
foo your bar

Note that the only difference to Ether's answer is use of a non-greedy construct (.*? rather than just .*), which makes all the difference here.

Read on, if you want to know more.

Background:

It is an example of a pitfall associated with \R, which stems from the fact that it can match either one character (such as \n) or two (the \r\n sequence) [1].

With the greedy (.*) construct, "my\r" (including the \r) is captured: . matches \r but not \n, so .* consumes everything up to the \n, and that remaining \n by itself also satisfies \R.

By contrast, using the non-greedy (.*?) construct causes \R to match the whole \r\n sequence, as intended.

[1] \R matches MORE than just \n and \r\n: besides the \r\n sequence, it matches any single character that is classified as vertical whitespace in Unicode terms, which also includes \v (vertical tab), \f (form feed), \r (by itself), and the following Unicode chars: U+0085 (NEXT LINE), U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR).

devnull · Answer

You can say:

perl -pe 's/foo bar([^\015]*)(\015?\012)/foo $1 bar$2/g' *.txt

The line endings would be preserved, i.e. would be the same as the input file.

You might also want to refer to perldoc perlport.
