Why does `perl -pe 's/$/\n/g'` add 2 blank lines?

Tags: regex, perl

I'm working through the one liner book and came across

perl -pe 's/$/\n/' file

which inserts a blank line after each line by setting the end of the line to new line, thus adding a new line to the existing newline, resulting in a blank line. As this is the first example without g at the end of the pattern, I tried

perl -pe 's/$/\n/g' file

This results in 2 blank lines between lines.
I would have expected no difference since there is only one $ per line so replacing all of them should be the same as replacing just the first one.
What's going on here?

peer asked Feb 08 '18 12:02




2 Answers

/$/ matches the “end of string”. This might be

  • the end of string (like /\z/),
  • or just before a newline before the end of string (like /(?=\n\z)/).

(Additionally, /$/m matches the “end of line”. This might be

  • the end of string,
  • or just before a newline (like /(?=\n)/).)
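The two rule sets above can be checked by listing the offsets at which each pattern matches. This is a minimal sketch; `match_positions` and the sample string are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Return every offset at which $regex matches inside $string.
# (match_positions is a hypothetical helper, not a built-in.)
sub match_positions {
    my ($string, $regex) = @_;
    my @positions;
    push @positions, pos($string) while $string =~ /$regex/g;
    return @positions;
}

my $text = "a\nb\n";    # two newline-terminated lines, length 4

my @plain     = match_positions($text, qr/$/);     # end-of-string rules
my @multiline = match_positions($text, qr/$/m);    # end-of-line rules
my @absolute  = match_positions($text, qr/\z/);    # very end only

printf "%-6s %s\n", '/$/',  "@plain";        # 3 4
printf "%-6s %s\n", '/$/m', "@multiline";    # 1 3 4
printf "%-6s %s\n", '/\z/', "@absolute";     # 4
```

Without /m, only the final newline counts, so offset 1 (before the first newline) is not a match.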

With your substitution s/$/\n/g, the regex matches twice: once before the newline, then again at the end of string:

  • The first match is before the newline:

    "foo\n"
    #   ^ match
    

    A newline is placed before the current match end:

    "foo\n\n"
    #     ^ insert before
    
  • The next match is at the end of string:

    "foo\n\n"
    #       ^ match
    

    A newline is inserted before the current match end:

    "foo\n\n\n"
    #         ^ insert before
    
  • No further match is found.
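The walkthrough above can be reproduced outside of `-p` with a short script. This sketch assumes the line `"foo\n"` as a stand-in for what `-p` places in `$_`; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = "foo\n";    # stand-in for one input line as read by -p

# Copy the line, then substitute on the copy.
(my $once   = $line) =~ s/$/\n/;     # replaces the first match only
(my $global = $line) =~ s/$/\n/g;    # replaces both zero-width matches

# tr/\n// counts newline characters without modifying the string.
my $count_once   = ($once   =~ tr/\n//);
my $count_global = ($global =~ tr/\n//);

print "without /g: $count_once newlines\n";      # 2
print "with /g:    $count_global newlines\n";    # 3
```

Two newlines print as one blank line after `foo`; three print as two blank lines.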

The solution: if $ is too DWIMmy for you, always match \z or \n explicitly, possibly together with lookaheads like (?=\n). Consider matching all Unicode line separators with \R instead of just \n.
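As a rough illustration of the explicit anchors, the sketch below applies both alternatives with /g and shows that each adds exactly one newline to `"foo\n"`. The sample strings and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $line = "foo\n";    # newline-terminated input
my $bare = "foo";      # input with no trailing newline

# \z matches only at the very end, so /g cannot find a second match.
(my $at_end = $line) =~ s/\z/\n/g;                  # "foo\n\n"

# The lookahead matches only just before the final newline.
(my $before_newline = $line) =~ s/(?=\n\z)/\n/g;    # "foo\n\n"

# The two differ when there is no trailing newline at all.
(my $bare_at_end = $bare) =~ s/\z/\n/g;             # "foo\n"
(my $bare_before = $bare) =~ s/(?=\n\z)/\n/g;       # unchanged: "foo"

print length($at_end), " ", length($before_newline), "\n";    # 5 5
print length($bare_at_end), " ", length($bare_before), "\n";  # 4 3
```

Which anchor fits depends on whether input without a final newline should be modified.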

amon answered Nov 05 '22 07:11


This isn't a sound understanding of the situation. $ is a badly-defined and unintuitive metacharacter

  • It is a zero-width match

  • It will match before a newline character at the end of the bound string

  • It will match at the end of the bound string

  • With the /m modifier in place, it will also match before any newline character anywhere, but not immediately after it unless it is the last character of the string

\z is much more useful: it only ever matches at the end of the string

"by setting the end of the line to new line"

Mentioning "lines" at all is misleading, and you should be careful to explain in comments what meaning you're applying. If you have

use feature 'say';
my $s = "xxx\n";

then

say pos($s) while $s =~ /$/g;

will produce

3
4

i.e. both before and after the newline, because it happens to be at the end of the string

This is also why your s/$/\n/g adds two newlines: there are two zero-width matches for /$/ within this string, and a global substitution finds them and replaces them both with a newline, resulting in three newlines instead of the original one

It's unclear what you intended

  • Adding a newline to the end of a string, regardless of what's there already is s/\z/\n/ or just $s .= "\n"

  • If you want to ensure that, say, there are exactly two newlines at the end of a string, then replace any existing trailing newlines with s/\n*\z/\n\n/
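A minimal sketch of the second case, assuming the goal is a fixed number of trailing newlines. It uses \n* rather than \n+ so that a string with no trailing newline is handled as well; `ensure_trailing_newlines` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return a copy of $string that ends in exactly $count newlines.
# (ensure_trailing_newlines is a hypothetical helper, not a built-in.)
sub ensure_trailing_newlines {
    my ($string, $count) = @_;
    my $ending = "\n" x $count;
    # \n*\z consumes every trailing newline (possibly none) up to the end.
    $string =~ s/\n*\z/$ending/;
    return $string;
}

my @inputs  = ("foo", "foo\n", "foo\n\n\n\n");
my @results = map { ensure_trailing_newlines($_, 2) } @inputs;

# Every result is "foo" followed by exactly two newlines.
print join(",", map { length } @results), "\n";    # 5,5,5
```

No /g is needed here: the single leftmost match already covers the whole run of trailing newlines.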

As you can see, \z is much more useful than $

And don't forget \R if you're dealing with cross-platform data. It will match any standard line terminator, including CR, LF and CRLF (which it treats as a single terminator)
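For instance, \R can normalize mixed line endings in one pass. This is a sketch assuming Perl 5.10 or later (where \R is available); the sample data is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample mixing LF, CRLF and bare CR terminators.
my $mixed = "unix\nwindows\r\nold mac\rlast";

# \R matches CRLF as one unit, so each terminator becomes a single "\n".
(my $normalized = $mixed) =~ s/\R/\n/g;

# The same escape works as a split separator.
my @lines = split /\R/, $mixed;

print scalar(@lines), " lines\n";    # 4 lines
```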

If this still leaves you with a problem then please ask again. I was going to write about zero-width matches but it's hard to know whether my answer is clear without it

Borodin answered Nov 05 '22 08:11