Why does `perl -pe 's/$/\n/g'` add 2 blank lines?

Tags:

regex

perl

I'm working through the one liner book and came across

perl -pe 's/$/\n/' file

which inserts a blank line after each line: it replaces the end of the line with a newline, which is added to the existing newline and so produces a blank line. As this is the first example without g at the end of the pattern, I tried

perl -pe 's/$/\n/g' file

this results in 2 blank lines between lines.
I would have expected no difference since there is only one $ per line so replacing all of them should be the same as replacing just the first one.
What's going on here?

664

asked Feb 08 '18 12:02

peer

2 Answers

/$/ matches the “end of string”. This might be

the end of string (like /\z/),
or just before a newline before the end of string (like /(?=\n\z)/).

Additionally, /$/m matches the “end of line”. This might be

the end of string,
or just before a newline (like /(?=\n)/).

With your substitution s/$/\n/g, the regex matches twice: once before the newline, then again at the end of string:

The first match is before the newline:

```
"foo\n"
#   ^ match
```
A newline is inserted at the match position, just before the existing newline:

```
"foo\n\n"
#     ^ insert before
```
The next match is at the end of string:

```
"foo\n\n"
#       ^ match
```
A newline is inserted at the match position, at the end of the string:

```
"foo\n\n\n"
#         ^ insert before
```
No further match is found.

The solution: if $ is too DWIMmy for you, always match \z or \n explicitly, possibly together with lookaheads like (?=\n). Consider matching all Unicode line separators with \R instead of just \n.

192

answered Nov 05 '22 07:11

amon

This isn't a sound understanding of the situation. $ is a badly-defined and unintuitive metacharacter

It is a zero-width match
It will match before a newline character at the end of the bound string
It will match at the end of the bound string
With the /m modifier in place, it will also match before any newline character anywhere, but not immediately after it unless it is the last character of the string

\z is much more useful: it only ever matches at the end of the string

"by setting the end of the line to new line"

Mentioning "lines" at all is misleading, and you should be careful to explain in comments what meaning you're applying. If you have

my $s = "xxx\n";

then

say pos($s) while $s =~ /$/g;

will produce

3
4

i.e. both before and after the newline, because it happens to be at the end of the string

This is also why your s/$/\n/g adds two newlines: there are two zero-width matches for /$/ within this string, and a global substitution finds them and replaces them both with a newline, resulting in three newlines instead of the original one

It's unclear what you intended

Adding a newline to the end of a string, regardless of what's there already is s/\z/\n/ or just $s .= "\n"
If you want to ensure that, say, there are exactly two newlines at the end of a string, then just replace any existing trailing linefeeds with s/\n*\z/\n\n/

As you can see, \z is much more useful than $

And don't forget \R if you're dealing with cross-platform data. It will match any standard line terminator, including CR, LF and CRLF

If this still leaves you with a problem then please ask again. I was going to write about zero-width matches but it's hard to know whether my answer is clear without it

answered Nov 05 '22 08:11

Borodin
