I'm working through the one-liner book and came across
perl -pe 's/$/\n/' file
which inserts a blank line after each line. It replaces the end of the line with a newline, so a second newline is added to the existing one, which produces a blank line.
As this is the first example without g at the end of the pattern, I tried
perl -pe 's/$/\n/g' file
This results in 2 blank lines between lines.
I would have expected no difference, since there is only one $ per line, so replacing all of them should be the same as replacing just the first one.
What's going on here?
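/$/ matches the “end of string”. This might be the actual end of the string (like /\z/), or the position just before a newline at the end of the string (like /(?=\n\z)/). (Additionally, /$/m matches the “end of line”. This might be either of the above, or the position before any newline (like /(?=\n)/).)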
With your substitution s/$/\n/g, the regex matches twice: once before the newline, then again at the end of string:
The first match is before the newline:
"foo\n"
#   ^ match
A newline is inserted at that match position:
"foo\n\n"
#   ^ insert before
The next match is at the end of string:
"foo\n\n"
#       ^ match
A newline is inserted at that match position:
"foo\n\n\n"
#       ^ insert before
No further match is found.
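The walkthrough above can be checked with a short script. This is a minimal sketch that assumes a hypothetical input line "foo\n", as perl -p would pass it to the substitution; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical sample line, including its trailing newline.
my $line = "foo\n";

# Without /g only the first match (just before the newline) is replaced.
(my $once = $line) =~ s/$/\n/;

# With /g both zero-width matches of $ are replaced.
(my $global = $line) =~ s/$/\n/g;

# Count the newlines in each result with tr///.
my $count_once   = ($once   =~ tr/\n//);
my $count_global = ($global =~ tr/\n//);

print "without /g: $count_once newlines\n";
print "with /g:    $count_global newlines\n";
```

Two newlines correspond to one blank line after "foo", and three newlines correspond to two blank lines, which matches what the question observed.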
The solution: if $ is too DWIMmy for you, always match \z or \n explicitly, possibly together with lookaheads like (?=\n). Consider matching the generic line terminator \R instead of just \n.
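As a sketch of what matching explicitly can look like, the three substitutions below each add exactly one newline to a hypothetical line "foo\n", even when /g is present. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $line = "foo\n";   # hypothetical input line

# \z matches only at the very end of the string, so /g finds one match.
(my $at_end = $line) =~ s/\z/\n/g;

# The lookahead matches only just before the final newline.
(my $before_newline = $line) =~ s/(?=\n\z)/\n/g;

# Matching the final newline itself and doubling it.
(my $doubled = $line) =~ s/\n\z/\n\n/g;

for my $result ($at_end, $before_newline, $doubled) {
    my $count = ($result =~ tr/\n//);
    print "newlines: $count\n";
}
```

Each pattern has a single possible match position in the string, so the /g flag no longer changes the result.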
This isn't a sound understanding of the situation. $ is a badly-defined and unintuitive metacharacter:
It is a zero-width match.
It will match before a newline character at the end of the bound string.
It will match at the end of the bound string.
With the /m modifier in place, it will also match before any newline character anywhere, but not immediately after one unless that newline is the last character of the string.
\z is much more useful: it only ever matches at the end of the string.
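The rules above can be made visible by listing the offsets where each pattern matches. The following is a rough sketch; match_positions is a hypothetical helper and "ab\ncd\n" is an assumed sample string, neither comes from the original posts.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every offset at which a pattern matches.
sub match_positions {
    my ($string, $regex) = @_;
    my @positions;
    push @positions, pos($string) while $string =~ /$regex/g;
    return @positions;
}

# Assumed two-line sample: offsets 2 and 5 hold the newlines, 6 is the end.
my $text = "ab\ncd\n";

print 'plain $: ', join(",", match_positions($text, qr/$/)),  "\n";  # 5,6
print '$ with /m: ', join(",", match_positions($text, qr/$/m)), "\n";  # 2,5,6
print '\z: ', join(",", match_positions($text, qr/\z/)), "\n";  # 6
```

Only \z yields a single position, which is why it behaves predictably under /g.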
"by setting the end of the line to new line"
Mentioning "lines" at all is misleading, and you should be careful to explain in comments what meaning you're applying. If you have
my $s = "xxx\n";
then (with use feature 'say' enabled)
say pos($s) while $s =~ /$/g;
will produce
3
4
i.e. $ matches both before and after the newline, because the newline happens to be at the end of the string.
This is also why your s/$/\n/g adds two newlines: there are two zero-width matches for /$/ within this string, and a global substitution finds them and replaces them both with a newline, resulting in three newlines instead of the original one.
It's unclear what you intended.
Adding a newline to the end of a string, regardless of what's there already, is s/\z/\n/ or just $s .= "\n".
If you want to ensure that, say, there are exactly two newlines at the end of a string, then replace any existing trailing linefeeds with exactly two using s/\n*\z/\n\n/.
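Here is a minimal sketch of normalizing the end of a string to exactly two newlines. It assumes the pattern \n*\z, so that strings without any trailing newline are handled too; two_trailing_newlines is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: make a string end in exactly two newlines.
sub two_trailing_newlines {
    my ($string) = @_;
    # \n* consumes any run of trailing newlines (possibly none),
    # \z pins the match to the end of the string.
    $string =~ s/\n*\z/\n\n/;
    return $string;
}

# Assumed sample inputs with zero, one and four trailing newlines.
for my $input ("foo", "foo\n", "foo\n\n\n\n") {
    my $output = two_trailing_newlines($input);
    my $count  = ($output =~ tr/\n//);
    print "trailing newlines after normalizing: $count\n";
}
```

No /g is used here: the leftmost match already covers the whole run of trailing newlines, so one substitution is enough.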
As you can see, \z is much more useful than $.
And don't forget \R if you're dealing with cross-platform data. It will match any standard line terminator: any of CR, LF or CRLF.
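As an illustration of \R on cross-platform data, the sketch below strips one trailing terminator from strings ending in LF, CRLF and CR. It assumes Perl 5.10 or later, where \R is available; strip_terminator and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a single trailing line terminator of any kind.
sub strip_terminator {
    my ($string) = @_;
    # \R matches CRLF as one unit, or a lone CR or LF; \z anchors to the end.
    $string =~ s/\R\z//;
    return $string;
}

# Assumed samples using the three common conventions.
for my $sample ("unix\n", "windows\r\n", "oldmac\r") {
    print strip_terminator($sample), "\n";
}
```

Unlike chomp, which removes whatever $/ is set to, this does not depend on the platform the script runs on.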
If this still leaves you with a problem then please ask again. I was going to write about zero-width matches but it's hard to know whether my answer is clear without it