find and replace double newlines with perl?

Tags: string, regex, perl

I'm cleaning up some web pages that, for some reason, have about 8 line breaks between tags. I wanted to remove most of them, so I tried this:

perl -pi -w -e "s/\n\n//g" *.html

But no luck. For good measure, I tried

perl -pi -w -e "s/\n//g" *.html

and it did remove all my line breaks. What am I doing wrong?

Edit: I also tried \r\n\r\n, same deal. It works for a single line break but does nothing for two consecutive ones.


asked Aug 21 '10 01:08

user151841

2 Answers

Use -0:

perl -pi -0 -w -e "s/\n\n//g" *.html

The problem is that by default -p reads the file one line at a time. No line can contain two newlines, so the pattern never matched. The -0 switch sets the input record separator ($/) to "\0" (NUL), which probably doesn't occur in your file, so Perl processes the whole file as a single record. (Even if the file did contain NULs, you're looking for consecutive newlines, so processing it in NUL-delimited chunks won't be a problem.)

You probably want to adjust your regex as well, but it's hard to be sure exactly what you want. Try s/\n\n+/\n/g, which replaces each run of two or more consecutive newlines with a single newline.

If the file is very large, you may not have enough memory to load it in a single chunk. A workaround is to pick some character that is common enough to split the file into manageable chunks and tell Perl to use it as the record separator. It also has to be a character that will never appear inside the matches you're trying to replace. For example, -0x2e splits the file on "." (ASCII 0x2E).

answered Nov 15 '22 00:11

cjm

I was trying to replace double newlines with single ones using the above recommendation on a large file (2.3 GB). With a file that large, Perl segfaulted when trying to read the entire file at once. So instead of looking for a double newline, just look for lines that consist of nothing but a newline:

perl -pi -w -e 's/^\n$//' file.txt

answered Nov 15 '22 00:11

Ian
