 

Regular expression to remove line breaks

I am a complete newbie to Python, and I'm stuck on a regex problem. I'm trying to remove the line break character at the end of each line in a text file, but only if it follows a lowercase letter, i.e. [a-z]. If a line ends in a lowercase letter, I want to replace the line break/newline character with a space.

This is what I've got so far:

import re
import sys

textout = open("output.txt","w")
textblock = open(sys.argv[1]).read()
textout.write(re.sub("[a-z]\z","[a-z] ", textblock, re.MULTILINE) )
textout.close()
Jean77 asked Feb 22 '11 07:02





1 Answer

Try

re.sub(r"(?<=[a-z])\r?\n"," ", textblock)

\Z only matches at the very end of the string, after the last line break, so it's definitely not what you need here. \z was not recognized by the Python regex engine before Python 3.14, which added it as a synonym for \Z, so it would not help here either.
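To make the anchor behaviour concrete, here is a small sketch (assuming Python 3; the sample text and variable names are made up for illustration). It also passes re.MULTILINE by keyword: the fourth positional parameter of re.sub and re.findall's siblings is not always the flags, and in re.sub it is count, so the re.MULTILINE in the question's call would be read as a replacement limit rather than a flag.

```python
import re

sample = "first line\nsecond line\n"  # illustrative text, not from the question

# \Z anchors at the very end of the string, so the trailing newline
# keeps "[a-z]\Z" from matching anything here.
no_match = re.findall(r"[a-z]\Z", sample)

# Without the trailing newline, only the last letter of the whole text matches.
last_only = re.findall(r"[a-z]\Z", sample.rstrip("\n"))

# With re.MULTILINE passed as flags=, "$" matches before every newline
# (and at the end of the string), so each line's final letter is found.
per_line = re.findall(r"[a-z]$", sample, flags=re.MULTILINE)

print(no_match)   # []
print(last_only)  # ['e']
print(per_line)   # ['e', 'e']
```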

(?<=[a-z]) is a positive lookbehind assertion that checks whether the character before the current position is a lowercase ASCII letter. Only then will the regex engine try to match a line break.
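As a rough sketch of how the lookbehind pattern could be wrapped up for the file-processing task in the question (assuming Python 3; the names join_lowercase_lines and rewrite_file and the sample text are hypothetical, not from the original posts):

```python
import re

# A line break (optionally preceded by \r) directly after a lowercase ASCII letter.
LINE_BREAK_AFTER_LOWERCASE = re.compile(r"(?<=[a-z])\r?\n")


def join_lowercase_lines(text):
    """Replace a line break with a space wherever the line ends in a-z."""
    return LINE_BREAK_AFTER_LOWERCASE.sub(" ", text)


def rewrite_file(src_path, dst_path):
    """Read src_path, join the qualifying lines, and write the result to dst_path."""
    # newline="" disables newline translation, so \r\n pairs reach the pattern as written.
    with open(src_path, newline="") as src:
        text = src.read()
    with open(dst_path, "w", newline="") as dst:
        dst.write(join_lowercase_lines(text))


sample = "this line ends in a letter\nThis one ends with a period.\nso does this one\nEND\n"
print(repr(join_lowercase_lines(sample)))
```

Only the breaks after "letter" and "one" are replaced; the lines ending in "." and in an uppercase letter keep their newline. Something like rewrite_file(sys.argv[1], "output.txt") would then mirror the script in the question.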

Also, always use raw strings with regexes; they make backslashes easier to handle.
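A quick illustration of why raw strings matter, using \b as an example that is not part of the question's pattern (assuming Python 3): in an ordinary string literal Python turns "\b" into a backspace character before the regex engine ever sees it, whereas a raw string passes the backslash through.

```python
import re

plain = "\bcat\b"   # "\b" becomes a backspace character (0x08) in an ordinary literal
raw = r"\bcat\b"    # the regex engine receives backslash + "b": a word boundary

print(len(plain), len(raw))                 # 5 7
print(re.search(plain, "a cat sat"))        # None
print(re.search(raw, "a cat sat").group())  # cat
```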

Tim Pietzcker answered Oct 05 '22 11:10
