Regular expression to remove line breaks

Tags:

I am a complete newbie to Python, and I'm stuck with a regex problem. I'm trying to remove the line break character at the end of each line in a text file, but only if it follows a lowercase letter, i.e. [a-z]. If the end of the line ends in a lower case letter, I want to replace the line break/newline character with a space.

This is what I've got so far:

import re
import sys

textout = open("output.txt","w")
textblock = open(sys.argv[1]).read()
textout.write(re.sub("[a-z]\z","[a-z] ", textblock, re.MULTILINE) )
textout.close()

769

asked Feb 22 '11 07:02

Jean77

1 Answers

Try

re.sub(r"(?<=[a-z])\r?\n"," ", textblock)

\Z only matches at the end of the string, after the last linebreak, so it's definitely not what you need here. \z is not recognized by the Python regex engine.

(?<=[a-z]) is a positive lookbehind assertion that checks if the character before the current position is a lowercase ASCII character. Only then the regex engine will try to match a line break.

Also, always use raw strings with regexes. Makes backslashes easier to handle.

131

answered Oct 05 '22 11:10

Tim Pietzcker

Related questions
                            
                                What's the usage of a virtual subclass?
                            
                                How to save a trained model by scikit-learn? [duplicate]
                            
                                Pandas: How do I return a row value once a column reaches a certain value of another column?
                            
                                Can pyautogui be used to prevent windows screen lock?
                            
                                Understanding accumulated gradients in PyTorch
                            
                                How to transpose the contents of lines and columns in a file in Vim?
                            
                                python classes that refer to each other
                            
                                os.path.basename works with URLs, why?
                            
                                Python inheritance and calling parent class constructor
                            
                                Block requests from *.appspot.com and force custom domain in Google App Engine
                            
                                Why do I have to press Ctrl+D twice to close stdin?
                            
                                Python returning the wrong length of string when using special characters
                            
                                Cannot bind to address after socket program crashes
                            
                                Convert IP address string to binary in Python
                            
                                How to convert an HTML table to an array in python
                            
                                Python print statements being buffered with > output redirection
                            
                                Ubuntu packages needed to compile Python 2.7
                            
                                Why can't python infer types like scala? [duplicate]
                            
                                Way to use ast.literal_eval() to convert string into a datetime?
                            
                                Converting lists of tuples to strings Python

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Regular expression to remove line breaks

Tags:

python

regex

python-2.7

Jean77

People also ask

1 Answers

Tim Pietzcker

Recent Activity

Donate For Us