Is there a cross-platform library function that would collapse a multiline string into a single-line string with no repeating spaces?
I've come up with the snippet below, but I wonder whether there is a standard function I could just import, perhaps even one optimized in C.
def collapse(input):
    import re
    rn = re.compile(r'(\r\n)+')
    r = re.compile(r'\r+')
    n = re.compile(r'\n+')
    s = re.compile(r'\ +')
    return s.sub(' ', n.sub(' ', r.sub(' ', rn.sub(' ', input))))
P.S. Thanks for the good observations. ' '.join(input.split()) seems to be the winner: in my case it runs about twice as fast as search-and-replace with a precompiled r'\s+' regex.
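For anyone who wants to check the timing on their own data, here is a rough sketch using timeit. The sample text, the function names and the repetition count are arbitrary assumptions made for illustration, and the numbers will vary with the input and the machine.

```python
import re
import timeit

# Hypothetical sample input; substitute your own data for a meaningful comparison.
SAMPLE = "This  is a\r\ntest\twith   mixed\n\nwhitespace " * 100

# Precompiled once, outside the timed code.
WHITESPACE = re.compile(r'\s+')

def collapse_split(text):
    # Split on runs of whitespace, then rejoin with single spaces.
    return ' '.join(text.split())

def collapse_regex(text):
    # Replace every run of whitespace with a single space.
    return WHITESPACE.sub(' ', text)

for func in (collapse_split, collapse_regex):
    seconds = timeit.timeit(lambda: func(SAMPLE), number=1000)
    print("%s: %.4f s for 1000 calls" % (func.__name__, seconds))
```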
The built-in str.split() method, when called with no arguments, splits on runs of whitespace, so you can use it and then join the resulting list with single spaces, like this:
' '.join(my_string.split())
Here's a complete test script:
TEST = """This
is a test\twith a
mix of\ttabs, newlines and repeating
whitespace"""
print(' '.join(TEST.split()))
# Prints:
# This is a test with a mix of tabs, newlines and repeating whitespace
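If you want this as a reusable helper, a minimal sketch might look like the following. The name collapse_whitespace is made up for this example. Note that split() with no arguments also discards leading and trailing whitespace, so the result is trimmed at both ends.

```python
def collapse_whitespace(text):
    """Collapse every run of whitespace into one space and trim both ends."""
    # split() with no arguments splits on runs of any whitespace and
    # never yields empty strings, so leading/trailing whitespace vanishes.
    return ' '.join(text.split())

print(collapse_whitespace("  line one\r\nline\ttwo  \n"))  # line one line two
print(repr(collapse_whitespace(" \n\t ")))                  # ''
```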
You had the right idea; you just needed to read the Python manual a little more closely:
import re
somewhitespace = re.compile(r'\s+')
TEST = """This
is a test\twith a
mix of\ttabs, newlines and repeating
whitespace"""
somewhitespace.sub(' ', TEST)
# Result:
# 'This is a test with a mix of tabs, newlines and repeating whitespace'
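One difference from the split/join approach is worth knowing: the regex substitution leaves a single space wherever the input starts or ends with whitespace. A short sketch of the difference, with a made-up input string and variable names:

```python
import re

WHITESPACE_RUN = re.compile(r'\s+')

# Hypothetical input with whitespace at both ends.
padded = "  leading and\ttrailing\n"

regex_result = WHITESPACE_RUN.sub(' ', padded)
split_result = ' '.join(padded.split())

print(repr(regex_result))          # ' leading and trailing '
print(repr(split_result))          # 'leading and trailing'
print(repr(regex_result.strip()))  # 'leading and trailing'
```

Calling strip() on the regex result makes the two approaches agree for input like this.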