
Is there a string-collapse library function in Python?

Is there a cross-platform library function that would collapse a multiline string into a single-line string with no repeating spaces?

I've come up with the snippet below, but I wonder whether there is a standard function I could simply import, perhaps even one optimized in C?

def collapse(input):
    import re
    rn = re.compile(r'(\r\n)+')
    r = re.compile(r'\r+')
    n = re.compile(r'\n+')
    s = re.compile(r'\ +')
    return s.sub(' ',n.sub(' ',r.sub(' ',rn.sub(' ',input))))

P.S. Thanks for the good observations. ' '.join(input.split()) seems to be the winner: in my case it runs about twice as fast as search-and-replace with a precompiled r'\s+' regex.
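For anyone who wants to reproduce that comparison, here is a minimal sketch using the standard timeit module. The sample text, the function names and the repeat count are assumptions made for illustration, and the measured ratio will vary with the input and the machine.

```python
import re
import timeit

# Hypothetical sample input: 1000 copies of a short messy string.
SAMPLE = "\n".join(["This\nis   a test\twith\r\nmixed   whitespace"] * 1000)

# Precompiled once, as described in the P.S.
WHITESPACE_RUN = re.compile(r'\s+')

def collapse_split(text):
    # Split on runs of whitespace, then rejoin with single spaces.
    return ' '.join(text.split())

def collapse_regex(text):
    # Replace every run of whitespace with a single space.
    return WHITESPACE_RUN.sub(' ', text)

def time_both(number=100):
    # Returns (seconds for split/join, seconds for regex) over `number` runs.
    t_split = timeit.timeit(lambda: collapse_split(SAMPLE), number=number)
    t_regex = timeit.timeit(lambda: collapse_regex(SAMPLE), number=number)
    return t_split, t_regex

if __name__ == "__main__":
    t_split, t_regex = time_both()
    print("split/join: %.4fs  regex: %.4fs" % (t_split, t_regex))
```

Both functions give the same result on this sample because it has no leading or trailing whitespace; only the timings differ.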

Evgeny asked Dec 06 '22 05:12


2 Answers

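The built-in str.split() method, called with no arguments, splits on runs of whitespace (spaces, tabs and newlines) and discards leading and trailing whitespace, so you can split the string and then join the resulting list with single spaces, like this: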

' '.join(my_string.split())

Here's a complete test script:

TEST = """This
is        a test\twith a
  mix of\ttabs,     newlines and repeating
whitespace"""

print(' '.join(TEST.split()))
# Prints:
# This is a test with a mix of tabs, newlines and repeating whitespace
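The no-argument form is what makes this work. As a small illustration (the sample string is made up for this example), compare it with split(' '), which splits on every individual space and leaves other whitespace alone:

```python
text = "  a  b\r\n\tc  "

# split(' ') splits on each single space, so runs of spaces produce empty
# strings and the "\r\n\t" run stays inside one of the pieces.
by_space = text.split(' ')

# split() with no argument treats any run of whitespace as one separator
# and ignores leading and trailing whitespace.
by_whitespace = text.split()

print(by_space)
print(by_whitespace)
print(' '.join(by_whitespace))
```

Joining the result of split(' ') would simply rebuild the original string, so only the no-argument call collapses anything.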
RichieHindle answered Dec 07 '22 17:12


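You had the right idea; you just needed to read the Python manual a little more closely. The \s class matches any whitespace character, including \r, \n and spaces, so one pattern is enough: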

import re
somewhitespace = re.compile(r'\s+')
TEST = """This
is        a test\twith a
  mix of\ttabs,     newlines and repeating
whitespace"""

print(somewhitespace.sub(' ', TEST))
# Prints:
# This is a test with a mix of tabs, newlines and repeating whitespace
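One difference from the split/join approach is worth noting: the regex substitution turns leading or trailing whitespace into a single space rather than removing it. A minimal sketch of a variant that also trims the ends follows; the function name collapse_regex is hypothetical and not part of the answer above.

```python
import re

# Compiled once at import time so repeated calls reuse the pattern.
WHITESPACE_RUN = re.compile(r'\s+')

def collapse_regex(text):
    # Replace each run of whitespace with one space, then trim the
    # single space that may remain at either end.
    return WHITESPACE_RUN.sub(' ', text).strip()

padded = "\n  This\nis   a\ttest \r\n"
print(repr(WHITESPACE_RUN.sub(' ', padded)))  # keeps one space at each end
print(repr(collapse_regex(padded)))           # ends trimmed
```

With the extra strip() the result matches ' '.join(text.split()) for this input.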
Unknown answered Dec 07 '22 17:12