I'm trying to split a string like the one below on spaces, keeping each run of spaces as its own element:
string = "This            is a                      test."
# desired output
# ['This', '            ', 'is', ' ', 'a', '                      ', 'test.']
# actual output, which does make sense
result = string.split()
# ['This', 'is', 'a', 'test.']
There's also re.split, which keeps the delimiter when the pattern is in a capturing group, but not in the way I hoped:
import re
string = "This            is a                      test."
result = re.split(r"( )", string)
# ['This',
# ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ',
# 'is', ' ',
# 'a', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ',
# 'test.']
I can do something like this and achieve the result I want:
string = "This            is a                      test."
result = []
spaces = ''
word = ''
for letter in string:
    if letter == ' ':
        spaces += ' '
        if word:
            result.append(word)
            word = ''
    else:
        word += letter
        if spaces:
            result.append(spaces)
            spaces = ''
if spaces:
    result.append(spaces)
if word:
    result.append(word)
print(result)
# ['This', '            ', 'is', ' ', 'a', '                      ', 'test.']
But this doesn't feel like the best way to do it. Is there a more Pythonic way of achieving this?
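Use re.split with the pattern (\s+). The + makes each match cover a whole run of whitespace instead of a single character, and the capturing group makes re.split keep those runs in the result. Note that \s also matches tabs and newlines, not just spaces: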
>>> import re
>>> string = "This            is a                      test."
>>> re.split(r'(\s+)', string)
['This', '            ', 'is', ' ', 'a', '                      ', 'test.']
Regex101 example.
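One edge case worth knowing: if the string starts or ends with whitespace, re.split with a capturing group returns empty strings at those ends. The sketch below shows two possible ways around that. The helper names split_keep_whitespace and split_keep_spaces are made up for illustration, and the second one assumes you only care about literal spaces rather than all whitespace.

```python
import re


def split_keep_whitespace(text):
    """Return alternating runs of non-whitespace and whitespace (hypothetical helper)."""
    # \S+ matches a run of non-whitespace, \s+ a run of whitespace.
    # findall returns every match in order, so no empty strings appear.
    return re.findall(r"\S+|\s+", text)


def split_keep_spaces(text):
    """Like re.split(r'( +)', text), minus empty edge strings (hypothetical helper)."""
    # ( +) matches literal spaces only, so tabs and newlines stay inside the words.
    return [part for part in re.split(r"( +)", text) if part]


print(re.split(r"(\s+)", " padded "))      # ['', ' ', 'padded', ' ', '']
print(split_keep_whitespace(" padded "))   # [' ', 'padded', ' ']
print(split_keep_spaces("a\tb  c"))        # ['a\tb', '  ', 'c']
```

For input without leading or trailing whitespace, such as the string in the question, both helpers should give the same list as re.split(r'(\s+)', string).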