
Preserve whitespace when using split() and join() in Python

Tags: python, join, split



You want to use re.split() with a capturing group in that case:

re.split(r'(\s+)', line)

Because the pattern is wrapped in a group, this returns both the columns and the whitespace between them, so you can rejoin the line later with exactly the same amount of whitespace.

Example:

>>> import re
>>> line = 'BBP1   0.000000  -0.150000    2.033000  0.00 -0.150   1.77'
>>> re.split(r'(\s+)', line)
['BBP1', '   ', '0.000000', '  ', '-0.150000', '    ', '2.033000', '  ', '0.00', ' ', '-0.150', '   ', '1.77']

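You will probably want to strip the newline from the end of the line first (for example with line.rstrip('\n')); otherwise the result ends with two extra items, '\n' and an empty string.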
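As a rough sketch of the full round trip, assuming a line like the one in the example with a trailing newline added (the names `tokens`, `columns` and `gaps` are illustrative only):

```python
import re

line = 'BBP1   0.000000  -0.150000    2.033000  0.00 -0.150   1.77\n'

# Strip the trailing newline so the split does not end with '\n', ''.
tokens = re.split(r'(\s+)', line.rstrip('\n'))

# With no leading whitespace, columns sit at even indices and the
# whitespace runs between them at odd indices.
columns = tokens[0::2]
gaps = tokens[1::2]

# Joining with an empty string restores the original spacing exactly.
rebuilt = ''.join(tokens) + '\n'
print(rebuilt == line)
```

Note that the even/odd layout only holds when the line does not start with whitespace.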


Another way to do this is:

>>> s = 'BBP1   0.000000  -0.150000    2.033000  0.00 -0.150   1.77'
>>> s.split(' ')
['BBP1', '', '', '0.000000', '', '-0.150000', '', '', '', '2.033000', '', '0.00', '-0.150', '', '', '1.77']

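If you pass a single space as the separator, split() does not collapse runs of spaces: each additional space in a run produces an empty string in the list. Joining the list with ' ' therefore restores the original number of spaces. Note that this only handles the space character, not tabs or other whitespace.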
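A small sketch of that round trip, using the same sample string (the replacement value 'BBP2' is made up for illustration):

```python
s = 'BBP1   0.000000  -0.150000    2.033000  0.00 -0.150   1.77'

parts = s.split(' ')

# Every extra space in a run shows up as an empty string, so joining
# with a single space puts the original spacing back.
restored = ' '.join(parts)
print(restored == s)

# For comparison, split() with no argument collapses the runs.
collapsed = ' '.join(s.split())

# Edit the first column, then rejoin; the spacing is untouched.
parts[0] = 'BBP2'
edited = ' '.join(parts)
print(edited)
```

The drawback is that the real columns are scattered among empty strings, so finding "the third column" by index takes extra work.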


For lines that have whitespace at the beginning and/or end, a more robust pattern is (\S+), which splits on runs of non-whitespace characters:

import re

line1 = ' 4   426.2   orange\n'
line2 = '12    82.1   apple\n'

re_S = re.compile(r'(\S+)')
items1 = re_S.split(line1)
items2 = re_S.split(line2)
print(items1)  # [' ', '4', '   ', '426.2', '   ', 'orange', '\n']
print(items2)  # ['', '12', '    ', '82.1', '   ', 'apple', '\n']

These two lines have the same number of items after splitting, which is handy. The first and last items are always whitespace strings (possibly empty), so the fields are always at the odd indices. These lines can be reconstituted using a join with a zero-length string:

print(repr(''.join(items1)))  # ' 4   426.2   orange\n'
print(repr(''.join(items2)))  # '12    82.1   apple\n'
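One way to use this, sketched under the assumption that you want to change a single field and leave everything else alone (`replace_field` is a hypothetical helper, not part of the answer above):

```python
import re

re_S = re.compile(r'(\S+)')

def replace_field(line, field_index, new_value):
    """Replace the field_index-th (0-based) field, keeping all whitespace."""
    items = re_S.split(line)
    # Fields are at odd positions 1, 3, 5, ...; whitespace is at even ones.
    items[2 * field_index + 1] = new_value
    return ''.join(items)

out1 = replace_field(' 4   426.2   orange\n', 2, 'banana')
out2 = replace_field('12    82.1   apple\n', 2, 'cherry')
print(repr(out1))  # ' 4   426.2   banana\n'
print(repr(out2))  # '12    82.1   cherry\n'
```

The whitespace runs are preserved as they were, so if the new value has a different length than the old one, later columns will shift.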

By contrast, with the similar lower-case pattern (\s+) used in the other answer here, the two lines split into lists of different lengths, with the items at different positions:

re_s = re.compile(r'(\s+)')
print(re_s.split(line1))  # ['', ' ', '4', '   ', '426.2', '   ', 'orange', '\n', '']
print(re_s.split(line2))  # ['12', '    ', '82.1', '   ', 'apple', '\n', '']

As you can see, this would be a bit more difficult to process in a consistent manner.
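To make the difference concrete, here is a short sketch that lists where the fields end up for each pattern (`field_positions` is an illustrative helper name):

```python
import re

line1 = ' 4   426.2   orange\n'
line2 = '12    82.1   apple\n'

def field_positions(pattern, line):
    # Indices of the split items that contain non-whitespace text.
    items = re.split(pattern, line)
    return [i for i, item in enumerate(items) if item.strip()]

upper1 = field_positions(r'(\S+)', line1)
upper2 = field_positions(r'(\S+)', line2)
lower1 = field_positions(r'(\s+)', line1)
lower2 = field_positions(r'(\s+)', line2)

print(upper1, upper2)  # [1, 3, 5] [1, 3, 5]
print(lower1, lower2)  # [2, 4, 6] [0, 2, 4]
```

With (\S+) the fields land at the same indices whether or not the line starts with whitespace; with (\s+) they move.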