Preserve whitespace when using split() and join() in Python

People also ask

How do you maintain spaces in Python?

If the file is in a fixed-width format, rejoining with the original runs of spaces can change the column widths whenever a value changes length. You could use string formatting to preserve the file format instead, e.g. "{:4s} {:10.6f} {:10.6f} {:11.6f} {:5.2f} {:6.3f} {:6.2f}".format(*row), where row = ["BBP1", 0.0, -0.15, 0.95*2.033, 0.0, -0.15, 1.77].
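
A minimal sketch of that formatting approach, using the format string and row quoted above. The second row is a made-up value added only to show that the total width stays the same when a number gets longer.

```python
# Format string and first row are the ones quoted in the text above.
fmt = "{:4s} {:10.6f} {:10.6f} {:11.6f} {:5.2f} {:6.3f} {:6.2f}"
row = ["BBP1", 0.0, -0.15, 0.95 * 2.033, 0.0, -0.15, 1.77]

formatted = fmt.format(*row)
print(repr(formatted))

# Hypothetical second row: a wider value in column 2 still fits the
# 10-character field, so both lines end up with the same length.
other = fmt.format("BBP2", 10.5, -0.15, 2.033, 0.0, -0.15, 1.77)
print(len(formatted) == len(other))
```

Each field is padded to a fixed width, so the columns stay aligned as long as every value fits in its field.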

Does string split preserve order?

Yes, .split() always returns the substrings in the order in which they appear in the string.

How do you ignore whitespaces in Python?

The strip() method is the most common way to remove whitespace in Python. It is a built-in string method that returns a copy of the string with all leading and trailing whitespace removed; whitespace inside the string is left untouched.
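
As a small illustration (the sample string is made up), strip() and its one-sided variants lstrip() and rstrip() only touch the ends of the string:

```python
text = "  \t hello  world \n"

both = text.strip()    # leading and trailing whitespace removed
left = text.lstrip()   # only leading whitespace removed
right = text.rstrip()  # only trailing whitespace removed

print(repr(both))   # 'hello  world'
print(repr(left))   # 'hello  world \n'
print(repr(right))  # '  \t hello  world'
```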

You want to use re.split() in that case, with a capturing group:

re.split(r'(\s+)', line)

would return both the columns and the whitespace so you can rejoin the line later with the same amount of whitespace included.

Example:

>>> import re
>>> line = 'BBP1   0.000000  -0.150000    2.033000  0.00 -0.150   1.77'
>>> re.split(r'(\s+)', line)
['BBP1', '   ', '0.000000', '  ', '-0.150000', '    ', '2.033000', '  ', '0.00', ' ', '-0.150', '   ', '1.77']

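You will probably want to strip the newline from the end of the line first (for example with line.rstrip('\n')); otherwise the result ends with a '\n' item followed by an empty string.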
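
A sketch of the full round trip under these assumptions: the line has no leading whitespace, so columns sit at even indexes, and the new value has the same width as the old one. Scaling the fourth column by 0.95 is only an illustrative edit.

```python
import re

line = "BBP1   0.000000  -0.150000    2.033000  0.00 -0.150   1.77\n"

# Strip the trailing newline so it is not returned as an extra separator.
parts = re.split(r"(\s+)", line.rstrip("\n"))

# Even indexes hold columns, odd indexes hold the whitespace between them.
# Index 6 is the fourth column ('2.033000').
parts[6] = "{:.6f}".format(0.95 * float(parts[6]))

# Joining on the empty string puts the original whitespace back.
rebuilt = "".join(parts) + "\n"
print(repr(rebuilt))
```

If the new value were wider or narrower than the old one, the columns to its right would shift, which is where fixed-width string formatting becomes the safer choice.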

Another way to do this is:

>>> s = 'BBP1   0.000000  -0.150000    2.033000  0.00 -0.150   1.77'
>>> s.split(' ')
['BBP1', '', '', '0.000000', '', '-0.150000', '', '', '', '2.033000', '', '0.00', '-0.150', '', '', '1.77']

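If you pass a single space as the separator, split() does not collapse runs of consecutive spaces; each extra space produces an empty string in the list. Joining the list with ' '.join() therefore restores the original number of spaces. Note that this only handles space characters, not tabs or other whitespace.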
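
A short check of that behaviour, compared with split() called without an argument, which collapses runs of whitespace and so loses the spacing:

```python
s = "BBP1   0.000000  -0.150000    2.033000  0.00 -0.150   1.77"

# An explicit ' ' separator keeps an empty string for every extra space.
fields = s.split(" ")
round_trip = " ".join(fields)
print(round_trip == s)  # True

# The actual column values are the non-empty items.
columns = [f for f in fields if f]
print(columns)

# With no argument, runs of whitespace are collapsed, so rejoining
# with a single space does not reproduce the original line.
collapsed = " ".join(s.split())
print(collapsed == s)  # False
```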

For lines that have whitespace at the beginning and/or end, a more robust pattern is (\S+) to split at non-whitespace characters:

import re

line1 = ' 4   426.2   orange\n'
line2 = '12    82.1   apple\n'

re_S = re.compile(r'(\S+)')
items1 = re_S.split(line1)
items2 = re_S.split(line2)
print(items1)  # [' ', '4', '   ', '426.2', '   ', 'orange', '\n']
print(items2)  # ['', '12', '    ', '82.1', '   ', 'apple', '\n']

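These two lines have the same number of items after splitting, which is handy. The first and last items are always whitespace strings (possibly empty, as with the first item of the second line), so the fields always sit at the odd indexes. These lines can be reconstituted using a join with a zero-length string: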

print(repr(''.join(items1)))  # ' 4   426.2   orange\n'
print(repr(''.join(items2)))  # '12    82.1   apple\n'
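
Because the (\S+) split always puts fields at odd indexes, editing a field by position takes only a few lines. The helper name replace_field below is hypothetical, a sketch rather than part of any library:

```python
import re

FIELD_RE = re.compile(r"(\S+)")

def replace_field(line, n, new_text):
    """Return line with its n-th field (0-based) replaced, whitespace untouched."""
    items = FIELD_RE.split(line)
    # Fields are at odd indexes 1, 3, 5, ...; even indexes hold whitespace.
    items[2 * n + 1] = new_text
    return "".join(items)

new1 = replace_field(" 4   426.2   orange\n", 2, "banana")
new2 = replace_field("12    82.1   apple\n", 1, "90.5")
print(repr(new1))  # ' 4   426.2   banana\n'
print(repr(new2))  # '12    90.5   apple\n'
```

The same index arithmetic works whether or not the line starts with whitespace, which is the consistency the pattern is chosen for.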

To contrast the example with a similar pattern (\s+) (lower-case) used in the other answer here, each line splits with different result lengths and positions of the items:

re_s = re.compile(r'(\s+)')
print(re_s.split(line1))  # ['', ' ', '4', '   ', '426.2', '   ', 'orange', '\n', '']
print(re_s.split(line2))  # ['12', '    ', '82.1', '   ', 'apple', '\n', '']

As you can see, this would be a bit more difficult to process in a consistent manner.

