If the file is in a fixed-width format, reusing the same number of spaces can change the column widths whenever a value changes length. You could use string formatting to preserve the file format instead, e.g. "{:4s} {:10.6f} {:10.6f} {:11.6f} {:5.2f} {:6.3f} {:6.2f}".format(*row), where row = ["BBP1", 0.0, -0.15, 0.95*2.033, 0.0, -0.15, 1.77].
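As a rough sketch of the formatting approach, the snippet below builds one record from the format string and row quoted above. The names FMT and line are illustrative only, and the column widths are simply the ones from that format string, not a known file specification.

```python
# Format specification for one fixed-width record (copied from the comment above).
FMT = "{:4s} {:10.6f} {:10.6f} {:11.6f} {:5.2f} {:6.3f} {:6.2f}"

row = ["BBP1", 0.0, -0.15, 0.95 * 2.033, 0.0, -0.15, 1.77]

# Every field is padded to its declared width, so the columns stay aligned
# no matter how many digits the new value needs.
line = FMT.format(*row)
print(repr(line))
print(len(line))  # 58: the seven field widths plus six single-space separators
```

Because the widths come from the format string rather than from the old line, this only works when the layout of the file is known in advance.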
Yes, .split() always preserves the order in which the substrings appear in the string.
strip() is the usual way to trim whitespace in Python. It is a built-in string method that returns a copy of the string with all leading and trailing whitespace removed; whitespace inside the string is left untouched.
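A small illustration, using a made-up sample line, of what strip() removes and how rstrip("\n") drops only the trailing newline:

```python
# Hypothetical sample line with leading spaces, interior spaces and a newline.
line = "  BBP1   0.000000  \n"

stripped = line.strip()        # removes leading and trailing whitespace only
no_newline = line.rstrip("\n")  # removes just the newline at the end

print(repr(stripped))    # 'BBP1   0.000000'
print(repr(no_newline))  # '  BBP1   0.000000  '
```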
You want to use re.split() in that case, with a capturing group:
re.split(r'(\s+)', line)
This returns both the columns and the whitespace, so you can rejoin the line later with the same amount of whitespace included.
Example:
>>> re.split(r'(\s+)', line)
['BBP1', ' ', '0.000000', ' ', '-0.150000', ' ', '2.033000', ' ', '0.00', ' ', '-0.150', ' ', '1.77']
You probably do want to remove the newline from the end.
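A minimal sketch of the full round trip with re.split(), assuming a line that does not start with whitespace (so the columns land on the even indices). The sample line and its spacing are invented for illustration.

```python
import re

# Hypothetical input line; the exact spacing is an assumption.
line = "BBP1   0.000000  -0.150000    2.033000\n"

# Strip the newline first so it does not end up as a separate whitespace item.
parts = re.split(r"(\s+)", line.rstrip("\n"))

# Even indices hold the columns, odd indices hold the whitespace runs.
parts[6] = "{:.6f}".format(0.95 * float(parts[6]))

new_line = "".join(parts) + "\n"
print(repr(new_line))  # 'BBP1   0.000000  -0.150000    1.931350\n'
```

The whitespace is carried over unchanged, so the columns only stay aligned if the new value has the same length as the old one.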
Another way to do this is:
>>> s = 'BBP1   0.000000  -0.150000    2.033000  0.00 -0.150   1.77'
>>> s.split(' ')
['BBP1', '', '', '0.000000', '', '-0.150000', '', '', '', '2.033000', '', '0.00', '-0.150', '', '', '1.77']
If a single space is passed as the separator, split() does not collapse runs of spaces: each extra space produces an empty string in the list. Joining the list with ' ' therefore restores the original number of spaces.
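The following sketch checks that round trip on a shortened, made-up line and contrasts it with the argument-less split(), which discards the spacing. Note that this only tracks space characters; tabs would stay inside the fields.

```python
s = "BBP1   0.000000  -0.150000"

parts = s.split(" ")
print(parts)  # ['BBP1', '', '', '0.000000', '', '-0.150000']

# Joining with a single space puts every original space back.
restored = " ".join(parts)
print(restored == s)  # True

# Without an argument, split() collapses whitespace and the spacing is lost.
collapsed = " ".join(s.split())
print(repr(collapsed))  # 'BBP1 0.000000 -0.150000'
```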
For lines that have whitespace at the beginning and/or end, a more robust pattern is (\S+), which splits at the non-whitespace characters:
import re
line1 = ' 4 426.2 orange\n'
line2 = '12 82.1 apple\n'
re_S = re.compile(r'(\S+)')
items1 = re_S.split(line1)
items2 = re_S.split(line2)
print(items1) # [' ', '4', ' ', '426.2', ' ', 'orange', '\n']
print(items2) # ['', '12', ' ', '82.1', ' ', 'apple', '\n']
These two lines have the same number of items after splitting, which is handy. The first and last items are always whitespace or empty strings, so the columns always sit at the odd indices. These lines can be reconstituted by joining with a zero-length string:
print(repr(''.join(items1))) # ' 4 426.2 orange\n'
print(repr(''.join(items2))) # '12 82.1 apple\n'
In contrast, with the similar pattern (\s+) (lower-case) used in the other answer here, each line splits into a different number of items, with the columns at different positions:
re_s = re.compile(r'(\s+)')
print(re_s.split(line1)) # ['', ' ', '4', ' ', '426.2', ' ', 'orange', '\n', '']
print(re_s.split(line2)) # ['12', ' ', '82.1', ' ', 'apple', '\n', '']
As you can see, this would be a bit more difficult to process in a consistent manner.
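Building on the (\S+) pattern, here is one possible helper for replacing a single column while keeping the surrounding whitespace. The function name replace_column and the replacement values are hypothetical; it relies on the columns sitting at the odd indices of the split result.

```python
import re

_TOKEN = re.compile(r"(\S+)")

def replace_column(line, index, new_value):
    """Return line with the zero-based column `index` replaced, whitespace kept."""
    items = _TOKEN.split(line)
    # With the (\S+) pattern, column n is always at position 2 * n + 1.
    items[2 * index + 1] = new_value
    return "".join(items)

result1 = replace_column(" 4 426.2 orange\n", 1, "20.0")
result2 = replace_column("12 82.1 apple\n", 2, "pear")
print(repr(result1))  # ' 4 20.0 orange\n'
print(repr(result2))  # '12 82.1 pear\n'
```

The same index works for both lines, regardless of leading whitespace. An out-of-range index raises IndexError, which may or may not be the behavior you want.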