I would like to do something like:
temp=a.split()
#do some stuff with this new list
b=" ".join(temp)
where a is the original string, and b is after it has been modified. The problem is that when performing such methods, the newlines are removed from the new string. So how can I do this without removing newlines?
I assume in your third line you mean join(temp), not join(a).
To split and yet keep the exact "splitters", you need the re.split function (or split method of RE objects) with a capturing group:
>>> import re
>>> f='tanto va\nla gatta al lardo'
>>> re.split(r'(\s+)', f)
['tanto', ' ', 'va', '\n', 'la', ' ', 'gatta', ' ', 'al', ' ', 'lardo']
The pieces you'd get from just re.split are at index 0, 2, 4, ... while the odd indices have the "separators" -- the exact sequences of whitespace that you'll use to re-join the list at the end (with ''.join) to get the same whitespace the original string had.
You can either work directly on the even-spaced items, or you can first extract them:
>>> x = re.split(r'(\s+)', f)
>>> y = x[::2]
>>> y
['tanto', 'va', 'la', 'gatta', 'al', 'lardo']
then alter y as you will, e.g.:
>>> y[:] = [z+z for z in y]
>>> y
['tantotanto', 'vava', 'lala', 'gattagatta', 'alal', 'lardolardo']
then reinsert and join up:
>>> x[::2] = y
>>> ''.join(x)
'tantotanto vava\nlala gattagatta alal lardolardo'
Note that the \n is exactly in the position equivalent to where it was in the original, as desired.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With