I am trying to split the string below on a number of delimiters including \n, comma(,), and colon(:) except when the colon is part of a time value. Below is my string:
values = 'City:hell\nCountry:rome\nUpdate date: 2022-09-26 00:00:00'
I have tried:
result = re.split(':|,|\n', values)
However, this ends up splitting the time as well, resulting in:
['City','hell','Country','rome','Update date',' 2022-09-26 00','00','00']
Whereas the expected outcome is
['City','hell','Country','rome','Update date', '2022-09-26 00:00:00']
Any help/assistance will be appreciated
You could use a negative look-behind to ensure that the colon is not preceded by a pair of digits:
re.split(r'(?<![0-9]{2}):\s*|,|\n', values)
It splits on a comma, on \n, and on :\s* (a colon followed by any amount of whitespace). So a bare colon is a separator (when not preceded by a pair of digits), but so is a colon followed by one space, or by several spaces (again, only when not preceded by a pair of digits). The consequence is that if there is a space after a colon, as is the case in your string, that space is not included in the next field, since it is part of the separator, not of the field.
Or, you could keep the first version of my answer (without \s*) and just .strip() the fields.
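To make the two variants above concrete, here is a minimal sketch that runs both on the string from the question. The names with_ws and stripped are only illustrative, and the patterns are written as raw strings so that \s reaches the regex engine unchanged.

```python
import re

values = 'City:hell\nCountry:rome\nUpdate date: 2022-09-26 00:00:00'

# Variant 1: whitespace after the colon is consumed as part of the separator.
with_ws = re.split(r'(?<![0-9]{2}):\s*|,|\n', values)

# Variant 2: split on the bare colon only, then strip each field afterwards.
stripped = [field.strip() for field in re.split(r'(?<![0-9]{2}):|,|\n', values)]

print(with_ws)
print(stripped)
```

Both calls should give the list the question asks for. Two caveats: the look-behind only protects colons that follow two digits, so a time written with a single-digit hour such as 9:05:00 would still be split at its first colon. Also, \s* matches newlines, so in variant 1 a key with an empty value would be merged with the following line.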
Solution without re:
values = "City:hell\nCountry:rome\nUpdate date: 2022-09-26 00:00:00"
out = [
    v.strip()
    # split each line on the first colon only, so the time stays intact
    for pair in (line.split(":", maxsplit=1) for line in values.splitlines())
    for v in pair
]
print(out)
Prints:
['City', 'hell', 'Country', 'rome', 'Update date', '2022-09-26 00:00:00']
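The reason this keeps the time intact is that maxsplit=1 makes split stop after the first colon on each line, so any later colons stay in the value. A small sketch of that intermediate step, using one line from the question (the names line, key and value are illustrative):

```python
line = "Update date: 2022-09-26 00:00:00"

# Only the first colon acts as a separator; later colons remain in the value.
key, value = line.split(":", maxsplit=1)

print([key, value.strip()])
```

Note that this approach does not split on commas, which the question also lists as a delimiter, and it assumes one key per line. In this sketch, a line without any colon would make the two-name unpacking raise ValueError, whereas the list comprehension above would simply yield that line as a single field.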