I have a string of text that looks like this:
' 19,301 14,856 18,554'
where the gaps between the numbers are spaces.
I'm trying to split it on the whitespace, but I need to keep each run of whitespace as its own item in the new list. Like this:
[' ', '19,301',' ', '14,856', ' ', '18,554']
I have been using the following code:
re.split(r'( +)(?=[0-9])', item)
and it returns:
['', ' ', '19,301', ' ', '14,856', ' ', '18,554']
Notice that it always adds an empty element to the beginning of my list. It's easy enough to drop it, but I'm really looking to understand what is going on here, so I can get the code to treat things consistently. Thanks.
When using re.split, if the separator matches at the start of the string, the "result will start with an empty string". The same holds for the end of the string. The reason for this is so that the join method can behave as the inverse of the split method.
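A minimal sketch of that rule, using the string from the question. It drops the (?=[0-9]) lookahead from the original pattern so that a separator at the end of a string can also match; for the string in the question the result is the same as with the lookahead.

```python
import re

item = ' 19,301 14,856 18,554'

# The separator (a captured run of spaces) matches at index 0, so the
# result starts with the empty string that precedes that first match.
parts = re.split(r'( +)', item)
print(parts)  # ['', ' ', '19,301', ' ', '14,856', ' ', '18,554']

# A separator at the very end produces a trailing empty string as well.
trailing = re.split(r'( +)', '19,301 14,856 ')
print(trailing)  # ['19,301', ' ', '14,856', ' ', '']

# Because the separators are captured, concatenating all the pieces
# rebuilds the original string exactly, including the leading space.
assert ''.join(parts) == item
```

Note that with a capturing group the inverse operation is ''.join(parts), since the separators are already in the list.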
It is less obvious in your case, where the separator matches vary in size, but consider the case where the separator is a | character and you want to join the pieces back together. With the extra empty string, the join works:
>>> import re
>>> item = '|19,301|14,856|18,554'
>>> items = re.split(r'\|', item)
>>> print(items)
['', '19,301', '14,856', '18,554']
>>> '|'.join(items)
'|19,301|14,856|18,554'
But without it, the initial pipe would be missing:
>>> items = ['19,301', '14,856', '18,554']
>>> '|'.join(items)
'19,301|14,856|18,554'
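If you still want a list without the empty edge elements, here are two sketches. The first simply filters the re.split result. The second swaps in a different technique, re.findall, which matches the tokens directly instead of splitting; it is an alternative, not what re.split does internally.

```python
import re

item = ' 19,301 14,856 18,554'

# Option 1: keep re.split and drop the empty strings that appear when a
# separator matches at the very start or end of the input.
parts = [p for p in re.split(r'( +)', item) if p]

# Option 2: match each token directly, either a run of spaces or a run of
# non-space characters. Both alternatives need at least one character,
# so findall never returns empty strings for this pattern.
tokens = re.findall(r' +|[^ ]+', item)

print(parts)   # [' ', '19,301', ' ', '14,856', ' ', '18,554']
print(tokens)  # [' ', '19,301', ' ', '14,856', ' ', '18,554']

# Dropping the empty strings does not affect reconstruction.
assert ''.join(tokens) == item
```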