When I use split with a pattern that contains one capturing group with an alternation, Python returns unexpected values, including text that did not match.
For example, the pattern below is meant to match either "a" or a number. With findall it does exactly that, but split also returns the non-matching text and empty strings.
import re
x = re.compile(r'(a|-?[0-9]+)')
# returns ['45', '444', '19', 'a']
print(x.findall("45, 444 < 19, abc"))
# returns ['', '45', ', ', '444', ' < ', '19', ', ', 'a', 'bc']
print(x.split("45, 444 < 19, abc"))
I expected split to return what findall returns. I don't understand why split behaves differently.
Edit: Without the capturing group, findall still works, but split gets worse: it no longer returns the matched text at all.
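You can change the pattern passed to re.split slightly so that it splits on every run of characters that is neither a digit nor "a", then filter out the empty strings: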
import re
print(list(filter(None, re.split(r'[^\da]+', "45, 444 < 19, abc"))))
Output:
['45', '444', '19', 'a']
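One caveat with this approach: because the minus sign is neither a digit nor "a", it is treated as part of a separator. The sketch below compares the two patterns on a made-up input containing a negative number (the string and variable names are illustrative, not from the question):

```python
import re

# Hypothetical input with a negative number, for illustration only.
text = "45, -7 < 19, abc"

# Splitting on "anything that is not a digit or 'a'" drops the minus sign.
via_split = list(filter(None, re.split(r'[^\da]+', text)))

# The question's pattern (without the group) keeps the optional sign.
via_findall = re.findall(r'a|-?[0-9]+', text)

print(via_split)    # ['45', '7', '19', 'a']
print(via_findall)  # ['45', '-7', '19', 'a']
```

If negative numbers matter, findall with the original pattern is probably the safer choice.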
The docs for re.split state that if the pattern contains capturing groups, the text of those groups is also returned in the result list. So the string is split on your pattern AND the captured groups are returned as well.
Your pattern (a|-?[0-9]+) captures in a group:
45
444
19
a
The text between the matches is not matched, and it is what split normally returns:
', '
' < '
', '
'bc'
Resulting in:
['', '45', ', ', '444', ' < ', '19', ', ', 'a', 'bc']
The first entry is empty because the pattern matches at the very start of the string ("45"), so the piece before that first match is an empty string.
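To make the interleaving concrete, here is a minimal sketch using the string from the question. It assumes the pattern has exactly one capturing group, in which case the split result alternates between the text between matches (even indices) and the captured matches (odd indices). The variable names are only for illustration.

```python
import re

text = "45, 444 < 19, abc"

# With one capturing group: pieces between matches and captures alternate.
with_group = re.split(r'(a|-?[0-9]+)', text)

# Without a group: only the pieces between matches are returned.
without_group = re.split(r'a|-?[0-9]+', text)

# Odd indices of the grouped result hold the captured matches.
matches = with_group[1::2]

print(with_group)     # ['', '45', ', ', '444', ' < ', '19', ', ', 'a', 'bc']
print(without_group)  # ['', ', ', ' < ', ', ', 'bc']
print(matches)        # ['45', '444', '19', 'a']
```

The last list is the same as what findall returns for this input, which is why findall is the more direct tool when only the matches are wanted.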