
Split returns non-matches [duplicate]

Tags: python, regex

When I use split with a pattern that contains one capturing group with an alternation, Python returns values I did not expect, including text the pattern does not match.

For example, the pattern below is supposed to match either "a" or a number. It does exactly that with findall; however, split also returns non-matching text and empty strings.

import re

x = re.compile(r'(a|-?[0-9]+)')

# returns ['45', '444', '19', 'a']
print(x.findall("45, 444 < 19, abc"))

# returns ['', '45', ', ', '444', ' < ', '19', ', ', 'a', 'bc']
print(x.split("45, 444 < 19, abc"))

The result I expect is what findall returns. I don't understand why split behaves differently.

Edit: Without the capturing group, findall still works, but split gets worse: it no longer returns the matched text at all.

Eren Kara asked Apr 24 '26 02:04



2 Answers

You can change the pattern slightly and have re.split split on everything that is not a digit or "a":

import re
print(list(filter(None, re.split(r'[^\da]+', "45, 444 < 19, abc"))))

Output:

['45', '444', '19', 'a']
Ajax1234 answered Apr 25 '26 15:04



The documentation for re.split states that if the pattern contains capturing groups, the text of those groups is also returned as part of the resulting list. So the string is split on your pattern AND the captured text is returned between the pieces.

Your pattern (a|-?[0-9]+) captures in a group:

45
444
19
a

The text between the matches is not matched by the pattern, and it is returned as the split pieces:

", "
" < "
", "
"bc"

Resulting in:

['', '45', ', ', '444', ' < ', '19', ', ', 'a', 'bc']

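The first entry is an empty string because the pattern matches 45 at the very start of the string, so there is nothing in front of the first separator. The last entry, 'bc', is what remains after the final match on a.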
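To make the interleaving concrete, here is a small sketch that compares split with and without the capturing group on the string from the question. The variable names are only illustrative, and the slicing trick at the end assumes the pattern has exactly one capturing group.

```python
import re

text = "45, 444 < 19, abc"

capturing = re.compile(r'(a|-?[0-9]+)')
non_capturing = re.compile(r'(?:a|-?[0-9]+)')

# With a capturing group, split() returns the pieces between matches
# interleaved with the captured text of each match.
with_group = capturing.split(text)
print(with_group)     # ['', '45', ', ', '444', ' < ', '19', ', ', 'a', 'bc']

# Without a capturing group, only the pieces between matches are returned.
without_group = non_capturing.split(text)
print(without_group)  # ['', ', ', ' < ', ', ', 'bc']

# Assumption: exactly one capturing group. Then the odd indices of the
# split result hold the matches, which is the same list findall() gives.
matches = with_group[1::2]
print(matches)        # ['45', '444', '19', 'a']
print(matches == capturing.findall(text))  # True
```

If all you want is the matched values, findall is the more direct tool. split is meant for the text around the matches.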

The fourth bird answered Apr 25 '26 16:04
