
python regex: capturing group within OR

Tags: python, regex

I'm using Python and the re module to parse some strings and extract a 4-digit code associated with a prefix. Here are two examples of strings I need to parse:

str1 = "random stuff tokenA1234 more stuff"
str2 = "whatever here tokenB5678 tokenA0123 and more there"

tokenA and tokenB are the prefixes, and 1234, 5678, and 0123 are the digits I need to grab. tokenA and tokenB are just examples here: the prefix can be something like an address, http://domain.com/ (tokenA), or a string like Id: matched by '[Ii]d:?\s?' (tokenB).

My regex looks like:

re.findall('.*?(?:tokenA([0-9]{4})|tokenB([0-9]{4})).*?', str1)

When parsing the 2 strings above, I get:

[('1234','')]
[('','5678'),('0123','')]

I'd like to simply get ['1234'] or ['5678', '0123'] instead of a list of tuples. How can I modify the regex to achieve that? Thanks in advance.

jbdev asked Nov 26 '25 13:11


2 Answers

You get tuples as a result because your regex contains more than one capturing group. See the re.findall documentation:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
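To make the quoted rule concrete, here is a small sketch comparing what re.findall returns with zero, one, and two capturing groups. The input is the second example string from the question; the variable names are only illustrative.

```python
import re

s = "whatever here tokenB5678 tokenA0123 and more there"

# No capturing group: findall returns the full text of each match.
no_group = re.findall(r"token[AB][0-9]{4}", s)

# One capturing group: findall returns a flat list of that group's text.
one_group = re.findall(r"(?:tokenA|tokenB)([0-9]{4})", s)

# Two capturing groups: findall returns one tuple per match, with an empty
# string for the group that did not take part in the match.
two_groups = re.findall(r"tokenA([0-9]{4})|tokenB([0-9]{4})", s)

print(no_group)    # ['tokenB5678', 'tokenA0123']
print(one_group)   # ['5678', '0123']
print(two_groups)  # [('', '5678'), ('0123', '')]
```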

So, the solution is to use only one capturing group.

Only the token prefixes differ, while the ([0-9]{4}) part is common to both alternatives. So put the tokens into a non-capturing group with an alternation operator between them, and follow it with the single capturing group:

(?:tokenA|tokenB)([0-9]{4})
^^^^^^^^^^^^^^^^^

The regex means:

  • (?:tokenA|tokenB) - matches, but does not capture, tokenA or tokenB
  • ([0-9]{4}) - matches four digits and captures them into Group 1

Demo:

import re
s = "tokenA1234tokenB34567"
print(re.findall(r'(?:tokenA|tokenB)([0-9]{4})', s)) 

Result: ['1234', '3456']
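The same idea should carry over to the more realistic prefixes mentioned in the question. The following is a minimal sketch, assuming the URL prefix is meant literally (so it is passed through re.escape) and the Id: prefix is already a regex fragment. The sample string and the variable names are made up for illustration.

```python
import re

# Prefixes taken from the question.
prefix_url = re.escape("http://domain.com/")  # literal text, so escape the dot
prefix_id = r"[Ii]d:?\s?"                     # already a regex fragment

# Non-capturing alternation of the prefixes, then one capturing group.
pattern = re.compile("(?:" + "|".join([prefix_url, prefix_id]) + ")([0-9]{4})")

# Hypothetical input, not from the original post.
sample = "see http://domain.com/1234 or Id: 5678, also id9012"
codes = pattern.findall(sample)
print(codes)  # ['1234', '5678', '9012']
```

Because the pattern still has exactly one capturing group, findall keeps returning a flat list of strings no matter how many prefixes are joined into the alternation.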

Wiktor Stribiżew answered Nov 29 '25 02:11


Simply do this:

re.findall(r"token[AB](\d{4})", s)

[AB] is a character class, so it matches either A or B.
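As a quick check, here is the character-class version run against the second string from the question. Note that this shortcut only applies when the prefixes differ by a single character, as tokenA and tokenB do. Also, for str patterns in Python 3, \d matches any Unicode decimal digit unless re.ASCII is set, so [0-9] is the stricter choice.

```python
import re

s = "whatever here tokenB5678 tokenA0123 and more there"

# [AB] matches exactly one character, either "A" or "B".
codes = re.findall(r"token[AB](\d{4})", s)
print(codes)  # ['5678', '0123']
```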

Avinash Raj answered Nov 29 '25 03:11


