Python Regular Expression

Tags:

python

regex

I'd like to extract the designator and the ops from a string of the form "designator: op1 op2", where there can be zero or more ops separated by one or more spaces. I used the following regular expression in Python:

import re
match = re.match(r"^(\w+):(\s+(\w+))*", "des1: op1   op2")

The problem is that only des1 and op2 are found in the matching groups; op1 is not. Does anyone know why?

The groups from the code above are:
Group 0: 'des1: op1   op2'
Group 1: 'des1'
Group 2: '   op2'
Group 3: 'op2'
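The behavior can be reproduced in a few lines. In Python's re module, a capturing group inside a repeated construct such as (\s+(\w+))* is a single slot: each repetition overwrites the previous capture, so op1 is matched but only the last op is kept. The sketch below contrasts that with re.finditer, which returns one match object per occurrence. The variable names and the choice of re.finditer for scanning the ops are illustrative assumptions, not part of the original question.

```python
import re

text = "des1: op1   op2"

# A quantified group is one slot: each repetition of (\s+(\w+))*
# overwrites the previous capture, so only the last op survives.
m = re.match(r"^(\w+):(\s+(\w+))*", text)
print(m.groups())  # ('des1', '   op2', 'op2')

# re.finditer yields a separate match for every occurrence, so each op
# is available. It scans only the text after "des1:" here.
rest = text[m.end(1) + 1:]
ops = [op.group(1) for op in re.finditer(r"\s+(\w+)", rest)]
print(ops)  # ['op1', 'op2']
```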
Jeff asked Nov 07 '10 20:11




1 Answer

Both ops are matched, but a repeated group keeps only one capture: its last one. If you need to capture more than one occurrence, you have to apply a regular expression more than once. You could do something like this, first rewriting the main expression to capture everything after the colon:

match = re.match(r"^(\w+):(.*)", "des1: op1   op2")

Then extract the individual ops from the captured remainder:

# The remainder starts with whitespace, so re.split yields a leading '' that [1:] drops
ops = re.split(r"\s+", match.groups()[1])[1:]
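The two steps above can be combined into one helper. This is a minimal sketch, and the function name parse is hypothetical. It swaps in str.split() with no argument for the re.split(...)[1:] idiom: called that way, str.split splits on runs of whitespace and ignores leading and trailing whitespace, so no slicing is needed, a line with zero ops yields an empty list, and trailing spaces do not leave an empty string in the result (re.split(r"\s+", " op1 ")[1:] gives ['op1', '']).

```python
import re

def parse(line):
    """Split 'designator: op1 op2 ...' into (designator, [ops]).

    Hypothetical helper: group 1 is the designator, group 2 is everything
    after the colon, which str.split() breaks on runs of whitespace.
    """
    match = re.match(r"^(\w+):(.*)", line)
    if match is None:
        raise ValueError(f"not a 'designator: ops' line: {line!r}")
    return match.group(1), match.group(2).split()

print(parse("des1: op1   op2"))  # ('des1', ['op1', 'op2'])
print(parse("des2:"))            # ('des2', [])
```

Note that (.*) accepts any characters after the colon, so ops containing non-word characters are returned as well; tighten the pattern if that matters.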
SingleNegationElimination answered Oct 03 '22 20:10