
Python Regex Subgroup Capturing

Tags: python, regex

I'm trying to parse the following string:

constructor: function(some, parameters, here) {

With the following regex:

re.search(r"(\w*):\s*function\((?:(\w*)(?:,\s)*)*\)", line).groups()

And I'm getting:

('constructor', '')

But I was expecting something more like:

('constructor', 'some', 'parameters', 'here')

What am I missing?

Evan Siroky asked Mar 12 '15 21:03


2 Answers

If you change your pattern to:

print(re.search(r"(\w*):\s*function\((?:(\w+)(?:,\s)?)*\)", line).groups())

You'll get:

('constructor', 'here')

This is because (from docs):

If a group is contained in a part of the pattern that matched multiple times, the last match is returned.
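
To see the rule quoted above in action, here is a minimal sketch using only the standard re module. The variable names are illustrative and not from the original post. The repeated group matches three times, but only the final capture survives:

```python
import re

line = "constructor: function(some, parameters, here) {"

# Group 2 sits inside a repeated (?:...)* group, so every iteration
# overwrites the capture made by the previous one.
match = re.search(r"(\w*):\s*function\((?:(\w+)(?:,\s)?)*\)", line)
print(match.groups())  # ('constructor', 'here')

# The same effect on a smaller input: three iterations, one surviving capture.
small = re.match(r"(?:(\w),?)+", "a,b,c")
print(small.group(1))  # c
```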

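I don't know of a way to do this in a single step with the stock re module. The alternative is to do it in two steps: capture the whole argument list first, then pull the individual names out of it with findall: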

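import re

def parse_line(line):
    # Step 1: capture the name and the raw argument list.
    cons, args = re.search(r'(\w*):\s*function\((.*)\)', line).groups()
    # Step 2: findall returns every match, not just the last one.
    mats = re.findall(r'(\w+)(?:,\s*)?', args)
    return [cons] + mats

print(parse_line(line))  # ['constructor', 'some', 'parameters', 'here']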
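
A slightly more defensive variant of the same two-step idea, offered as a sketch and not as part of the original answer. It assumes you would rather get None than an AttributeError when a line does not match, and it uses [^)]* so the argument capture stops at the first closing parenthesis. The name parse_line_safe is hypothetical.

```python
import re

def parse_line_safe(line):
    # Step 1: grab the name and everything up to the first ")".
    m = re.search(r"(\w+):\s*function\(([^)]*)\)", line)
    if m is None:
        return None  # assumption: callers prefer None over an exception
    name, args = m.groups()
    # Step 2: every run of word characters in the argument list is a parameter.
    return [name] + re.findall(r"\w+", args)

print(parse_line_safe("constructor: function(some, parameters, here) {"))
# ['constructor', 'some', 'parameters', 'here']
```

With an empty argument list, such as "foo: function() {", this returns just ['foo'].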
jedwards answered Oct 13 '22 12:10


One option is to use the more advanced third-party regex module instead of the stock re. Among other nice things, it supports captures, which, unlike groups, save every substring a group matched, not just the last one:

>>> line = "constructor: function(some, parameters, here) {"
>>> import regex
>>> regex.search(r"(\w*):\s*function\((?:(\w+)(?:,\s)*)*\)", line).captures(2)
['some', 'parameters', 'here']
georg answered Oct 13 '22 11:10