Python re: Storing multiple matches in variables

Tags:

python

regex

I want to match different parts of a string and store them in separate variables for later use. For example,

string = "bunch(oranges, bananas, apples)"
rxp = "[a-z]*\([var1]\, [var2]\, [var3]\)"

so that I have

var1 = "oranges"
var2 = "bananas"
var3 = "apples"

Something like what re.search() does but for multiple different parts of the same match.

EDIT: the number of fruits in the list is not known beforehand. Should have put this in with the question.

asked Nov 18 '12 21:11

Arish

4 Answers

That is what re.search does. Just use capturing groups (parentheses) to access the stuff that was matched by certain subpatterns later on:

>>> import re
>>> m = re.search(r"[a-z]*\(([a-z]*), ([a-z]*), ([a-z]*)\)", string)
>>> m.group(0)
'bunch(oranges, bananas, apples)'
>>> m.group(1)
'oranges'
>>> m.group(2)
'bananas'
>>> m.group(3)
'apples'

Also note that I used a raw string so the backslashes do not have to be doubled.
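As a minimal sketch of the same idea, the three groups can be unpacked into separate names in one step with `m.groups()`. The variable name `text` is an assumption used here in place of `string`, and the `None` check is added because `re.search` returns `None` when nothing matches.

```python
import re

text = "bunch(oranges, bananas, apples)"
m = re.search(r"[a-z]*\(([a-z]*), ([a-z]*), ([a-z]*)\)", text)

# search() returns None when the pattern does not match, so check first.
if m is not None:
    # groups() returns all captured groups as a tuple, in order.
    var1, var2, var3 = m.groups()
    print(var1, var2, var3)  # oranges bananas apples

# A raw string and its doubled-backslash equivalent are the same pattern.
assert r"\(" == "\\("
```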

If your number of "variables" inside bunch can vary, you have a problem. Most regex engines cannot capture a variable number of strings. However in that case you could get away with this:

>>> m = re.search(r"[a-z]*\(([a-z, ]*)\)", string)
>>> m.group(1)
'oranges, bananas, apples'
>>> m.group(1).split(', ')
['oranges', 'bananas', 'apples']
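The capture-then-split approach above could be wrapped in a small helper. This is only a sketch: the function name `extract_items` is hypothetical, and it assumes the items never contain parentheses or commas themselves. It also tolerates uneven spacing around the commas and an empty list.

```python
import re

def extract_items(text):
    """Return the comma-separated items inside the first name(...) group."""
    m = re.search(r"[a-z]*\(([^()]*)\)", text)
    if m is None:
        return []
    # Split on commas, trim surrounding whitespace, and drop empty pieces.
    return [item.strip() for item in m.group(1).split(",") if item.strip()]

print(extract_items("bunch(oranges, bananas, apples)"))
# ['oranges', 'bananas', 'apples']
print(extract_items("bunch(kiwi)"))  # ['kiwi']
print(extract_items("bunch()"))      # []
```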

answered Sep 29 '22 18:09

Martin Ender

For regular expressions, you can use the match() function to do what you want, and use groups to get your results. Also, avoid using string as a variable name, since it is also the name of a standard-library module. For your example, if you know there is always the same number of fruits, it looks like this:

import re
# Named 'text' rather than 'input', which would shadow the built-in input().
text = "bunch(oranges, bananas, apples)"
var1, var2, var3 = re.match(r'bunch\((\w+), (\w+), (\w+)\)', text).group(1, 2, 3)

Here, I used the \w special sequence, which matches any alphanumeric character or underscore, as explained in the documentation.
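To illustrate what `\w` does and does not match, here is a small sketch with a made-up input string (the sample text is an assumption, not from the question). Underscores and digits stay inside a word, while commas, spaces, and hyphens split it.

```python
import re

# \w+ grabs runs of letters, digits, and underscores;
# any other character ends the current run.
words = re.findall(r"\w+", "red_apple, kiwi2, pear-ish")
print(words)  # ['red_apple', 'kiwi2', 'pear', 'ish']
```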

If you don't know the number of fruits in advance, you can use two regular expression calls: one to extract the minimal part of the string where the fruits are listed, getting rid of "bunch" and the parentheses, then finditer to extract the names of the fruits:

import re
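text = "bunch(oranges, bananas, apples)"
# Match only the names; an optional '(, )?' in the pattern would leave the separator in group(0).
[m.group(0) for m in re.finditer(r'\w+', re.match(r'bunch\(([^)]*)\)', text).group(1))]
# ['oranges', 'bananas', 'apples']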

answered Sep 29 '22 17:09

acjay

If you want, you can use groupdict to store matching items in a dictionary:

import re
# The trailing \) keeps the closing parenthesis out of the last group.
regex = re.compile(r"[a-z]*\((?P<var1>.*), (?P<var2>.*), (?P<var3>.*)\)")
match = regex.match("bunch(oranges, bananas, apples)")
if match:
    print(match.groupdict())

# {'var1': 'oranges', 'var2': 'bananas', 'var3': 'apples'}
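Named groups have to be written into the pattern ahead of time, so they do not cover the case where the number of fruits is unknown. One possible workaround, sketched here under the assumption that keys of the form `var1`, `var2`, ... are wanted, is to extract the items first and then build the dictionary with `enumerate`. The sample string with four fruits is made up for illustration.

```python
import re

text = "bunch(oranges, bananas, apples, kiwis)"

# Step 1: isolate the part between the parentheses.
m = re.match(r"[a-z]*\(([^)]*)\)", text)

# Step 2: pull out each word; fall back to an empty list if nothing matched.
items = re.findall(r"\w+", m.group(1)) if m else []

# Step 3: generate var1, var2, ... keys for however many items were found.
named = {f"var{i}": item for i, item in enumerate(items, start=1)}
print(named)
# {'var1': 'oranges', 'var2': 'bananas', 'var3': 'apples', 'var4': 'kiwis'}
```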

answered Sep 29 '22 18:09

tehmisvh

Don't. Every time you use var1, var2, etc., you actually want a list. Unfortunately, there is no way to collect an arbitrary number of subgroups in a list using findall (a repeated group keeps only its last capture), but you can use a hack like this:

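import re
string = "bunch(oranges, bananas, apples)"  # the string from the question
lst = []
# The lookahead only accepts words that are followed by ')' with no parenthesis in between.
re.sub(r'([a-z]+)(?=[^()]*\))', lambda m: lst.append(m.group(1)), string)
print(lst)  # ['oranges', 'bananas', 'apples']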

Note that this works not only for this specific example, but also for any number of substrings.
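As a possible alternative to calling re.sub purely for its side effect, the same lookahead pattern also seems to work with `re.findall` once the capturing group is dropped, because findall then returns each whole match rather than subgroups. This is a sketch, with `text` as an assumed variable name; it relies on every item being followed, somewhere later, by a closing parenthesis with no other parenthesis in between.

```python
import re

text = "bunch(oranges, bananas, apples)"

# Each match is a lowercase word that sits inside the parentheses:
# the lookahead requires a ')' ahead with no '(' or ')' before it.
fruits = re.findall(r"[a-z]+(?=[^()]*\))", text)
print(fruits)  # ['oranges', 'bananas', 'apples']
```

The word "bunch" is skipped because the next parenthesis after it is an opening one, so the lookahead fails there.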

answered Sep 29 '22 17:09

georg
