Python regular expressions assigning to named groups

When you use variables (is that the correct word?) in Python regular expressions like this: "blah (?P<value>\w+)" ("value" would be the variable), how can you make the variable's value be the text after "blah " up to the end of the line, or up to a certain character, regardless of the actual content of the variable? For example, this is pseudo-code for what I want:

>>> import re
>>> p = re.compile("say (?P<value>continue_until_text_after_assignment_is_recognized) endsay")
>>> m = p.match("say Hello hi yo endsay")
>>> m.group('value')
'Hello hi yo'

Note: The title is probably not understandable. That is because I didn't know how to say it. Sorry if I caused any confusion.

asked Apr 26 '10 00:04



2 Answers

You need to specify what you want to match if the text is, for example,

say hello there and endsay but some more endsay

If you want to match the whole "hello there and endsay but some more" substring, @David's answer is correct. Otherwise, to match just "hello there and", the pattern needs to be:

say (?P<value>.+?) endsay

with a question mark after the plus sign to make it non-greedy (by default it's greedy, gobbling up all it possibly can while allowing an overall match; non-greedy means it gobbles as little as possible, again while allowing an overall match).
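
To make the difference concrete, here is a minimal sketch that runs both patterns against the example text above. The variable names (text, greedy, lazy) are only illustrative and are not part of the original question.

```python
import re

text = "say hello there and endsay but some more endsay"

# Greedy: .+ takes as much as it can, so the match ends at the LAST " endsay".
greedy = re.compile(r"say (?P<value>.+) endsay")
# Non-greedy: .+? takes as little as it can, so the match ends at the FIRST " endsay".
lazy = re.compile(r"say (?P<value>.+?) endsay")

greedy_value = greedy.match(text).group("value")
lazy_value = lazy.match(text).group("value")

print(greedy_value)  # hello there and endsay but some more
print(lazy_value)    # hello there and
```

When the text contains only one "endsay", both patterns capture the same thing, so the choice only matters if the terminator can appear more than once.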

answered Oct 20 '22 07:10 by Alex Martelli


For that you'd want a regular expression of

"say (?P<value>.+) endsay"

The period matches any character, and the plus sign indicates that that should be repeated one or more times... so .+ means any sequence of one or more characters. When you put endsay at the end, the regular expression engine will make sure that whatever it matches does in fact end with that string.
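
As a quick check, here is a small sketch applying that pattern to the input from the question. The names pattern, value, and no_match are just for illustration; the second call shows what "one or more characters" implies when there is nothing between the two words.

```python
import re

pattern = re.compile(r"say (?P<value>.+) endsay")

# The input from the question: everything between "say " and " endsay" is captured.
m = pattern.match("say Hello hi yo endsay")
value = m.group("value") if m else None
print(value)  # Hello hi yo

# Nothing sits between the two words, so .+ cannot be satisfied
# and match() returns None instead of a match object.
no_match = pattern.match("say endsay")
print(no_match)  # None
```

Checking the result of match() before calling group() avoids an AttributeError when the line does not fit the pattern.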

answered Oct 20 '22 07:10 by David Z