Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Named regular expression group "(?P<group_name>regexp)": what does "P" stand for?

In Python, the (?P<group_name>…) syntax allows one to refer to the matched string through its name:

>>> import re
>>> match = re.search('(?P<name>.*) (?P<phone>.*)', 'John 123456')
>>> match.group('name')
'John'

What does "P" stand for? I could not find any hint in the official documentation.

I would love to get ideas about how to help my students remember this syntax. Knowing what "P" does stand for (or might stand for) would be useful.

like image 686
Eric O Lebigot Avatar asked Apr 08 '12 01:04

Eric O Lebigot


People also ask

What does P mean in regular expression?

The P is Python identifier for a named capture group. You will see P in regex used in jdango and other python based regex implementations.

What does regexp stand for?

Regular Expressions (Regex) Regular Expression, or regex or regexp in short, is extremely and amazingly powerful in searching and manipulating text strings, particularly in processing text files. One line of regex can easily replace several dozen lines of programming codes.

What are named groups in Python?

Groups are used in Python in order to reference regular expression matches. By default, groups, without names, are referenced according to numerical order starting with 1 . Let's say we have a regular expression that has 3 subexpressions.

How do you name a group in regex?

The name can contain letters and numbers but must start with a letter. (? P<x>abc){3} matches abcabcabc. The group x matches abc.


2 Answers

Since we're all guessing, I might as well give mine: I've always thought it stood for Python. That may sound pretty stupid -- what, P for Python?! -- but in my defense, I vaguely remembered this thread [emphasis mine]:

Subject: Claiming (?P...) regex syntax extensions

From: Guido van Rossum ([email protected])

Date: Dec 10, 1997 3:36:19 pm

I have an unusual request for the Perl developers (those that develop the Perl language). I hope this (perl5-porters) is the right list. I am cc'ing the Python string-sig because it is the origin of most of the work I'm discussing here.

You are probably aware of Python. I am Python's creator; I am planning to release a next "major" version, Python 1.5, by the end of this year. I hope that Python and Perl can co-exist in years to come; cross-pollination can be good for both languages. (I believe Larry had a good look at Python when he added objects to Perl 5; O'Reilly publishes books about both languages.)

As you may know, Python 1.5 adds a new regular expression module that more closely matches Perl's syntax. We've tried to be as close to the Perl syntax as possible within Python's syntax. However, the regex syntax has some Python-specific extensions, which all begin with (?P . Currently there are two of them:

(?P<foo>...) Similar to regular grouping parentheses, but the text
matched by the group is accessible after the match has been performed, via the symbolic group name "foo".

(?P=foo) Matches the same string as that matched by the group named "foo". Equivalent to \1, \2, etc. except that the group is referred
to by name, not number.

I hope that this Python-specific extension won't conflict with any future Perl extensions to the Perl regex syntax. If you have plans to use (?P, please let us know as soon as possible so we can resolve the conflict. Otherwise, it would be nice if the (?P syntax could be permanently reserved for Python-specific syntax extensions. (Is there some kind of registry of extensions?)

to which Larry Wall replied:

[...] There's no registry as of now--yours is the first request from outside perl5-porters, so it's a pretty low-bandwidth activity. (Sorry it was even lower last week--I was off in New York at Internet World.)

Anyway, as far as I'm concerned, you may certainly have 'P' with my blessing. (Obviously Perl doesn't need the 'P' at this point. :-) [...]

So I don't know what the original choice of P was motivated by -- pattern? placeholder? penguins? -- but you can understand why I've always associated it with Python. Which considering that (1) I don't like regular expressions and avoid them wherever possible, and (2) this thread happened fifteen years ago, is kind of odd.

like image 184
DSM Avatar answered Oct 18 '22 00:10

DSM


Python Extension. From the Python Docs:

The solution chosen by the Perl developers was to use (?...) as the extension syntax. ? immediately after a parenthesis was a syntax error because the ? would have nothing to repeat, so this didn’t introduce any compatibility problems. The characters immediately after the ? indicate what extension is being used, so (?=foo) is one thing (a positive lookahead assertion) and (?:foo) is something else (a non-capturing group containing the subexpression foo).

Python supports several of Perl’s extensions and adds an extension syntax to Perl’s extension syntax.If the first character after the question mark is a P, you know that it’s an extension that’s specific to Python

https://docs.python.org/3/howto/regex.html

like image 43
SomeGuy Avatar answered Oct 18 '22 01:10

SomeGuy