Capturing repeating subpatterns in Python regex

Tags: python, regex

While matching an email address, after I match something like yasar@webmail, I want to capture one or more occurrences of (\.\w+). (What I am doing is a little more complicated; this is just an example.) I tried adding (\.\w+)+, but it only captures the last match. For example, yasar@webmail.something.edu.tr matches, but the group only includes .tr after the yasar@webmail part, so I lose the .something and .edu groups. Can I do this with Python regular expressions, or would you suggest matching everything first and splitting the subpatterns later?

yasar asked Mar 19 '12 04:03



1 Answer

The standard-library re module doesn't support repeated captures (the third-party regex module does):

>>> import regex
>>> m = regex.match(r'([.\w]+)@((\w+)(\.\w+)+)', 'yasar@webmail.something.edu.tr')
>>> m.groups()
('yasar', 'webmail.something.edu.tr', 'webmail', '.tr')
>>> m.captures(4)
['.something', '.edu', '.tr']
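For comparison, here is a small sketch of what the standard-library re module does with the same pattern and the same example address. It accepts the repeated group, but group 4 holds only the text from its last repetition, and a re match object has no captures() method to recover the earlier ones.

```python
import re

# Same pattern and example address as above, but with the stdlib re module.
pattern = r'([.\w]+)@((\w+)(\.\w+)+)'
m = re.match(pattern, 'yasar@webmail.something.edu.tr')

# Group 4 repeats three times, yet only the final repetition is kept.
print(m.groups())
# ('yasar', 'webmail.something.edu.tr', 'webmail', '.tr')
print(m.group(4))
# .tr
```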

In your case I'd go with splitting the repeated subpatterns later. It leads to simple, readable code; see, for example, the code in @Li-aung Yip's answer.
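A minimal sketch of the "match first, split later" idea, using only the re module. This is not the code from the other answer. The helper name split_email and the simplified pattern are assumptions made for illustration, and the pattern is the toy one from the question, not a real email validator.

```python
import re

# Capture the whole dotted tail as ONE group; the inner repetition is
# non-capturing, so nothing is silently overwritten.
EMAIL = re.compile(r'([.\w]+)@(\w+)((?:\.\w+)+)')

def split_email(address):
    """Return (user, host, [dotted parts]), or None if there is no match.

    Hypothetical helper for illustration only.
    """
    m = EMAIL.fullmatch(address)
    if m is None:
        return None
    user, host, tail = m.groups()
    # Second pass: pull each ".label" piece out of the captured tail.
    parts = re.findall(r'\.\w+', tail)
    return user, host, parts

print(split_email('yasar@webmail.something.edu.tr'))
# ('yasar', 'webmail', ['.something', '.edu', '.tr'])
```

The second pass costs one extra findall call, but it keeps the main pattern short and works without installing anything.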

jfs answered Oct 13 '22 18:10