Python regex to match multiple times

I'm trying to match a pattern against strings that could have multiple instances of the pattern. I need every instance separately. re.findall() should do it but I don't know what I'm doing wrong.

pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)
match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')

I need 'http://url.com/123', 'http://url.com/456' and the two numbers '123' and '456' to be separate elements of the match list.

I have also tried '/review: ((http://url.com/(\d+)\s?)+)/' as the pattern, but no luck.

asked Jul 01 '13 15:07 by mavili




2 Answers

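Use this. A repeated capturing group only keeps its last repetition, so instead of repeating the group, make 'review: ' an optional non-capturing prefix and let findall() match each URL separately.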

import re

pattern = re.compile(r'(?:review: )?(http://url.com/(\d+))\s?', re.IGNORECASE)

This gives the output:

>>> match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')
>>> match
[('http://url.com/123', '123'), ('http://url.com/456', '456')]
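
One caveat with the pattern above: because the 'review: ' prefix is optional, it also picks up matching URLs that appear elsewhere in the message. If only the URLs listed after 'review:' should count, a two-step approach is one option. The sketch below rests on that assumption about the intended behavior and is not part of the original answer; the helper name extract_review_urls and the sample message are made up for illustration.

```python
import re

# Step 1: grab the whole run of URLs that follows "review: ".
REVIEW_BLOCK = re.compile(r'review: ((?:http://url\.com/\d+\s?)+)', re.IGNORECASE)
# Step 2: pull each URL and its number out of that run.
SINGLE_URL = re.compile(r'(http://url\.com/(\d+))', re.IGNORECASE)


def extract_review_urls(message):
    """Return (url, number) tuples for URLs listed after 'review: '.

    Hypothetical helper, written only for this example.
    """
    block = REVIEW_BLOCK.search(message)
    if block is None:
        return []
    return SINGLE_URL.findall(block.group(1))


message = 'see http://url.com/999 first. review: http://url.com/123 http://url.com/456'
print(extract_review_urls(message))
```

Here the URL before 'review:' is skipped, and the two URLs after it come back as separate (url, number) tuples.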
answered Sep 18 '22 14:09 by Narendra Yadala


You've got extra /'s in the regex. In Python the pattern is just a string, with no / delimiters around it. For example, instead of this:

pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)

It should be:

pattern = re.compile('review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

Also, in Python you'd typically use a "raw" string, like this:

pattern = re.compile(r'review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

The r prefix makes it a raw string: backslashes are passed to the regex engine unchanged, so you don't have to double them (for example, writing '\\d' instead of '\d').
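
To make the raw-string point concrete, and to show what the corrected pattern returns with findall(), here is a small sketch. Note that the repeated group still reports only its last repetition, so removing the slashes alone does not yield every URL separately. The variable names text and result are illustrative.

```python
import re

text = 'this is the message. review: http://url.com/123 http://url.com/456'

# In a raw string the backslash reaches the regex engine as-is;
# without the r prefix it has to be doubled.
assert r'\d+' == '\\d+'

pattern = re.compile(r'review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)
result = pattern.findall(text)

# One tuple only: each group holds just its last repetition.
print(result)
```

To get one entry per URL, the repetition has to move out of the capturing group, for example by letting findall() match each URL on its own.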

answered Sep 18 '22 14:09 by John Montgomery