
Python - Locating the position of a regex match in a string?

Tags: python, regex

I'm currently using regular expressions to search through RSS feeds to find if certain words and phrases are mentioned, and would then like to extract the text on either side of the match as well. For example:

import re
String = "This is an example sentence, it is for demonstration only"
re.search("is", String)

I'd like to know the position(s) of where the 'is' matches are found so that I can extract and output something like this:

1 match found: "This is an example sentence" 

I know that it would be easy to do with splits, but I'd need to know the index of the first character of the match in the string, which I don't know how to find.

asked Apr 20 '10 at 10:04 by nb.



2 Answers

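You could use String.find("is"), which returns the index of the first occurrence of "is" in the string (or -1 if it is not found).

Alternatively, call .start() on the match object returned by re.search:

>>> re.search("is", String).start()
2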

Note that this matches the "is" inside "This".

To match "is" only as a whole word, put \b before and after it; \b matches a word boundary.

>>> re.search(r"\bis\b", String).start()
5

For more information, see the Python documentation for the re module.
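As a rough sketch of how the two approaches compare, and of how the match position can be used to pull out the text on either side (which is what the question asks for), something like the following may help. The variable names text, plain_index, before and after are illustrative only and do not come from the answer above. Note that re.search returns None when nothing matches, so it is safer to check before calling .start() or .span().

```python
import re

text = "This is an example sentence, it is for demonstration only"

# str.find returns the lowest index of the substring, or -1 if it is absent.
# It does no word-boundary checking, so it finds the "is" inside "This".
plain_index = text.find("is")

# re.search returns a match object, or None when there is no match,
# so guard before using .start(), .end() or .span().
match = re.search(r"\bis\b", text)
if match is not None:
    word_start, word_end = match.span()  # start index, end index (exclusive)
    before = text[:word_start]           # everything left of the match
    after = text[word_end:]              # everything right of the match
    print(plain_index, word_start, word_end)
    print(repr(before), repr(after))
```

Here .span() is just a shorthand for the pair (.start(), .end()); slicing with those two numbers gives the text on each side without needing split.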

answered Sep 19 '22 at 00:09 by YOU


I don't think this question has been completely answered yet because all of the answers only give single match examples. The OP's question demonstrates the nuances of having 2 matches as well as a substring match which should not be reported because it is not a word/token.

To match multiple occurrences, one might do something like this:

matches = re.finditer(r"\bis\b", String)  # named "matches" to avoid shadowing the built-in iter
indices = [m.start(0) for m in matches]

For the original string, indices is then a list of the two whole-word match positions, [5, 32].
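To go one step further and report each match together with some surrounding text, a minimal sketch along these lines might work. The helper name find_with_context and its width parameter are hypothetical, introduced here only for illustration; the only library calls used are re.finditer and the match object's .span().

```python
import re

def find_with_context(pattern, text, width=20):
    """Return a (start, end, snippet) tuple for every match of pattern in text.

    snippet is the match plus up to `width` characters on each side.
    """
    results = []
    for m in re.finditer(pattern, text):
        start, end = m.span()
        # Clamp the window so it never runs past either end of the string.
        left = max(0, start - width)
        right = min(len(text), end + width)
        results.append((start, end, text[left:right]))
    return results

text = "This is an example sentence, it is for demonstration only"
hits = find_with_context(r"\bis\b", text, width=10)
for start, end, snippet in hits:
    print(f"match at {start}-{end}: {snippet!r}")
```

A fixed character window is only one possible choice; cutting at sentence or word boundaries instead would need extra logic that is not shown here.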

answered Sep 23 '22 at 00:09 by demongolem