How to match--but not capture--in Python regular expressions?

Tags: python, regex

I've got a function spitting out "Washington D.C., DC, USA" as output. I need to capture "Washington, DC" for reasons that have to do with how I handle every single other city in the country. (Note: this is not the same as "D.C.". I need the comma to sit between "Washington" and "DC"; whitespace is fine.)

I can't for the life of me figure out how to capture this.

Here's what I've tried:

    >>> import re
    >>> location = "Washington D.C., DC, USA"
    >>> match = re.search(r'\w+\s(?:D\.C\.), \w\w(?=\W)', location).group()
    >>> match
    u'Washington D.C., DC'

Isn't (?:...) supposed to just match (and not capture) "D.C."?

Here is what the Python 2.7.2 docs say:

(?:...) A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.

What gives??

Thanks in advance!

Matt Parrilla asked Aug 01 '11 22:08


1 Answer

That's a clever approach, but non-capturing doesn't mean the text is removed from the match. It just means that the group is not counted among the captured output groups.

Try something like the following:

    match = re.search(r'(\w+)\s(?:D\.C\.), (\w\w)\W', location).groups()

This gives ('Washington', 'DC').

Note the difference between .group() and .groups(). The former gives you the whole string that was matched, the latter only the captured groups. Remember, you need to specify what you want to include in the output, not what you want to exclude.
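
To make the difference concrete, here is a minimal sketch using the same pattern and input as above. The variable names (whole, parts, city) are just for illustration, and the final join is one assumed way of building the "Washington, DC" string the question asks for:

```python
import re

location = "Washington D.C., DC, USA"

# Two capturing groups (city name and state code); "D.C." sits in a
# non-capturing group, so it is matched but gets no group of its own.
m = re.search(r'(\w+)\s(?:D\.C\.), (\w\w)\W', location)

whole = m.group()    # the entire matched text, "D.C." and trailing comma included
parts = m.groups()   # only the captured groups

# Assemble the desired output from the captured pieces.
city = ", ".join(parts)

print(whole)   # Washington D.C., DC,
print(parts)   # ('Washington', 'DC')
print(city)    # Washington, DC
```

Note that .group() still contains "D.C." (and the comma consumed by the trailing \W), which is why the non-capturing group alone did not help in the original attempt.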

tomasz answered Sep 16 '22 18:09