
Confused about the usage of regex in Python

Tags: python, regex

I'm confused by the following three patterns. Could someone explain them in more detail?

## IPython with Python 2.7.3
In [62]: re.findall(r'[a-z]*',"f233op")
Out[62]: ['f', '', '', '', 'op', '']  ## why does the last '' come out?

In [63]: re.findall(r'([a-z])*',"f233op")
Out[63]: ['f', '', '', '', 'p', '']  ## why does the character 'o' get lost?

In [64]: re.findall(r'([a-z]*)',"f233op")
Out[64]: ['f', '', '', '', 'op', '']  ## how is this different from In [63] above?
vicd asked Mar 06 '14 15:03


1 Answer

Example 1

re.findall(r'[a-z]*',"f233op")

This pattern matches zero or more lowercase letters. The zero-or-more part is key here: wherever no letter is available, a match of nothing is just as valid as a match of f or op, so findall reports an empty string at each of those positions (here, in front of each of the digits 2, 3 and 3). The last empty string returned is the match at the very end of the string, the position right after p.
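To see where each of those matches comes from, here is a small sketch that uses re.finditer instead of re.findall so the start and end index of every match is visible. The variable names are only for illustration.

```python
import re

text = "f233op"

# Record (start, end, matched_text) for every match of [a-z]*.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(r'[a-z]*', text)]

for start, end, matched in spans:
    # Empty matches have start == end.
    print(start, end, repr(matched))
```

The empty matches sit at indexes 1, 2 and 3 (the digits) and at index 6, which is the end of the string.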

Example 2

re.findall(r'([a-z])*',"f233op")

Now the pattern repeats a capturing group that holds a single lowercase letter. When a pattern contains one group, findall returns the contents of that group rather than the whole match, and a repeated group only keeps what it captured on its last repetition. The overall match is still op (the * is greedy), but the group ends up holding p, so the o is no longer returned. If you changed the string to f233op12fre, the final e would be returned, but not the preceding f or r. Likewise, if you take the p out of your string, o is returned, since it is now the last letter captured.
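A short sketch of that "last repetition wins" behaviour, using re.match to compare the whole match with the group, plus the two modified strings mentioned above:

```python
import re

# The whole match covers both letters, but the repeated group
# only remembers the letter from its final repetition.
m = re.match(r'([a-z])*', "op")
whole_match = m.group(0)   # 'op'
last_group = m.group(1)    # 'p'

# The same effect through findall, which returns the group contents.
longer = re.findall(r'([a-z])*', "f233op12fre")
shorter = re.findall(r'([a-z])*', "f233o")

print(whole_match, last_group)
print(longer)
print(shorter)
```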

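Conversely, if you made this regex non-greedy by adding a ? (e.g. ([a-z])*?), the matches returned on Python 2.7 would all be empty strings, because a lazy quantifier tries the shortest match first and a match of nothing always succeeds. (Python 3.7 changed how empty matches are handled, so newer versions can also return single letters in between the empty strings.)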
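The preference of the lazy quantifier for an empty match can be checked with a single re.match call at the start of the string, which avoids any version-dependent findall behaviour. This is a minimal sketch; the variable names are made up for the example.

```python
import re

text = "f233op"

# Lazy: zero repetitions is tried first and succeeds immediately.
lazy = re.match(r'([a-z])*?', text)
lazy_whole = lazy.group(0)    # '' (nothing consumed)
lazy_group = lazy.group(1)    # None (the group never took part)

# Greedy: as many repetitions as possible are tried first.
greedy = re.match(r'([a-z])*', text)
greedy_whole = greedy.group(0)  # 'f'

print(repr(lazy_whole), lazy_group, repr(greedy_whole))
```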

Example 3

re.findall(r'([a-z]*)',"f233op")

The matched characters are no different, but now findall is returning the contents of the capturing group instead of the whole match. Because the group wraps the entire pattern, the output is the same as in your first example. If you add a second capturing group, though, each match is returned as a tuple with one entry per group:

IN : re.findall(r'([a-z]*)([0-9]*)',"f233op")
OUT: [('f', '233'), ('op', ''), ('', '')]  

Contrast this with the same pattern minus the parentheses (groups), and you'll see why they are important:

IN : re.findall(r'[a-z]*[0-9]*',"f233op")
OUT: ['f233', 'op', ''] 
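If you need parentheses for grouping but want findall to keep returning whole matches, a non-capturing group (?:...) is one option, and re.finditer gives access to both the whole match and the groups at once. A small sketch along those lines:

```python
import re

text = "f233op"

# Non-capturing groups: findall returns whole matches again.
non_capturing = re.findall(r'(?:[a-z]*)(?:[0-9]*)', text)

# finditer: each match object carries the whole match and every group.
details = [(m.group(0), m.groups())
           for m in re.finditer(r'([a-z]*)([0-9]*)', text)]

print(non_capturing)
print(details)
```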

Also...

It can be useful to plug regex patterns like these into a regex diagram generator such as Regexplained to see how the pattern-matching logic works. For example, to see why your regex keeps returning empty strings, compare the patterns [a-z]* and [a-z]+: the first accepts zero letters, while the second requires at least one.
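The difference between * and + is easy to check directly. A quick sketch on the string from the question:

```python
import re

text = "f233op"

star = re.findall(r'[a-z]*', text)        # zero or more: empty matches allowed
plus = re.findall(r'[a-z]+', text)        # one or more: no empty matches
plus_group = re.findall(r'([a-z])+', text)  # still only the last letter per match

print(star)
print(plus)
print(plus_group)
```

Switching to + removes the empty strings, but a repeated single-letter group still reports only the last letter of each run.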

Don't forget to check the Python docs for the re library if you get stuck; they give a pretty stellar explanation of the standard regex syntax.

woemler answered Oct 12 '22 23:10