
Python regex explanation needed - $ character usage

Tags: python, regex

My apologies for a complete newbie question. I did try searching Stack Overflow before posting this.

I am trying to learn regex using Python from diveintopython3.net. While fiddling with the examples, I failed to understand one particular output of a regex search (shown below):

>>> import re
>>> pattern = 'M?M?M?$'
>>> re.search(pattern, 'MMMMmmmmm')
<_sre.SRE_Match object at 0x7f0aa8095168>

Why does the above regex pattern match the input text? My understanding is that the $ character should match only at the end of the string. But the input text ends with 'mmmmm', so I thought the pattern should not match.

My Python version is:

Python 3.3.2 (default, Dec  4 2014, 12:49:00)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-7)] on linux

EDIT: Attached a screenshot from Debuggex.

sudeepta bhuyan asked Apr 13 '15 06:04




2 Answers

Why does the above regex pattern match the input text?

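Because you made all of the M's optional. M? means an optional M: the M may or may not be present. re.search also looks for a match starting at any position in the string, so 'M?M?M?$' can match with zero M's at the very end, where only the zero-width end-of-string boundary $ has to match. Hence you got a match.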
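One way to see this is to inspect the returned match object. The short sketch below reuses the pattern and input from the question; the variable names `text` and `match` are only illustrative.

```python
import re

pattern = 'M?M?M?$'
text = 'MMMMmmmmm'

# re.search tries each starting position in turn; the first one that
# succeeds is the very end of the string, with all three M? matching nothing.
match = re.search(pattern, text)

print(repr(match.group()))  # the matched text is the empty string
print(match.span())         # start and end are both len(text)
print(len(text))
```

The match starts and ends at the same index, so it covers no characters at all.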

Avinash Raj answered Nov 02 '22 23:11


It is because all the M symbols are optional, and $ (the only required element in this regex) matches at the end of the string. With every M skipped, your regex behaves like a zero-length assertion: it consumes no characters, but it still produces a match.

Here is a visualization of M?M?M?$ (Debuggex Demo):

[Regular expression visualization]
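If the intent is for the pattern to describe the whole string, one possible adjustment (an assumption about what the asker wants, not something stated in the question) is to anchor the start as well with ^. A minimal sketch:

```python
import re

anchored = '^M?M?M?$'  # ^ pins the match to the start, $ to the end

# The original input is rejected, because the optional M's can no longer
# be skipped over to reach the end of the string.
rejected = re.search(anchored, 'MMMMmmmmm')
print(rejected)

# A string made of up to three M's is still accepted in full.
accepted = re.search(anchored, 'MMM')
print(accepted.group(), accepted.span())

# The empty string is accepted too, since every M is optional.
empty = re.search(anchored, '')
print(empty.span())
```

With both anchors in place, the empty match at the end of 'MMMMmmmmm' is no longer possible, so re.search returns None for that input.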

Wiktor Stribiżew answered Nov 03 '22 00:11