Why does re.findall() find more matches than re.sub()?

Tags: python, regex

Consider the following:

>>> import re
>>> a = "first:second"
>>> re.findall("[^:]*", a)
['first', '', 'second', '']
>>> re.sub("[^:]*", r"(\g<0>)", a)
'(first):(second)'

re.sub()'s behavior makes more sense initially, but I can also understand re.findall()'s behavior. After all, you can match an empty string between first and : that consists only of non-colon characters (exactly zero of them), but why isn't re.sub() behaving the same way?

Shouldn't the result of the last command be (first)():(second)()?

asked May 04 '13 06:05

Tim Pietzcker

1 Answer

You use the * quantifier, which allows empty matches, so the pattern matches at four positions:

'first'   -> matched
':'       -> not in the character class but, since the * lets the pattern
             match nothing, an empty string is matched just before it --> ''
'second'  -> matched
'$'       -> end of the string: an empty string is matched
             just before it --> ''
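
The four match positions listed above can be made visible with re.finditer(), which reports where each match starts and ends. This is only a small illustrative sketch; the name spans is introduced here and does not come from the original answer.

```python
import re

a = "first:second"

# Collect (start, end, text) for every match of the pattern.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer("[^:]*", a)]

for start, end, text in spans:
    # An empty match is one where start == end.
    print(start, end, repr(text))
```

The two entries with start == end are the empty matches: one just before the colon and one at the end of the string.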

Quoting the documentation for re.findall():

Empty matches are included in the result unless they touch the beginning of another match.

The reason you don't see empty matches in sub results is explained in the documentation for re.sub():

Empty matches for the pattern are replaced only when not adjacent to a previous match.
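
The quoted rule can also be spelled out by hand. What follows is a minimal sketch, not how the re module is implemented: sub_skip_adjacent_empty is a hypothetical helper that walks the matches from re.finditer() and leaves an empty match untouched when it starts exactly where the previous match ended. Writing the rule out explicitly keeps the result the same on every interpreter, whereas the built-in re.sub() applies a different rule from Python 3.7 on.

```python
import re

def sub_skip_adjacent_empty(pattern, repl, text):
    """Hypothetical helper: substitute matches, but skip an empty match
    that begins exactly where the previous match ended."""
    pieces = []
    prev_end = None   # end position of the previous match, if any
    copied_to = 0     # how much of `text` has been copied so far
    for m in re.finditer(pattern, text):
        # Copy the unmatched text between the previous match and this one.
        pieces.append(text[copied_to:m.start()])
        is_empty = m.start() == m.end()
        if not (is_empty and prev_end == m.start()):
            # Expand the replacement template (handles \g<0> and friends).
            pieces.append(m.expand(repl))
        copied_to = m.end()
        prev_end = m.end()
    pieces.append(text[copied_to:])
    return "".join(pieces)

result = sub_skip_adjacent_empty("[^:]*", r"(\g<0>)", "first:second")
print(result)
```

With the question's input, both empty matches touch the end of a preceding match, so the helper skips them and only first and second are wrapped in parentheses.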

Try this:

re.sub('(?:Choucroute garnie)*', '#', 'ornithorynque')

And now this:

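print(re.sub('(?:nithorynque)*', '#', 'ornithorynque'))

The output, #o#r#, contains no consecutive #: the empty match at the end of the string is adjacent to the preceding match of nithorynque, so it is not replaced. Note that Python 3.7 changed this rule so that empty matches adjacent to a previous non-empty match are replaced as well; on those versions the call prints #o#r## instead.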
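
If the empty matches are not wanted in the first place, one option is to require at least one character by using + instead of *. This is an addition for illustration, not something the answer above proposes, and the names found and wrapped are made up for the sketch.

```python
import re

a = "first:second"

# '+' requires at least one non-colon character, so no empty match is possible.
found = re.findall("[^:]+", a)
wrapped = re.sub("[^:]+", r"(\g<0>)", a)

print(found)    # ['first', 'second']
print(wrapped)  # (first):(second)
```

With this pattern, re.findall() and re.sub() see exactly the same two matches, so the question of how empty matches are treated does not arise.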

answered Oct 01 '22 05:10

Casimir et Hippolyte
