How to find all matches with a regex where part of the match overlaps

Tags:

python

regex

iteration

I have a long .txt file. I want to find all the matching results with regex.

For example:

import re
test_str = 'ali. veli. ahmet.'
src = re.finditer(r'(\w+\.\s){1,2}', test_str, re.MULTILINE)
print(*src)

This code returns:

<re.Match object; span=(0, 11), match='ali. veli. '>

I need:

['ali. veli', 'veli. ahmet.']

How can I do that with regex?


asked May 16 '20 22:05

Esat Mahmut Bayol

1 Answer

The (\w+\.\s){1,2} pattern contains a repeated capturing group, and Python's re module does not keep every capture of such a group: it stores only the last one in the group's memory buffer. In any case, you do not need a repeated capturing group, because you want to extract multiple occurrences of the pattern from a string, and re.finditer or re.findall will do that for you.
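To illustrate the point about repeated groups, the sketch below reuses the question's test_str and compares the whole match with what group 1 holds. Since re.findall returns group 1 when a pattern has exactly one group, it likewise sees only the last repetition.

```python
import re

test_str = 'ali. veli. ahmet.'

# The group repeats up to twice, but only its last capture is kept.
m = re.match(r'(\w+\.\s){1,2}', test_str)
whole_match = m.group(0)   # the full match: 'ali. veli. '
last_capture = m.group(1)  # only the second repetition survives: 'veli. '

# findall returns group 1 for each match, so it also sees only the last capture.
findall_result = re.findall(r'(\w+\.\s){1,2}', test_str)

print(repr(whole_match), repr(last_capture), findall_result)
# => 'ali. veli. ' 'veli. ' ['veli. ']
```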

Also, the re.MULTILINE flag is not necessary here, since the pattern contains no ^ or $ anchors.
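A small sketch of what re.MULTILINE actually changes, using a made-up two-line string: it only alters where ^ and $ can match, so a pattern without those anchors returns the same results with or without the flag.

```python
import re

text = 'ali.\nveli.'

# By default, ^ matches only at the very start of the string...
default_anchor = re.findall(r'^\w+', text)
# ...while with re.MULTILINE it also matches right after each newline.
multiline_anchor = re.findall(r'^\w+', text, re.MULTILINE)

# Without ^ or $, the flag makes no difference.
no_anchor_default = re.findall(r'\w+\.', text)
no_anchor_multiline = re.findall(r'\w+\.', text, re.MULTILINE)

print(default_anchor, multiline_anchor)
# => ['ali'] ['ali', 'veli']
print(no_anchor_default == no_anchor_multiline)
# => True
```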

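The actual obstacle is that a regex match consumes the text it matches, so re.finditer and re.findall never return overlapping matches: once 'ali. veli. ' has been matched, veli cannot start another match. Capturing inside a zero-width positive lookahead avoids this, because the lookahead consumes no characters and the engine simply retries at the next position. You may get the expected results using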

import re
test_str = 'ali. veli. ahmet.'
src = re.findall(r'(?=\b(\w+\.\s+\w+))', test_str)
print(src)
# => ['ali. veli', 'veli. ahmet']
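Since the question mentions a long .txt file, it may also help to know where each pair occurs. A minimal sketch, assuming the whole file fits in memory: re.finditer with the same lookahead pattern yields match objects whose group 1 span gives the position of each overlapping pair. The function name find_overlapping_pairs is hypothetical.

```python
import re

# Same pattern as in the answer: a capture inside a zero-width lookahead.
PAIR_RE = re.compile(r'(?=\b(\w+\.\s+\w+))')

def find_overlapping_pairs(text):
    """Return (start, end, pair) for every overlapping 'word. word' pair."""
    # Each match itself is empty; the captured text and its span live in group 1.
    return [(m.start(1), m.end(1), m.group(1)) for m in PAIR_RE.finditer(text)]

print(find_overlapping_pairs('ali. veli. ahmet.'))
# => [(0, 9, 'ali. veli'), (5, 16, 'veli. ahmet')]
```

For a file, read its contents first, e.g. with open('input.txt', encoding='utf-8') as f: pairs = find_overlapping_pairs(f.read()), where input.txt is a placeholder path. Note that \s+ also matches newlines, so pairs split across a line break are found as well.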


The pattern means

(?= - start of a positive lookahead
- \b - a word boundary (crucial here: it ensures that capturing starts only at the beginning of a word)
- (\w+\.\s+\w+) - Capturing group 1: one or more word chars, a literal ., one or more whitespace chars, and one or more word chars
) - end of the lookahead.
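To see why the word boundary matters, the sketch below runs the same lookahead with and without \b. Without it, the engine also succeeds at positions inside a word, so word suffixes such as 'li. veli' show up as extra matches.

```python
import re

test_str = 'ali. veli. ahmet.'

# With \b, a capture can only start at the beginning of a word.
with_boundary = re.findall(r'(?=\b(\w+\.\s+\w+))', test_str)
# Without \b, every position inside a word is a valid starting point too.
without_boundary = re.findall(r'(?=(\w+\.\s+\w+))', test_str)

print(with_boundary)
# => ['ali. veli', 'veli. ahmet']
print(without_boundary)
# => ['ali. veli', 'li. veli', 'i. veli', 'veli. ahmet', 'eli. ahmet', 'li. ahmet', 'i. ahmet']
```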


answered Oct 17 '22 11:10

Wiktor Stribiżew

