I have a long .txt file. I want to find all the matching results with regex.
For example:
import re
test_str = 'ali. veli. ahmet.'
src = re.finditer(r'(\w+\.\s){1,2}', test_str, re.MULTILINE)
print(*src)
This code returns:
<re.Match object; span=(0, 11), match='ali. veli. '>
I need:
['ali. veli', 'veli. ahmet.']
How can I do that with regex?
?= is a positive lookahead, a type of zero-width assertion. What it's saying is that the captured match must be followed by whatever is within the parentheses but that part isn't captured. Your example means the match needs to be followed by zero or more characters and then a digit (but again that part isn't captured).
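To make the zero-width behaviour concrete, here is a minimal sketch. The pattern \w+(?=.*\d) and the sample string are assumptions chosen to mirror the description above (a match that must be followed by any characters and then a digit); they are not taken from the original example.

```python
import re

# Hypothetical sample text and pattern, for illustration only.
text = "abc 123"

# \w+ must be followed, somewhere later in the string, by a digit.
# The lookahead only checks this; it does not consume any characters.
m = re.search(r"\w+(?=.*\d)", text)

matched = m.group(0)   # the text actually matched
end_pos = m.end()      # where the match stops

print(matched)  # abc
print(end_pos)  # 3
```

The match stops at index 3, right after "abc", even though the lookahead had to scan ahead to the digits to succeed.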
If you want to indicate a line break when you construct your regex, use the sequence "\r\n" (Windows-style line endings) or "\n" (Unix-style, and what Python strings read in text mode normally contain). Whether or not you will have line breaks in your expression depends on what you are trying to match. Line breaks can be useful "anchors" that define where some pattern occurs in relation to the beginning or end of a line.
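As a rough illustration of anchoring a pattern to line boundaries in Python, the sketch below compares ^ with and without re.MULTILINE. The two-line sample string is made up for this example, and it assumes "\n" as the line break.

```python
import re

# Hypothetical two-line sample; "\n" is assumed as the line break.
text = "first line\nsecond line"

# Without re.MULTILINE, ^ matches only at the very start of the string.
starts_default = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also matches right after each line break.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second']
```

The flag only changes how ^ and $ behave, so it has no effect on a pattern that uses neither anchor.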
The match() method retrieves the result of matching a string against a regular expression.
For example, the regular expression "[A-Za-z]" specifies to match any single uppercase or lowercase letter. In a character set, a hyphen indicates a range of characters; for example, [A-Z] will match any one capital letter.
The (\w+\.\s){1,2} pattern contains a repeated capturing group, and Python re does not store all the captures it finds; it only saves the last one into the group memory buffer. In any case, you do not need the repeated capturing group, because you need to extract multiple occurrences of the pattern from a string, and re.finditer or re.findall will do that for you.
Also, the re.MULTILINE flag is not necessary here since there are no ^ or $ anchors in the pattern.
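The behaviour of the repeated capturing group can be checked directly. This is a small sketch that reuses the pattern and string from the question; the variable names whole and last_capture are introduced here only for illustration.

```python
import re

test_str = 'ali. veli. ahmet.'

# Same pattern as in the question (the re.MULTILINE flag is dropped,
# since the pattern has no ^ or $ anchors).
matches = list(re.finditer(r'(\w+\.\s){1,2}', test_str))

# Only one non-overlapping match is found in the whole string...
whole = [m.group(0) for m in matches]

# ...and group 1 keeps just the last repetition of the group.
last_capture = [m.group(1) for m in matches]

print(whole)         # ['ali. veli. ']
print(last_capture)  # ['veli. ']
```

Because matches found this way cannot overlap, "veli. " is consumed by the first match and is not available as the start of a second one.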
You may get the expected results using
import re
test_str = 'ali. veli. ahmet.'
src = re.findall(r'(?=\b(\w+\.\s+\w+))', test_str)
print(src)
# => ['ali. veli', 'veli. ahmet']
The pattern means:
(?= - start of a positive lookahead
\b - a word boundary (crucial here, it is necessary to only start capturing at word boundaries)
(\w+\.\s+\w+) - Capturing group 1: 1+ word chars, a ., 1+ whitespaces and 1+ word chars
) - end of the lookahead.
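For completeness, here is a sketch of the same lookahead used with re.finditer, which also exposes where each overlapping pair starts. The second pattern, with an optional \.? at the end, is an assumed variant (not part of the answer above) for cases where the trailing dot should be kept.

```python
import re

test_str = 'ali. veli. ahmet.'
pattern = re.compile(r'(?=\b(\w+\.\s+\w+))')

# Each match itself is empty (zero-width); the text lives in group 1.
pairs = [(m.start(1), m.group(1)) for m in pattern.finditer(test_str)]
print(pairs)     # [(0, 'ali. veli'), (5, 'veli. ahmet')]

# Assumed variant: an optional \.? keeps a trailing dot when one is present.
with_dot = re.findall(r'(?=\b(\w+\.\s+\w+\.?))', test_str)
print(with_dot)  # ['ali. veli.', 'veli. ahmet.']
```

Note that the variant keeps the dot on every pair, not only on the last one as in the expected output shown in the question.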