The following code is very strange:
>>> import re
>>> words = "4324324 blahblah"
>>> print re.findall(r'(\s)\w+', words)
[' ']
>>> print re.search(r'(\s)\w+', words).group()
blahblah
The () operator seems to behave poorly with findall. Why is this? I need it for a csv file.
Edit for clarity: I want to display blahblah using findall.
I discovered that re.findall(r'\s(\w+)', words) does what I want, but have no idea why findall treats groups in this way.
Above we used re.search() to find the first match for a pattern. findall() finds *all* the matches and returns them as a list of strings, with each string representing one match.
re.match attempts to match a pattern only at the beginning of the string. re.search scans through the string and returns the first location where the pattern matches.
The re.findall(pattern, string) method scans string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of strings in the order the matches are found. If the pattern contains capturing groups, the list holds the groups instead of the whole matches.
findall indeed finds all the matches in the given string. finditer also finds every non-overlapping match, but it returns an iterator of match objects rather than a list of strings.
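As a rough illustration of how these four functions differ, here is a small Python 3 sketch. The sample string and the variable names are made up for this example.

```python
import re

text = "one two three"  # hypothetical sample input

# re.match only tries at the very start of the string.
at_start = re.match(r"\w+", text)
not_at_start = re.match(r"two", text)  # None: "two" is not at position 0

# re.search scans forward and stops at the first match.
first = re.search(r"t\w+", text)

# re.findall returns every non-overlapping match as a list of strings
# (when the pattern has no capturing groups).
all_strings = re.findall(r"t\w+", text)

# re.finditer yields a match object for every non-overlapping match.
all_spans = [(m.group(), m.start()) for m in re.finditer(r"t\w+", text)]

print(at_start.group())  # one
print(not_at_start)      # None
print(first.group())     # two
print(all_strings)       # ['two', 'three']
print(all_spans)         # [('two', 4), ('three', 8)]
```

Because finditer hands back match objects, it is the one to reach for when you need positions or several views of the same match.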
One character off:
>>> print re.search(r'(\s)\w+', words).groups()
(' ',)
>>> print re.search(r'(\s)\w+', words).group(1)
' '
When the pattern contains a capturing group, findall returns a list of the captured groups rather than the whole matches. You're getting a space back because that's what you capture. Stop capturing, and it works fine:
>>> print re.findall(r'\s\w+', words)
[' blahblah']
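To illustrate how groups affect the result, here is a short Python 3 sketch using the same words string from the question. It compares zero, one, and two capturing groups. The non-capturing group (?:...) is included as one more way to keep the parentheses without changing what findall returns.

```python
import re

words = "4324324 blahblah"

# No groups: each item is the whole match.
no_group = re.findall(r"\s\w+", words)

# One group: each item is only that group's text.
one_group = re.findall(r"(\s)\w+", words)

# Two groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"(\s)(\w+)", words)

# Non-capturing group: parentheses for grouping only, so whole match again.
non_capturing = re.findall(r"(?:\s)\w+", words)

print(no_group)       # [' blahblah']
print(one_group)      # [' ']
print(two_groups)     # [(' ', 'blahblah')]
print(non_capturing)  # [' blahblah']
```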
Use the csv module.
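A minimal sketch of what that might look like in Python 3. The question does not show the actual file, so the comma-separated layout, the header row, and the sample values below are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical CSV content; with a real file you would typically use
# open("some_file.csv", newline="") instead of io.StringIO.
data = io.StringIO('id,name\n4324324,blahblah\n17,"hello, world"\n')

# csv.reader splits each line into fields and handles quoted commas.
rows = list(csv.reader(data))

# Take the second column of every row after the (assumed) header row.
names = [row[1] for row in rows[1:]]

print(rows)   # [['id', 'name'], ['4324324', 'blahblah'], ['17', 'hello, world']]
print(names)  # ['blahblah', 'hello, world']
```

The quoted field is the kind of case where a hand-written regex tends to go wrong and the csv module does not.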
If you prefer to keep the capturing groups in your regex, but you still want to find the entire contents of each match instead of the groups, you can use the following:
[m.group() for m in re.finditer(r'(\s)\w+', words)]
For example:
>>> [m.group() for m in re.finditer(r'(\s)\w+', '4324324 blahblah')]
[' blahblah']