
Difference between re.search() and re.findall()

Tags: python, regex

The following code is very strange:

 >>> import re
 >>> words = "4324324 blahblah"
 >>> print(re.findall(r'(\s)\w+', words))
 [' ']
 >>> print(re.search(r'(\s)\w+', words).group())
  blahblah

The () operator seems to behave poorly with findall. Why is this? I need it for a csv file.

Edit for clarity: I want to display blahblah using findall.

I discovered that re.findall(r'\s(\w+)', words) does what I want, but have no idea why findall treats groups in this way.

CornSmith asked Dec 14 '12 23:12



2 Answers

You're one character off. Call groups() instead of group() and search shows the same captured space:

>>> print(re.search(r'(\s)\w+', words).groups())
(' ',)
>>> re.search(r'(\s)\w+', words).group(1)
' '

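When the pattern contains capturing groups, findall returns the text captured by those groups rather than the whole match: a list of strings for one group, a list of tuples for several. You're getting a space back because that is all your group captures. Stop capturing, and it works fine: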

>>> print(re.findall(r'\s\w+', words))
[' blahblah']
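
To make the group rule concrete, here is a small sketch that runs the same input through four variants of the pattern. It assumes Python 3 (hence print as a function); the variable names are only illustrative.

```python
import re

words = "4324324 blahblah"

# No capturing group: each item is the whole match.
no_group = re.findall(r'\s\w+', words)

# One capturing group: each item is only that group's text.
one_group = re.findall(r'\s(\w+)', words)

# Two capturing groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'(\s)(\w+)', words)

# Non-capturing group (?:...): groups the pattern without changing
# what findall returns.
non_capturing = re.findall(r'(?:\s)\w+', words)

print(no_group)       # [' blahblah']
print(one_group)      # ['blahblah']
print(two_groups)     # [(' ', 'blahblah')]
print(non_capturing)  # [' blahblah']
```

If you need the parentheses only for grouping (for example, to apply a quantifier), the (?:...) form is usually the simplest way to keep findall returning whole matches.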

Also, since you are parsing a CSV file, consider using the csv module instead of regular expressions.
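
As a rough illustration of the csv suggestion, the sketch below parses a small in-memory sample. The sample data is made up (the question does not show the real file), and the code assumes Python 3.

```python
import csv
import io

# Hypothetical sample standing in for the asker's file; the second row
# has a quoted field containing a comma, which a naive regex or split
# would break apart.
sample = '4324324,blahblah\n555,"quoted, with comma"\n'

# csv.reader accepts any iterable of lines, so a StringIO works in
# place of an open file object.
rows = list(csv.reader(io.StringIO(sample)))

print(rows)  # [['4324324', 'blahblah'], ['555', 'quoted, with comma']]
```

With a real file you would pass the object returned by open(path, newline='') to csv.reader instead of the StringIO.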

Eric answered Sep 30 '22 01:09


If you prefer to keep the capturing groups in your regex, but you still want to find the entire contents of each match instead of the groups, you can use the following:

[m.group() for m in re.finditer(r'(\s)\w+', words)]

For example:

>>> [m.group() for m in re.finditer(r'(\s)\w+', '4324324 blahblah')]
[' blahblah']
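
Because finditer yields match objects, you can also get the whole match and the captured groups together. A minimal sketch, assuming Python 3; the name results is only illustrative:

```python
import re

words = "4324324 blahblah"

# For every match, collect the whole match, the tuple of captured
# groups, and the (start, end) position within the string.
results = [
    (m.group(0), m.groups(), m.span())
    for m in re.finditer(r'(\s)\w+', words)
]

print(results)  # [(' blahblah', (' ',), (7, 16))]
```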

Andrew Clark answered Sep 30 '22 01:09