In python, how to 'if finditer(...) has no matches'?

Tags:

python

regex

I would like to do something when finditer() does not find anything.

import re
pattern = "1"
string = "abc"
matched_iter = re.finditer(pattern, string)
# if <matched_iter is empty (no matches found)>:
#     do something
# else:
#     for m in matched_iter:
#         print(m.group())

The best thing I could come up with is to keep track of found manually:

import re

mi_no_find = re.finditer(r'\w+', "$$%%%%")   # nothing in this string matches
found = False
for m in mi_no_find:
    print(m.group())
    found = True
if not found:
    print("Nothing found")

Related posts that don't answer:

  • Counting finditer matches: Number of regex matches (I don't need to count, I just need to know if there are no matches).
  • finditer vs match: different behavior when using re.finditer and re.match (says you always have to loop over the iterator returned by finditer)

[edit]
- I have no interest in enumerating or counting the matches. I only need one action for the found case and another for the not-found case.
- I understand I can put the finditer results into a list, but this would be inefficient for large strings. One objective is low memory utilization.

Leo Ufimtsev asked May 08 '19 23:05


2 Answers

Updated 04/10/2020

Use re.search(pattern, string) to check if a pattern exists.

pattern = "1"
string = "abc"

if re.search(pattern, string) is None:
    print('do this because nothing was found')

Returns:

do this because nothing was found

If you also want to iterate over the matches, put the re.finditer() loop inside the re.search() check. Note that this scans the string twice up to the first match.

pattern = '[A-Za-z]'
string = "abc"

if re.search(pattern, string) is not None:
    for thing in re.finditer(pattern, string):
        print('Found this thing: ' + thing[0])

Returns:

Found this thing: a
Found this thing: b
Found this thing: c

To handle both cases, add an else: clause to the if re.search() conditional.

pattern = "1"
string = "abc"

if re.search(pattern, string) is not None:
    for thing in re.finditer(pattern, string):
        print('Found this thing: ' + thing[0])
else:
    print('do this because nothing was found')

Returns:

do this because nothing was found
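
One possible way to package the search-then-iterate pattern above is a small generator function. This is only a sketch: describe_matches is a hypothetical name, and compiling the pattern once with re.compile is a convenience so both calls share the same pattern object, not something the approach requires.

```python
import re

def describe_matches(pattern, string):
    """Yield one message per match, or a single fallback message."""
    regex = re.compile(pattern)  # compile once, reuse for both calls
    if regex.search(string) is None:
        # No match anywhere in the string: take the "not found" branch.
        yield 'do this because nothing was found'
        return
    # At least one match exists, so iterate lazily over all of them.
    for m in regex.finditer(string):
        yield 'Found this thing: ' + m.group()

for line in describe_matches('[A-Za-z]', 'abc'):
    print(line)
for line in describe_matches('1', 'abc'):
    print(line)
```

Because it is a generator, matches are still produced one at a time rather than collected into a list.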

Previous reply below (superseded by the update above)

If .finditer() finds no match, the body of the loop over its result never runs.

So:

  • Set a variable before the loop you use to iterate over the regex matches
  • Read the variable after (and outside of) that loop

This way, if the regex call yields nothing, the loop does not execute and the variable still holds the value it was set to before the loop.

Below, example 1 demonstrates the regex finding the pattern. Example 2 shows the regex not finding the pattern, so the variable within the loop is never set. Example 3 shows my suggestion - where the variable is set before the regex loop, so if the regex does not find a match (and subsequently, does not trigger the loop), the variable call after the loop returns the initial variable set (Confirming the regex pattern was not found).

Remember to import the re module.

EXAMPLE 1 (Searching for the characters 'he' in the string 'hello world' will return 'he')

my_string = 'hello world'
pat = '(he)'
regex = re.finditer(pat,my_string)

for a in regex:
    b = str(a.groups()[0])
print(b)

# returns 'he'

EXAMPLE 2 (Searching for the characters 'ab' in the string 'hello world' does not match anything, so the 'for a in regex:' loop does not execute and does not assign the b variable any value.)

my_string = 'hello world'
pat = '(ab)'
regex = re.finditer(pat,my_string)

for a in regex:
    b = str(a.groups()[0])
print(b)

# prints nothing; in a fresh session this raises NameError because b was never assigned

EXAMPLE 3 (Searching for the characters 'ab' again, but this time setting the variable b to 'CAKE' before the loop, and calling the variable b after, outside of the loop returns the initial variable - i.e. 'CAKE' - since the loop did not execute).

my_string = 'hello world'
pat = '(ab)'
regex = re.finditer(pat,my_string)

b = 'CAKE' # sets the variable prior to the for loop
for a in regex:
    b = str(a.groups()[0])
print(b) # calls the variable after (and outside) the loop

# returns 'CAKE'

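Note that the examples above read a.groups()[0], so the pattern must contain a capture group: use parentheses to mark the start and end of the group. Without them, a.groups() is an empty tuple and indexing it raises IndexError; call a.group() instead to get the whole match.

pattern = '(ab)' # has a capture group, so a.groups()[0] works
pattern = 'ab' # no capture group: a.groups() is (), use a.group() instead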
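
A minimal sketch of how group() and groups() behave with and without parentheses, using only the standard re module. The variable names plain and grouped are illustrative.

```python
import re

text = 'hello world'

# Pattern without parentheses: there are no capture groups.
plain = re.search('he', text)
print(plain.group())    # the whole match
print(plain.groups())   # an empty tuple, so plain.groups()[0] would raise IndexError

# Pattern with parentheses: group 1 holds the captured text.
grouped = re.search('(he)', text)
print(grouped.group())   # the whole match again
print(grouped.groups())  # a one-element tuple containing the captured text
```

So an unparenthesized pattern still matches; it just has to be read with group() rather than groups().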

To tie back to the initial question:

Since the for loop (for a in regex) does not execute when nothing is found, you can preload the variable and check after the loop whether it still holds its original value. If it does, nothing was found.

my_string = 'hello world'
pat = '(ab)'
regex = re.finditer(pat,my_string)

b = 'CAKE' # sets the variable prior to the for loop
for a in regex:
    b = str(a.groups()[0])
if b == 'CAKE':
    # action taken if nothing is returned
    print('Nothing found')
James Andrew Bush answered Oct 20 '22 10:10


If performance isn't an issue, simply use findall or list(finditer(...)), which returns a list.
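
A short sketch of the list-based variant, reusing the non-matching example from the question. It keeps every match object in memory at once, which is the cost mentioned above.

```python
import re

# Materialize all matches up front; an empty list means nothing matched.
matches = list(re.finditer(r'\w+', "$$%%%%"))

if not matches:
    print("Nothing found")
else:
    for m in matches:
        print(m.group())
```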

Otherwise, you can "peek" into the iterator with next: if it raises StopIteration there are no matches, and if it does not, loop as normal over the first match plus the rest. Though there are other ways to do it, this is the simplest to me:

import itertools
import re

pattern = "1"
string = "abc"  
matched_iter = re.finditer(pattern, string)

try:
    first_match = next(matched_iter)
except StopIteration:
    print("No match!") # action for no match
else:
    for m in itertools.chain([first_match], matched_iter):
        print(m.group())
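
If the peek is needed in several places, it could be wrapped in a helper. The following is a sketch under assumptions: finditer_or_none is a hypothetical name, and it uses the two-argument form of the built-in next() with None as the default, which is safe here because finditer only ever yields match objects.

```python
import itertools
import re

def finditer_or_none(pattern, string):
    """Return an iterator over all matches, or None if there are none."""
    matches = re.finditer(pattern, string)
    first = next(matches, None)  # None only when the iterator is empty
    if first is None:
        return None
    # Put the consumed first match back in front of the remaining ones.
    return itertools.chain([first], matches)

found = finditer_or_none("1", "abc")
if found is None:
    print("No match!")
else:
    for m in found:
        print(m.group())
```

Only one match object is held ahead of the loop, so memory use stays low even for large strings.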
iz_ answered Oct 20 '22 09:10