
Python re.search

Tags: python, regex

I have a string variable containing

string = "123hello456world789"

The string contains no spaces. I want to write a regex that prints only the words made up of letters (a-z). I tried a simple regex:

import re
pat = "([a-z]+){1,}"
match = re.search(pat, string, re.DEBUG)

The match object contains only the word hello; the word world is not matched.

When I used re.findall(), I got both hello and world.

My question is: why can't we do this with re.search()?

How can I do this with re.search()?

Krishna M asked Nov 27 '13 10:11



1 Answer

re.search() finds only the first occurrence of the pattern in the string. From the documentation:

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
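The difference the documentation draws between returning None and finding a zero-length match can be seen in a small sketch. The variable names below are illustrative only and do not come from the question:

```python
import re

text = "123hello456world789"

# '+' requires at least one letter, so a digits-only string yields None.
no_match = re.search(r"[a-z]+", "123456")

# '*' allows zero letters, so search succeeds immediately at index 0
# with an empty (zero-length) match instead of returning None.
empty_match = re.search(r"[a-z]*", text)

# On the question's string, search stops at the first run of letters.
first = re.search(r"[a-z]+", text)

print(no_match)                                       # None
print(repr(empty_match.group()), empty_match.span())  # '' (0, 0)
print(first.group(), first.span())                    # hello (3, 8)
```

In every case search() returns at most one match, which is why the second word never appears.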

To match every occurrence, you need re.findall(). From the documentation:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.

Example:

>>> import re
>>> regex = re.compile(r'([a-z]+)', re.I)
>>> # using search we only get the first item.
>>> regex.search("123hello456world789").groups()
('hello',)
>>> # using findall we get every item.
>>> regex.findall("123hello456world789")
['hello', 'world']
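If you really want to stay with search(), one possible approach is to call it in a loop: the search() method of a compiled pattern accepts an optional start position, so each call can resume where the previous match ended. The helper name find_all_with_search is hypothetical, and this is only a sketch of the idea, not how findall() is implemented internally:

```python
import re

def find_all_with_search(pattern, text):
    """Collect every match by calling search() repeatedly (hypothetical helper)."""
    regex = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(text):
        match = regex.search(text, pos)
        if match is None:
            break
        results.append(match.group())
        # Resume after this match; step one extra character past an empty
        # match so the loop cannot get stuck on the same position.
        pos = match.end() if match.end() > match.start() else match.end() + 1
    return results

words = find_all_with_search(r"[a-z]+", "123hello456world789")
print(words)  # ['hello', 'world']

# re.finditer yields the same matches as match objects, without the manual loop.
spans = [(m.group(), m.span()) for m in re.finditer(r"[a-z]+", "123hello456world789")]
print(spans)  # [('hello', (3, 8)), ('world', (11, 16))]
```

In practice re.findall() or re.finditer() is the simpler choice; the loop mainly shows that search() itself is limited to one match per call.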

UPDATE:

Due to your duplicate question (as discussed at this link) I have added my other answer here as well:

>>> import re
>>> regex = re.compile(r'([a-z][a-z-\']+[a-z])')
>>> regex.findall("HELLO W-O-R-L-D") # this has uppercase
[]  # there are no results here, because the string is uppercase
>>> regex.findall("HELLO W-O-R-L-D".lower()) # lets lowercase
['hello', 'w-o-r-l-d'] # now we have results
>>> regex.findall("123hello456world789")
['hello', 'world']

As you can see, the first sample you provided failed because it was uppercase. You can simply add the re.IGNORECASE flag, though you mentioned that matches should be lowercase only.
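For completeness, here is a minimal sketch of the re.IGNORECASE option. It uses the same character classes as the pattern above, without the capturing group (a simplification made here), and it keeps the original casing in the results instead of lowercasing the input first:

```python
import re

# Same character classes as the pattern above, compiled case-insensitively.
regex = re.compile(r"[a-z][a-z'-]+[a-z]", re.IGNORECASE)

upper = regex.findall("HELLO W-O-R-L-D")
mixed = regex.findall("123Hello456World789")

print(upper)  # ['HELLO', 'W-O-R-L-D']
print(mixed)  # ['Hello', 'World']
```

Note that this pattern needs at least three characters per word, so one- and two-letter words would be skipped either way.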

Inbar Rose answered Oct 14 '22 09:10