Python re.search

Tags: python, regex

I have a string variable containing

string = "123hello456world789"

The string contains no spaces. I want to write a regex that prints only the words made of letters (a-z). I tried a simple regex:

pat = "([a-z]+){1,}"
match = re.search(r""+pat,word,re.DEBUG)

The match object contains only the word hello; the word world is not matched.

When I use re.findall(), I get both hello and world.

My question is: why can't we do this with re.search()?

How can I do this with re.search()?


asked Nov 27 '13 10:11

Krishna M

1 Answer

re.search() returns only the first match of the pattern in the string. From the documentation:

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

To get every occurrence, you need re.findall(). From the documentation:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
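The note about groups in that quote is easy to miss, so here is a small sketch of how the shape of the result changes with the number of groups. The patterns and variable names are only illustrative, chosen to fit the string from the question:

```python
import re

text = "123hello456world789"

# No group: each item is the whole match.
whole = re.findall(r"[a-z]+\d+", text)

# One group: each item is the text captured by that group.
one_group = re.findall(r"([a-z]+)\d+", text)

# Two groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"([a-z]+)(\d+)", text)

print(whole)       # ['hello456', 'world789']
print(one_group)   # ['hello', 'world']
print(two_groups)  # [('hello', '456'), ('world', '789')]
```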

Example:

>>> import re
>>> regex = re.compile(r'([a-z]+)', re.I)
>>> # using search we only get the first item.
>>> regex.search("123hello456world789").groups()
('hello',)
>>> # using findall we get every item.
>>> regex.findall("123hello456world789")
['hello', 'world']
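If you really want to use search() for this, one option is to call it in a loop and restart each search where the previous match ended. This is a minimal sketch, not the only way to do it: search_all is a hypothetical helper name, and it relies on the optional pos argument that compiled patterns accept in search(). re.finditer() gives the same matches with less code.

```python
import re


def search_all(pattern, text):
    """Collect every match by calling search() repeatedly (hypothetical helper)."""
    regex = re.compile(pattern)
    results = []
    pos = 0
    while True:
        match = regex.search(text, pos)
        if match is None:
            break
        results.append(match.group())
        # Resume after this match; step one character further on an
        # empty match so the loop cannot get stuck at the same position.
        pos = match.end() if match.end() > match.start() else match.end() + 1
    return results


words = search_all(r"[a-z]+", "123hello456world789")
print(words)  # ['hello', 'world']

# The same result using finditer(), which yields one match object per match.
same_words = [m.group() for m in re.finditer(r"[a-z]+", "123hello456world789")]
print(same_words)  # ['hello', 'world']
```

In practice findall() or finditer() is usually the simpler choice; the loop is only there to show what search() would need to do the same job.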

UPDATE:

Due to your duplicate question (as discussed at this link) I have added my other answer here as well:

>>> import re
>>> regex = re.compile(r'([a-z][a-z-\']+[a-z])')
>>> regex.findall("HELLO W-O-R-L-D")  # the input is uppercase
[]  # no results, because the pattern only matches lowercase letters
>>> regex.findall("HELLO W-O-R-L-D".lower())  # lowercase the input first
['hello', 'w-o-r-l-d']  # now we have results
>>> regex.findall("123hello456world789")
['hello', 'world']

As you can see, the first sample you provided failed because it is uppercase. You can simply add the re.IGNORECASE flag, though you mentioned that matches should be lowercase only.
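For comparison, this is roughly what the re.IGNORECASE variant would look like with the same pattern. It is only a sketch, and the variable names are mine:

```python
import re

pattern = r'([a-z][a-z-\']+[a-z])'

case_sensitive = re.compile(pattern)
case_insensitive = re.compile(pattern, re.IGNORECASE)

# Without the flag, uppercase input produces no matches.
strict = case_sensitive.findall("HELLO W-O-R-L-D")

# With the flag, the same pattern matches and the original case is kept.
relaxed = case_insensitive.findall("HELLO W-O-R-L-D")

print(strict)   # []
print(relaxed)  # ['HELLO', 'W-O-R-L-D']
```

Note that this pattern needs at least three characters per match (a letter, one or more letters, hyphens or apostrophes, then a letter), so one- and two-letter words are skipped either way.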


answered Oct 14 '22 09:10

Inbar Rose
