
Inconsistency between $ and ^ in regex when using start/end arguments to re.search?

Tags: python, regex

From what I've read, ^ should match the start of a string, and $ the end. However, with re.search(), it looks like the behavior of ^ continues to work fine, while $ 'breaks'. Example:

>>> import re
>>> a = re.compile( "^a" )
>>> print( a.search( "cat", 1, 3 ) )
None

This seems correct to me -- 'a' is not at the start of the string, even if it is at the start of the search.

>>> a = re.compile( "a$" )
>>> print( a.search( "cat", 0, 2 ) )
<_sre.SRE_Match object at 0x7f41df2334a8>

This seems wrong to me, or inconsistent at least.

The documentation on the re module explicitly mentions that the behavior of ^ does not change due to start/end arguments to re.search, but no change in behavior is mentioned for $ (that I've seen).

Can anyone explain why things were designed this way, and/or suggest a convenient workaround?

By workaround, I mean a regex that always matches the real end of the string, even when someone passes the end argument to re.search.

And why was re.search designed such that:

s.search( string, endpos=len(string) - 1 )

is the same as

s.search( string[:-1] ) 

when

s.search( string, pos=1 )

is explicitly and intentionally not the same as

s.search( string[1:] ) 

It seems to be less an issue of inconsistency between ^ and $, and more of an inconsistency within the re.search function.
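The asymmetry described above can be reproduced with a short Python 3 sketch. It only uses the compiled pattern's search(string, pos, endpos) method from the question; the variable names are illustrative.

```python
import re

text = "cat"

# ^ is not re-anchored by pos: the pattern still needs "a" at index 0 of text.
caret = re.compile(r"^a")
caret_with_pos = caret.search(text, 1)    # no match
caret_on_slice = caret.search(text[1:])   # matches the "a" in "at"

# $ is re-anchored by endpos: text is treated as if it ended at index 2.
dollar = re.compile(r"a$")
dollar_with_endpos = dollar.search(text, 0, 2)  # matches, as if text were "ca"
dollar_on_slice = dollar.search(text[:2])       # matches the "a" in "ca"

print(caret_with_pos, caret_on_slice)
print(dollar_with_endpos, dollar_on_slice)
```

So pos behaves differently from slicing with string[1:], while endpos behaves the same as slicing with string[:2].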

bgutt3r asked Mar 30 '17 04:03



1 Answer

According to the search() documentation:

The optional parameter endpos limits how far the string will be searched; it will be as if the string is endpos characters long, so only the characters from pos to endpos - 1 will be searched for a match.

So your call a.search("cat", 0, 2) is equivalent to a.search("ca"). In "ca" the a is the last character, so the pattern a$ does match.
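As for a workaround: since the string is treated as if it were endpos characters long, the pattern itself presumably cannot see past endpos, so one option is to check the match position outside the regex. The sketch below is only an illustration in Python 3. The helper name search_to_real_end is hypothetical (it is not part of the re module), and it assumes the pattern is meant to be anchored at the end of the string.

```python
import re

def search_to_real_end(pattern, string, pos=0, endpos=None):
    """Hypothetical helper: behaves like pattern.search(), but rejects any
    match that does not stop at the real end of `string`."""
    if endpos is None:
        endpos = len(string)
    match = pattern.search(string, pos, endpos)
    # A match cut short by endpos ends before len(string), so discard it.
    if match is not None and match.end() != len(string):
        return None
    return match

a_at_end = re.compile(r"a$")

print(search_to_real_end(a_at_end, "cat", 0, 2))  # rejected: "a" is not the real end
print(search_to_real_end(a_at_end, "cba", 0, 3))  # accepted: "a" is the last character
```

One side effect of this check: $ normally also matches just before a trailing newline, and the helper rejects that case too, so it behaves more like \Z than like a plain $.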

gpanders answered Sep 22 '22 20:09