I have a question: can I say that \t is equivalent to \s+ in a regular expression?
I have some lines of code:
>>> import re
>>> b = '\tNadya Carson'
>>> c = re.compile(r'\s\s*')
>>> c
<_sre.SRE_Pattern object at 0x02729800>
>>> c.sub('',b)
'NadyaCarson'
>>> c = re.compile(r'\s\s+')
>>> c
<_sre.SRE_Pattern object at 0x027292F0>
The pattern object is created fine up to this point, but when I try to replace the whitespace with nothing, the result still contains the \t instead of having it removed:
>>> c.sub('',b)
'\tNadya Carson'
Why is the sub method not working in this case? Thank you!
\t is not equivalent to \s+, but \s+ does match a tab (\t).
The problem in your example is that the second pattern \s\s+ is looking for two or more whitespace characters, and \t is only one whitespace character.
Here are some examples that should help you understand:
>>> result = re.match(r'\s\s+', '\t')
>>> print result
None
>>> result = re.match(r'\s\s+', '\t\t')
>>> print result
<_sre.SRE_Match object at 0x10ff228b8>
\s\s+ would also match ' \t', '\n\t', ' \n \t \t\n'.
Also, \s\s* is equivalent to \s+. Both will match one or more whitespace characters.
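To check that equivalence yourself, here is a small sketch that runs both patterns over a few strings and compares the results. The sample strings and the variable names are made up for illustration; only re.sub is the real API.

```python
import re

# Hypothetical sample inputs, chosen to cover tabs, spaces and newlines.
samples = ['\tNadya Carson', 'a  b', ' \n \t x', 'no_whitespace']

results = []
for s in samples:
    with_star = re.sub(r'\s\s*', '', s)  # one whitespace, then zero or more
    with_plus = re.sub(r'\s+', '', s)    # one or more whitespace
    # Both patterns should strip exactly the same runs of whitespace.
    assert with_star == with_plus
    results.append(with_plus)

print(results)
```

Every whitespace run is removed by both patterns, including the single tab at the start of the first string.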
\s+ is not equivalent to \t because \s does not mean <space>; it means <whitespace>, which covers spaces, tabs, newlines and so on. A literal space (sometimes four of which are used for tabs, depending on the application used to display them) is simply ' '. That is, hitting the spacebar creates a literal space. That's hardly surprising.
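A quick way to see the difference between \s and \t is to try each one against several whitespace characters. This is just a sketch; the list of characters and the variable names are my own choice for the example.

```python
import re

# Characters that Python's \s class treats as whitespace.
whitespace_chars = [' ', '\t', '\n', '\r', '\f', '\v']

# \s matches every one of them...
matched_by_s = [ch for ch in whitespace_chars if re.match(r'\s', ch)]

# ...while \t matches only the tab itself.
matched_by_t = [ch for ch in whitespace_chars if re.match(r'\t', ch)]

print(matched_by_s == whitespace_chars)  # True
print(matched_by_t == ['\t'])            # True
```

So \t is one specific member of the set that \s describes, not a synonym for it.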
\s\s will never match a single \t: since \t IS whitespace, the first \s matches it, but then nothing is left for the second \s. It will match \t\t, but that's because there are two whitespace characters (both tabs). When your regex runs \s\s+, it's looking for one whitespace character followed by one, two, three, or really ANY number more. When it reads your regex it does this:
\s\s+ [Debuggex demo: diagram of this regex]
The \t matches the first \s, but when it hits the second one the next character (the N) is not whitespace, so your regex spits it back out saying "Oh, nope, never mind."
Your first regex does this:
\s\s* [Debuggex demo: diagram of this regex]
Again, the \t matches your first \s. When the regex continues, it sees that the next character doesn't match the second \s, so it takes the "high road" instead and jumps over it. That's why \s\s* matches: the * quantifier includes "or zero", while the + quantifier does not.
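If you want to watch the two quantifiers behave differently on the string from the question, something like this should do it. The names star and plus, and the second input string with doubled whitespace, are just for illustration.

```python
import re

b = '\tNadya Carson'

star = re.compile(r'\s\s*')  # one whitespace, then zero or more
plus = re.compile(r'\s\s+')  # one whitespace, then one or more (so at least two)

# Each whitespace run in b is a single character, so only star finds anything.
star_hits = star.findall(b)
plus_hits = plus.findall(b)
print(star_hits)  # ['\t', ' ']
print(plus_hits)  # []

# With runs of two whitespace characters, the + version matches as well.
doubled = plus.sub('', '\t\tNadya  Carson')
print(repr(doubled))  # 'NadyaCarson'
```

That is also why c.sub('', b) returned the string unchanged with the second pattern: there was never a run of two whitespace characters for it to replace.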