I'm currently studying regular expressions and have a question.
The title is what I'm trying to find out. Since \s represents whitespace, I thought re.split(" ", string) and re.split("\s+", string) would return the same values, as shown next:
>>> import re
>>> a = re.split(" ", "Why is this wrong")
>>> a
['Why', 'is', 'this', 'wrong']
>>> import re
>>> a = re.split(r"\s+", "Why is this wrong")
>>> a
['Why', 'is', 'this', 'wrong']
Both give the same result, so I assumed they were equivalent. However, it turns out they are different. In what cases do they differ, and what am I missing?
These only look the same because of your example.
A split on ' ' (a single space) does exactly that: it splits on every single space. Consecutive spaces therefore produce empty strings in the result.
A split on '\s+' treats a run of one or more whitespace characters as a single separator, and it also matches whitespace other than plain spaces, such as tabs and newlines:
import re
# The input deliberately contains runs of several spaces as well as tabs.
a = re.split(" ", "Why    is this  \t \t  wrong")
b = re.split(r"\s+", "Why    is this  \t \t  wrong")
print(a)
print(b)
Output:
# re.split(" ",data)
['Why', '', '', '', 'is', 'this', '', '\t', '\t', '', 'wrong']
# re.split("\s+",data)
['Why', 'is', 'this', 'wrong']
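The two differences described above (the + quantifier and the \s character class) can be looked at separately. The following is a minimal sketch; the sample string and the variable names are made up for illustration, and " +" is included only as an intermediate step for comparison:

```python
import re

# Hypothetical sample: "a", two spaces, "b", a tab, "c".
text = "a" + " " * 2 + "b" + "\t" + "c"

single_space = re.split(" ", text)      # splits at every individual space
many_spaces = re.split(" +", text)      # a run of spaces is one separator
any_whitespace = re.split(r"\s+", text) # a run of any whitespace is one separator

print(single_space)    # ['a', '', 'b\tc']
print(many_spaces)     # ['a', 'b\tc']
print(any_whitespace)  # ['a', 'b', 'c']
```

The quantifier removes the empty strings, and only the \s class also splits on the tab.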
Documentation:
\s Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v]. (https://docs.python.org/3/howto/regex.html#matching-characters)
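As a quick check of the quoted class, the sketch below tests each listed character against \s and against a literal space. The names whitespace_chars, matches, and only_space are chosen here for illustration:

```python
import re

# The six characters listed in the documented class [ \t\n\r\f\v].
whitespace_chars = [" ", "\t", "\n", "\r", "\f", "\v"]

# True for every character that \s matches on its own.
matches = {ch: bool(re.fullmatch(r"\s", ch)) for ch in whitespace_chars}
print(matches)

# A literal space pattern matches only the space character.
only_space = [ch for ch in whitespace_chars if re.fullmatch(" ", ch)]
print(only_space)  # [' ']
```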