
Difference between re.split(" ", string) and re.split("\s+", string)?

I'm currently studying regular expressions and have run into the question in the title. Since \s represents whitespace, I thought re.split(" ", string) and re.split("\s+", string) would return the same values, as shown here:

>>> import re
>>> a = re.split(" ", "Why is this wrong")
>>> a
['Why', 'is', 'this', 'wrong']
>>> a = re.split("\s+", "Why is this wrong")
>>> a
['Why', 'is', 'this', 'wrong']

These two give the same result, so I assumed they were the same thing. However, it turns out that they are different. In what cases do they differ, and what am I missing?

Sihwan Lee asked Dec 03 '22 17:12


1 Answer

They only look the same because of your example.

A split on ' ' (a single space) does exactly that: it splits at every single space. Consecutive spaces therefore produce empty strings in the result.

A split on '\s+' treats a run of one or more whitespace characters as a single separator, and it matches other whitespace characters than just plain spaces:

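import re

# Split at every single space character.
a = re.split(" ", "Why    is this  \t \t  wrong")
# Split at runs of any whitespace; the raw string keeps \s from being read as a string escape.
b = re.split(r"\s+", "Why    is this  \t \t  wrong")

print(a)
print(b)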

Output:

# print(a): split on " "
['Why', '', '', '', 'is', 'this', '', '\t', '\t', '', 'wrong']

# print(b): split on r"\s+"
['Why', 'is', 'this', 'wrong']
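
The difference also shows up with newlines and with whitespace at the edges of the string. The following is only a small sketch to illustrate that; the sample string and the variable names (text, by_space, by_whitespace, by_str_split) are made up for this example and are not from the question:

```python
import re

# Hypothetical sample: leading/trailing spaces plus a newline and a tab.
text = "  Why is\nthis\twrong  "

# " " only matches the space character, so "\n" and "\t" stay inside a piece.
by_space = re.split(" ", text)

# r"\s+" matches any run of whitespace, including "\n" and "\t".
by_whitespace = re.split(r"\s+", text)

# For comparison: str.split() with no arguments also ignores the edges.
by_str_split = text.split()

print(by_space)       # ['', '', 'Why', 'is\nthis\twrong', '', '']
print(by_whitespace)  # ['', 'Why', 'is', 'this', 'wrong', '']
print(by_str_split)   # ['Why', 'is', 'this', 'wrong']
```

Note that even with r"\s+", whitespace at the start or end of the string still leaves an empty string at that edge of the list. If that is unwanted, stripping the string first or using the plain str.split() should avoid it.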

Documentation:

\s
Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v]. (https://docs.python.org/3/howto/regex.html#matching-characters)
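
To check the quoted class yourself, something like the sketch below should work. It tests each character from [ \t\n\r\f\v] against r"\s" and against a plain " "; the names whitespace_chars, matches_s and matches_space are just illustrative:

```python
import re

# The characters listed in the class quoted from the documentation.
whitespace_chars = [" ", "\t", "\n", "\r", "\f", "\v"]

# Does r"\s" match each character on its own?
matches_s = {repr(ch): bool(re.fullmatch(r"\s", ch)) for ch in whitespace_chars}

# Does a literal single space match each character on its own?
matches_space = {repr(ch): bool(re.fullmatch(" ", ch)) for ch in whitespace_chars}

print(matches_s)      # every value is True
print(matches_space)  # only the entry for ' ' is True
```

This is why the two patterns agree on "Why is this wrong" (single spaces only) but diverge as soon as tabs, newlines, or repeated spaces appear.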

Patrick Artner answered Dec 06 '22 09:12