Here is what you can find in the str.strip documentation:
The chars argument is a string specifying the set of characters to be removed. If omitted or
None, the chars argument defaults to removing whitespace.
Now my question is: which specific characters are considered whitespace?
These function calls share the same result:
>>> ' '.strip()
''
>>> '\n'.strip()
''
>>> '\r'.strip()
''
>>> '\v'.strip()
''
>>> '\x1e'.strip()
''
In this related question, a user mentioned that str.strip works with a superset of the ASCII whitespace characters (in other words, a superset of string.whitespace). More specifically, it works with all Unicode whitespace characters.
Moreover, I believe (but I'm only guessing, I have no proof) that c.isspace() returns True for every character c that str.strip would also remove. Is that correct? If so, I guess one could run c.isspace() on every Unicode character c and build the list of whitespace characters that str.strip removes by default.
>>> ' '.isspace()
True
>>> '\n'.isspace()
True
>>> '\r'.isspace()
True
>>> '\v'.isspace()
True
>>> '\x1e'.isspace()
True
Is my assumption correct? And if so, how can I find proof? Is there an easier way to know which specific characters are removed by str.strip by default?
The most straightforward way to find out which characters str.strip() removes is to loop over every possible character and check whether a string containing only that character is altered by str.strip():
c = 0
while True:
    try:
        s = chr(c)
    except ValueError:
        # chr() raises ValueError past the last valid code point
        break
    if s != s.strip():
        print(f"{hex(c)} is stripped", flush=True)
    c += 1
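The same scan can be written more compactly by iterating up to sys.maxunicode instead of waiting for chr() to raise. This is only a sketch of the loop above; the function name stripped_code_points is made up for illustration.

```python
import sys

def stripped_code_points():
    """Return the code points that str.strip() removes by default.

    Hypothetical helper: a one-character string that strips down to
    the empty string consists of a character that strip() removes.
    """
    return [
        c for c in range(sys.maxunicode + 1)
        if chr(c).strip() == ""
    ]

stripped = stripped_code_points()
print([hex(c) for c in stripped])
```

Collecting the code points in a list rather than printing them one by one makes it easy to compare the result against other sets, such as string.whitespace.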
As suggested in the comments, you can also print a table to check whether str.strip(), str.split() and str.isspace() treat the same characters as whitespace:
c = 0
print("char\tstrip\tsplit\tisspace")
while True:
    try:
        s = chr(c)
    except ValueError:
        # chr() raises ValueError past the last valid code point
        break
    stripped = s != s.strip()   # strip() removed the character
    splitted = not s.split()    # split() treated it as a separator
    spaced = s.isspace()
    if stripped or splitted or spaced:
        print(f"{hex(c)}\t{stripped}\t{splitted}\t{spaced}", flush=True)
    c += 1
If I run the code above I get:
char strip split isspace
0x9 True True True
0xa True True True
0xb True True True
0xc True True True
0xd True True True
0x1c True True True
0x1d True True True
0x1e True True True
0x1f True True True
0x20 True True True
0x85 True True True
0xa0 True True True
0x1680 True True True
0x2000 True True True
0x2001 True True True
0x2002 True True True
0x2003 True True True
0x2004 True True True
0x2005 True True True
0x2006 True True True
0x2007 True True True
0x2008 True True True
0x2009 True True True
0x200a True True True
0x2028 True True True
0x2029 True True True
0x202f True True True
0x205f True True True
0x3000 True True True
So, at least in Python 3.10.4, your assumption seems to be correct: every character removed by str.strip() also makes str.isspace() return True, and vice versa.
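As for something closer to a proof: the documentation of str.isspace() states that a character is whitespace if, in the Unicode character database, its general category is Zs ("Separator, space") or its bidirectional class is one of WS, B, or S. The sketch below checks that rule against str.isspace() for every code point using the standard unicodedata module. The helper name documented_whitespace is made up, and the sketch assumes that unicodedata and str use the same Unicode database version, which as far as I know is the case in CPython.

```python
import sys
import unicodedata

def documented_whitespace(ch):
    """Apply the rule given in the str.isspace() documentation.

    Hypothetical helper, not part of the standard library.
    """
    return (
        unicodedata.category(ch) == "Zs"
        or unicodedata.bidirectional(ch) in ("WS", "B", "S")
    )

# Code points where the documented rule and str.isspace() disagree.
mismatches = [
    hex(c) for c in range(sys.maxunicode + 1)
    if documented_whitespace(chr(c)) != chr(c).isspace()
]
print(mismatches)
```

If the list comes out empty on your interpreter, the table above is fully explained by the documented rule, which also accounts for entries such as 0x1c to 0x1f that are not in string.whitespace.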