
Which specific characters does the strip function remove?

Tags: python

Here is what you can find in the str.strip documentation:

The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace.

Now my question is: which specific characters are considered whitespace?

These function calls share the same result:

>>> ' '.strip()
''
>>> '\n'.strip()
''
>>> '\r'.strip()
''
>>> '\v'.strip()
''
>>> '\x1e'.strip()
''

In this related question, a user mentioned that str.strip works with a superset of the ASCII whitespace characters (in other words, a superset of string.whitespace). More specifically, it works with all Unicode whitespace characters.

Moreover, I believe (but I'm just guessing; I have no proof) that c.isspace() returns True for every character c that str.strip would remove. Is that correct? If so, one could run c.isspace() on every Unicode character c and build the list of whitespace characters that str.strip removes by default.

>>> ' '.isspace()
True
>>> '\n'.isspace()
True
>>> '\r'.isspace()
True
>>> '\v'.isspace()
True
>>> '\x1e'.isspace()
True

Is my assumption correct? If so, how can I verify it? Is there an easier way to find out exactly which characters str.strip removes by default?

Riccardo Bucco asked Apr 30 '26 09:04



1 Answer

The simplest way to find out which characters str.strip() removes is to loop over every possible character and check whether a one-character string containing it is altered by str.strip():

import sys

# Try every code point and report the ones that str.strip() removes.
for c in range(sys.maxunicode + 1):
  s = chr(c)
  if s != s.strip():
    print(f"{hex(c)} is stripped", flush=True)

As suggested in the comments, you can also print a table to check whether str.strip(), str.split() and str.isspace() treat whitespace the same way:

import sys

print("char\tstrip\tsplit\tisspace")
for c in range(sys.maxunicode + 1):
  s = chr(c)
  stripped = s != s.strip()   # removed by strip()
  splitted = not s.split()    # treated as a separator by split()
  spaced = s.isspace()        # reported as whitespace by isspace()
  if stripped or splitted or spaced:
    print(f"{hex(c)}\t{stripped}\t{splitted}\t{spaced}", flush=True)

If I run the code above I get:

char    strip   split   isspace
0x9     True    True    True
0xa     True    True    True
0xb     True    True    True
0xc     True    True    True
0xd     True    True    True
0x1c    True    True    True
0x1d    True    True    True
0x1e    True    True    True
0x1f    True    True    True
0x20    True    True    True
0x85    True    True    True
0xa0    True    True    True
0x1680  True    True    True
0x2000  True    True    True
0x2001  True    True    True
0x2002  True    True    True
0x2003  True    True    True
0x2004  True    True    True
0x2005  True    True    True
0x2006  True    True    True
0x2007  True    True    True
0x2008  True    True    True
0x2009  True    True    True
0x200a  True    True    True
0x2028  True    True    True
0x2029  True    True    True
0x202f  True    True    True
0x205f  True    True    True
0x3000  True    True    True

So, at least in Python 3.10.4, your assumption seems to be correct: str.strip(), str.split() and str.isspace() agree on all 29 characters listed above.
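If you would rather have the comparison checked programmatically than read off a table, here is a minimal sketch. The helper names default_stripped and isspace_true are my own and not part of any library. It also checks the string.whitespace superset claim from the question and prints the Unicode name of each stripped character. The results depend on the interpreter's Unicode data, so treat it as a check for your own Python version rather than a guarantee:

```python
import string
import sys
import unicodedata

def default_stripped():
    """Code points whose one-character string is emptied by str.strip()."""
    return [c for c in range(sys.maxunicode + 1) if chr(c).strip() == ""]

def isspace_true():
    """Code points for which str.isspace() returns True."""
    return [c for c in range(sys.maxunicode + 1) if chr(c).isspace()]

stripped = default_stripped()

# Do strip() and isspace() agree on this interpreter?
print("same as isspace():", stripped == isspace_true())

# Is every ASCII whitespace character from string.whitespace also stripped?
ascii_ws = {ord(ch) for ch in string.whitespace}
print("covers string.whitespace:", ascii_ws <= set(stripped))

# Control characters have no Unicode name, hence the fallback text.
for c in stripped:
    print(f"U+{c:04X}", unicodedata.name(chr(c), "<no name>"))
```

The list is computed from the running interpreter, so it will pick up any change in a future Unicode version instead of relying on a hard-coded set.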

etuardu answered May 03 '26 00:05
