strip() and strip(string.whitespace) give different results despite documentation suggesting they should be the same

I have a Unicode string with some non-breaking spaces at the beginning and end. I get different results when using strip() vs. strip(string.whitespace).

>>> import string
>>> s5 = u'\xa0\xa0hello\xa0\xa0'
>>> print s5.strip()
hello
>>> print s5.strip(string.whitespace)
  hello  

The documentation for strip() says, "If omitted or None, the chars argument defaults to removing whitespace." The documentation for string.whitespace says, "A string containing all characters that are considered whitespace."

So if string.whitespace contains all characters that are considered whitespace, then why are the results different? Does it have something to do with Unicode?

I am using Python 2.7.6

Becca codes asked Mar 06 '14 16:03


1 Answer

From the documentation of string.whitespace:

A string containing all ASCII characters that are considered whitespace. This includes the characters space, tab, linefeed, return, formfeed, and vertical tab.

It's the same under Python 3, where all non-ASCII constants were removed. (In Python 2, some constants could be influenced by locale settings.)

Hence the difference in behaviour: on a Unicode string, strip() with no argument removes any Unicode whitespace, while strip(string.whitespace) removes only the ASCII whitespace characters listed above. Your string contains non-breaking spaces (U+00A0), which are whitespace but not ASCII.
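
To make the difference concrete, here is a minimal sketch written for Python 3, where every str is Unicode (under Python 2.7 the literals would need the u'' prefix, as in the question). The variable names are only illustrative. It also shows one possible workaround, assuming the non-breaking space is the only extra character you need to strip: add it to the chars argument explicitly.

```python
import string

# Same data as in the question: non-breaking spaces (U+00A0) around "hello".
s5 = '\xa0\xa0hello\xa0\xa0'

# No argument: every character for which isspace() is true gets stripped,
# and that includes U+00A0 on a Unicode string.
default_stripped = s5.strip()

# string.whitespace holds only the six ASCII whitespace characters,
# so the non-breaking spaces are left in place.
ascii_stripped = s5.strip(string.whitespace)

# Possible workaround (an assumption about what you want): name U+00A0 explicitly.
explicit_stripped = s5.strip(string.whitespace + '\xa0')

print(repr(default_stripped))
print(repr(ascii_stripped))
print(repr(explicit_stripped))
print('\xa0' in string.whitespace, '\xa0'.isspace())
```

If you simply want all Unicode whitespace gone, calling strip() with no argument is the simpler choice.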

like image 100
Bakuriu Avatar answered Oct 23 '22 17:10

Bakuriu