
'is' operator behaves differently when comparing strings with spaces

I've started learning Python (Python 3.3) and I was trying out the is operator. I tried this:

    >>> b = 'is it the space?'
    >>> a = 'is it the space?'
    >>> a is b
    False
    >>> c = 'isitthespace'
    >>> d = 'isitthespace'
    >>> c is d
    True
    >>> e = 'isitthespace?'
    >>> f = 'isitthespace?'
    >>> e is f
    False

It seems like the space and the question mark make the is behave differently. What's going on?

EDIT: I know I should be using ==, I just wanted to know why is behaves like this.

luisdaniel asked May 26 '13 06:05




2 Answers

Warning: this answer is about the implementation details of a specific Python interpreter. Comparing strings with is is a bad idea.

Well, at least for CPython 3.4/2.7.3, the answer is "no, it is not the whitespace", or rather, not only the whitespace:

  • Two string literals will share memory if they are either alphanumeric or reside in the same block (file, function, class or single interpreter command)

  • An expression that evaluates to a string will result in an object that is identical to the one created using a string literal, if and only if it is created using constants and binary/unary operators, and the resulting string is shorter than 21 characters.

  • Single characters are unique.

Examples

Alphanumeric string literals always share memory:

    >>> x='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
    >>> y='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
    >>> x is y
    True

Non-alphanumeric string literals share memory if and only if they share the enclosing syntactic block:

(interpreter)

    >>> x='`!@#$%^&*() \][=-. >:"?<a'; y='`!@#$%^&*() \][=-. >:"?<a';
    >>> z='`!@#$%^&*() \][=-. >:"?<a';
    >>> x is y
    True
    >>> x is z
    False

(file)

    x='`!@#$%^&*() \][=-. >:"?<a';
    y='`!@#$%^&*() \][=-. >:"?<a';
    z=(lambda : '`!@#$%^&*() \][=-. >:"?<a')()
    print(x is y)
    print(x is z)

Output: True and False

For simple binary operations, the compiler does very simple constant folding (see peephole.c), but with strings it does so only if the resulting string is shorter than 21 characters. If this is the case, the rules mentioned earlier are in force:

    >>> 'a'*10+'a'*10 is 'a'*20
    True
    >>> 'a'*21 is 'a'*21
    False
    >>> 'aaaaaaaaaaaaaaaaaaaaa' is 'aaaaaaaa' + 'aaaaaaaaaaaaa'
    False
    >>> t=2; 'a'*t is 'aa'
    False
    >>> 'a'.__add__('a') is 'aa'
    False
    >>> x='a' ; x+='a'; x is 'aa'
    False
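One way to see the folding without relying on is: compile an expression and look at the constants stored in the resulting code object. This is only a sketch. The helper name folded_constants is made up for illustration, and what ends up in co_consts (as well as the length limit for folding) is a CPython implementation detail that has differed between releases.

```python
def folded_constants(expression):
    """Compile an expression and return the constants stored in its code object."""
    code = compile(expression, "<example>", "eval")
    return code.co_consts

# Built only from literals and binary operators, so the compiler can
# evaluate it ahead of time and store the 20-character result as a constant.
short_consts = folded_constants("'a' * 10 + 'a' * 10")

# t is a name that is only looked up at run time, so nothing can be folded here.
runtime_consts = folded_constants("'a' * t")

print(short_consts)
print(runtime_consts)
```

If the folded 20-character string shows up in the first tuple, the compiler did the multiplication and the addition itself. The second tuple should contain only the pieces that appear literally in the source.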

Single characters always share memory, of course:

    >>> chr(0x20) is ' '
    True

16 revs answered Sep 19 '22 18:09


To expand on Ignacio’s answer a bit: the is operator is the identity operator. It compares object identity, not contents. If you construct two objects with the same contents, they are usually still two distinct objects, so is yields False. It works for some small strings because CPython, the reference implementation of Python, interns them: it stores the contents once and makes all those names refer to the same string object. So the is operator returns True for those.

This, however, is an implementation detail of CPython and is generally guaranteed neither for CPython nor for any other implementation. Relying on it is a bad idea, as it can break at any time.
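To make the difference between identity and equality concrete, here is a small sketch. The helper build is hypothetical and exists only to create the strings at run time, so that no compile-time sharing is involved. sys.intern is the standard-library function that returns the canonical shared copy of a string.

```python
import sys

def build(parts):
    # Joining at run time produces the string while the program executes,
    # so it is not a constant the compiler could share.
    return "".join(parts)

a = build(["is it ", "the space?"])
b = build(["is it ", "the space?"])

same_value = a == b    # equality: same characters
same_object = a is b   # identity: the very same object (typically False on CPython here)

# sys.intern returns one canonical object for equal strings,
# so identity does hold for the interned copies.
interned_same = sys.intern(a) is sys.intern(b)

print(same_value, same_object, interned_same)
```

Only the result of == and the behavior of sys.intern are guaranteed here. Whether a is b comes out True or False depends on the interpreter.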

To compare strings, use the == operator, which compares objects for equality. Two string objects are considered equal when they contain the same characters. So this is the correct operator to use when comparing strings, and is should generally be avoided unless you explicitly want object identity (example: a is False).
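A minimal sketch of that rule of thumb, with made-up names (find_command, typed): == for the string comparison, is only for the check against the None singleton.

```python
def find_command(name, commands):
    """Return the matching command, or None if there is no match."""
    for command in commands:
        if command == name:  # compare contents, not identity
            return command
    return None

# Built at run time, so it need not be the same object as the literal "start".
typed = "".join(["st", "art"])

match = find_command(typed, ["stop", "start"])
found = match is not None  # identity is the right test for the None singleton

print(match, found)
```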


If you are really interested in the details, you can find the implementation of CPython’s strings here. But again: this is an implementation detail, so you should never rely on it.

poke answered Sep 20 '22 18:09