python is operator behaviour with string [duplicate]

I am unable to understand the following behaviour. I create two strings and compare them with the is operator. In the first case the comparison gives True even after b is assigned a fresh literal; in the second case it gives False. Why does is return False when the string contains a comma or a space, but True when it contains no comma, space, or other such characters?

Python 3.6.5 (default, Mar 30 2018, 06:41:53) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 'string'
>>> b = a
>>> b is a
True
>>> b = 'string'
>>> b is a
True
>>> a = '1,2,3,4'
>>> b = a
>>> b is a
True
>>> b = '1,2,3,4'
>>> b is a
False

Is there any reliable information on why Python treats these strings differently? I understand that initially a and b refer to the same object. But after b is assigned a new object, b is a still says True. I find this behaviour a little confusing.

With 'string', the result is True both times. What goes wrong with '1,2,3,4'? Both are strings. What is the difference between case 1 and case 2, i.e. why does the is operator produce different results depending on the contents of the strings?

Sibidharan asked Apr 26 '18 07:04


1 Answer

One important point about this behavior is that Python caches (interns) some strings, mostly short ones (usually less than 20 characters, but not every combination of characters), so that they can be accessed quickly. An important reason is that strings are used heavily inside Python's own implementation, so caching certain kinds of strings is an internal optimization. Dictionaries are among the most widely used data structures in that implementation: they hold variables, attributes, and namespaces in general, among other purposes, and they use strings as the keys for object names. In other words, every time you access an object attribute or a global variable, a dictionary lookup happens internally.
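To make the dictionary point concrete, here is a minimal sketch. The Point class is a made-up example, not something from the question. It only shows that attribute names are stored as string keys in ordinary dictionaries:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)

# Instance attributes are stored in a dict keyed by attribute-name strings.
attrs = vars(p)
print(attrs)  # {'x': 1, 'y': 2}

# Reading p.x ends up, roughly, as a lookup of the string 'x' in that dict.
via_dict = attrs['x']
via_attr = p.x

# The class namespace is also a mapping keyed by strings.
has_init = '__init__' in vars(Point)
```

Since these lookups use strings like 'x' as keys over and over, it pays to keep a single shared copy of such identifier-like strings.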

Now, the reason you got such bizarre behavior is that Python (the CPython implementation) does not treat all strings the same way when it comes to interning. In CPython's source code there is an intern_string_constants function that decides which string constants qualify for interning; you can check it for more details. Or check this comprehensive article: http://guilload.com/python-string-interning/.
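As a rough illustration of the kind of check involved: to my understanding, CPython around version 3.6 interns a string constant at compile time only if it consists entirely of ASCII letters, digits, and underscores. The helper below, looks_like_identifier, is a hypothetical name of my own that merely imitates that rule in Python. It is not CPython's code, and the actual rule may differ between versions.

```python
import string

# Characters treated as "name characters": ASCII letters, digits, underscore.
NAME_CHARS = set(string.ascii_letters + string.digits + "_")

def looks_like_identifier(s):
    """Hypothetical helper: True if every character of s is a name character."""
    return all(ch in NAME_CHARS for ch in s)

# Strings from the question and the answer, plus two extra samples.
for text in ["string", "1,2,3,4", "a,,", "hello_world", "hello world"]:
    print(repr(text), looks_like_identifier(text))
```

Under this assumption, 'string' passes the check and '1,2,3,4' does not, which matches the True and False results seen in the question.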

It's also noteworthy that Python has an intern() function in the sys module that you can use to intern strings manually.

In [51]: import sys

In [52]: b = sys.intern('a,,')

In [53]: c = sys.intern('a,,')

In [54]: b is c
Out[54]: True

You can use this function either when you want to speed up dictionary lookups or when you need to use a particular string object frequently in your code.
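A small sketch of sys.intern applied to strings built at runtime, so the result does not depend on what the compiler does with literals. The variable names are only illustrative:

```python
import sys

# Build two equal strings at runtime so the compiler cannot merge them.
parts = ["1", "2", "3", "4"]
a = ",".join(parts)
b = ",".join(parts)

equal_values = (a == b)  # same contents

# sys.intern returns the canonical copy for a given string value,
# so interning two equal strings yields one shared object.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
same_object = a_interned is b_interned

print(equal_values, same_object)  # True True
```

After interning, an identity check is enough to tell the two strings are equal, which is what makes lookups on interned keys cheap.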

Another point that you should not confuse with string interning is that when you do b = a, you are creating two references to the same object, so it is obvious that both names have the same id.
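The assignment case can be sketched as follows. The strings are built with join at runtime so that interning of literals plays no part, and the name c is just an extra variable for comparison:

```python
a = ",".join(["1", "2", "3", "4"])  # built at runtime
b = a                               # a second name for the same object

print(b is a)          # True
print(id(b) == id(a))  # True

# A separately built string with the same contents.
c = ",".join(["1", "2", "3", "4"])
print(c == a)  # True: same contents
print(c is a)  # identity here depends on the implementation; do not rely on it
```

Plain assignment never copies the string, so b is a holds regardless of interning, while == is the comparison to use when only the contents matter.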

Regarding punctuation, it seems that a string made of punctuation is interned if it is a single character. If its length is more than one, it won't get cached. As mentioned in the comments, one reason for that might be that keywords and dictionary keys are less likely to contain punctuation.

In [28]: a = ','

In [29]: ',' is a
Out[29]: True

In [30]: a = 'abc,'

In [31]: 'abc,' is a
Out[31]: False

In [34]: a = ',,'

In [35]: ',,' is a
Out[35]: False

# Or

In [36]: a = '^'

In [37]: '^' is a
Out[37]: True

In [38]: a = '^%'

In [39]: '^%' is a
Out[39]: False

Still, these are only observations about implementation details, and you cannot rely on them in your code. To compare the contents of strings, use ==.

Mazdak answered Oct 19 '22 12:10