Why does a space affect the identity comparison of equal strings? [duplicate]

Tags:

python

I've noticed that adding a space to identical strings makes them compare unequal using is, while the non-space versions compare equal.

a = 'abc'
b = 'abc'
a is b
#outputs: True

a = 'abc abc'
b = 'abc abc'
a is b
#outputs: False

I have read this question about comparing strings with == and is. I think this is a different question because the space character is changing the behavior, not the length of the string. See:

a = 'abc'
b = 'abc'
a is b # True

a = 'gfhfghssrtjyhgjdagtaerjkdhhgffdhfdah'
b = 'gfhfghssrtjyhgjdagtaerjkdhhgffdhfdah'
a is b # True

Why does adding a space to the string change the result of this comparison?

asked Feb 04 '15 19:02

1 Answer

The Python interpreter caches (interns) some strings based on certain criteria. The first string, abc, is cached and the same object is used for both a and b, but the second, abc abc, is not. Small ints from -5 to 256 are cached in the same way.

Because the string is interned, assigning "abc" to both a and b makes them point to the same object in memory, so is, which checks whether two names refer to the very same object, returns True.

The second string, abc abc, is not cached, so a and b are two entirely different objects in memory and our identity check using is returns False. This time a is not b: they point to different objects in memory.

In [43]: a = "abc" # python caches abc
In [44]: b = "abc" # it reuses the object when assigning to b
In [45]: id(a)
Out[45]: 139806825858808    # same id's, same object in memory
In [46]: id(b)
Out[46]: 139806825858808    
In [47]: a = 'abc abc'   # not cached  
In [48]: id(a)
Out[48]: 139806688800984    
In [49]: b = 'abc abc'    
In [50]: id(b)         # different id's different objects
Out[50]: 139806688801208
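
The same distinction can be reproduced outside the interactive prompt. The following is a minimal sketch, assuming CPython 3; the helper name build is hypothetical, and the strings are assembled at runtime with str.join so that the compiler cannot share them as constants. sys.intern is the standard-library function that explicitly asks for the cached copy of a string.

```python
import sys


def build():
    # Hypothetical helper: assemble 'abc abc' at runtime so each call
    # returns a brand-new string object rather than a shared constant.
    return " ".join(["abc", "abc"])


a = build()
b = build()

equal_values = a == b    # compares contents
same_object = a is b     # compares identity

# sys.intern returns the single cached copy for a given value,
# so both calls hand back the same object.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
same_after_intern = a_interned is b_interned

print(equal_values, same_object, same_after_intern)
```

For comparing string contents, == is the right operator; is only answers whether two names refer to one object.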

The criterion for caching a string is that it contains only letters, digits and underscores, so in your case the space means the string does not qualify.

In the interactive interpreter there is one case where both names can end up pointing to the same object even when the string does not meet the above criterion: multiple assignment in a single statement.

In [51]: a,b  = 'abc abc','abc abc'

In [52]: id(a)
Out[52]: 139806688801768

In [53]: id(b)
Out[53]: 139806688801768

In [54]: a is b
Out[54]: True
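
One way to see why this happens: each line typed at the prompt is compiled separately, whereas everything in a single statement (or a single file) is compiled together, and the CPython compiler stores equal constants only once per compiled unit. A minimal sketch, assuming CPython; the names source and ns are illustrative, and two statements are used in place of tuple unpacking to keep the demonstration simple.

```python
# Both literals live in one compiled unit, as they would in a script
# or in a one-line multiple assignment.
source = "a = 'abc abc'\nb = 'abc abc'\n"
code = compile(source, "<demo>", "exec")

ns = {}
exec(code, ns)

# The constant is stored only once in the code object's constants...
occurrences = [c for c in code.co_consts if c == "abc abc"]

# ...so both names end up bound to that single object (a CPython detail).
shared = ns["a"] is ns["b"]
print(len(occurrences), shared)
```

This sharing is a side effect of compilation, not of interning, which is why it also applies to strings containing spaces.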

Looking at the codeobject.c source, we see that NAME_CHARS decides which strings can be interned:

#define NAME_CHARS \
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"

/* all_name_chars(s): true iff all chars in s are valid NAME_CHARS */

static int
all_name_chars(unsigned char *s)
{
    static char ok_name_char[256];
    static unsigned char *name_chars = (unsigned char *)NAME_CHARS;

    if (ok_name_char[*name_chars] == 0) {
        unsigned char *p;
        for (p = name_chars; *p; p++)
            ok_name_char[*p] = 1;
    }
    while (*s) {
        if (ok_name_char[*s++] == 0)
            return 0;
    }
    return 1;
}
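
For readers less comfortable with C, the check above can be mirrored in Python. This is only a sketch of the logic, not CPython's own code; the function keeps the C name all_name_chars for easy comparison.

```python
import string

# Same character set as the NAME_CHARS macro: digits, ASCII letters, underscore.
NAME_CHARS = frozenset(string.digits + string.ascii_letters + "_")


def all_name_chars(s):
    """Return True iff every character of s is in NAME_CHARS."""
    return all(ch in NAME_CHARS for ch in s)


print(all_name_chars("abc"))       # True
print(all_name_chars("abc abc"))   # False: the space is not a name character
```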

A string of length 0 or 1 will always be shared, as we can see in the PyString_FromStringAndSize function in the stringobject.c source:

/* share short strings */
    if (size == 0) {
        PyObject *t = (PyObject *)op;
        PyString_InternInPlace(&t);
        op = (PyStringObject *)t;
        nullstring = op;
        Py_INCREF(op);
    } else if (size == 1 && str != NULL) {
        PyObject *t = (PyObject *)op;
        PyString_InternInPlace(&t);
        op = (PyStringObject *)t;
        characters[*str & UCHAR_MAX] = op;
        Py_INCREF(op);
    }
    return (PyObject *) op;
}
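
The excerpt above is Python 2 code. As a rough sketch of the same idea on CPython 3 (an assumption; other implementations and versions may behave differently), empty strings and single characters with code points below 256 that are produced at runtime still come back as shared objects. The variable names are illustrative.

```python
first, second, word = "abc", "xyz", "banana"

# Empty results from unrelated slices are the same object on CPython 3.
empty_1 = first[:0]
empty_2 = second[:0]

# Single characters created at runtime are served from a cache.
char_1 = chr(97)   # 'a'
char_2 = word[1]   # also 'a', obtained a different way

print(empty_1 is empty_2, char_1 is char_2)
```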

Not directly related to the question, but for those interested, PyCode_New, also in the codeobject.c source, shows how more strings are interned when building a code object, provided they meet the criterion in all_name_chars:

PyCodeObject *
PyCode_New(int argcount, int nlocals, int stacksize, int flags,
       PyObject *code, PyObject *consts, PyObject *names,
       PyObject *varnames, PyObject *freevars, PyObject *cellvars,
       PyObject *filename, PyObject *name, int firstlineno,
       PyObject *lnotab)
{
    PyCodeObject *co;
    Py_ssize_t i;
    /* Check argument types */
    if (argcount < 0 || nlocals < 0 ||
        code == NULL ||
        consts == NULL || !PyTuple_Check(consts) ||
        names == NULL || !PyTuple_Check(names) ||
        varnames == NULL || !PyTuple_Check(varnames) ||
        freevars == NULL || !PyTuple_Check(freevars) ||
        cellvars == NULL || !PyTuple_Check(cellvars) ||
        name == NULL || !PyString_Check(name) ||
        filename == NULL || !PyString_Check(filename) ||
        lnotab == NULL || !PyString_Check(lnotab) ||
        !PyObject_CheckReadBuffer(code)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    intern_strings(names);
    intern_strings(varnames);
    intern_strings(freevars);
    intern_strings(cellvars);
    /* Intern selected string constants */
    for (i = PyTuple_Size(consts); --i >= 0; ) {
        PyObject *v = PyTuple_GetItem(consts, i);
        if (!PyString_Check(v))
            continue;
        if (!all_name_chars((unsigned char *)PyString_AS_STRING(v)))
            continue;
        PyString_InternInPlace(&PyTuple_GET_ITEM(consts, i));
    }
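
The effect of this loop can be observed from Python. A minimal sketch, assuming CPython 3, where code-object constants made up only of name characters are likewise interned; the constant hello_world and the variable names are illustrative.

```python
import sys

# Compile a tiny module whose only string constant passes the name-char test.
code = compile("x = 'hello_world'", "<demo>", "exec")
const = next(c for c in code.co_consts if isinstance(c, str))

# Build an equal string at runtime; it starts out as a separate object.
built = "_".join(["hello", "world"])
separate = built is not const

# Asking for the interned copy hands back the constant held by the code
# object, which indicates the constant was interned when the code was built.
interned_is_const = sys.intern(built) is const
print(separate, interned_is_const)
```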

This answer is based on simple assignments using the CPython interpreter. Interning in relation to functions, or any other functionality outside of simple assignments, was not asked about and is not covered here.

If anyone with a greater understanding of C code has anything to add, feel free to edit.

There is a much more thorough explanation here of the whole string interning.

answered Oct 31 '22 08:10

Padraic Cunningham


midkin
