
string identity comparison in CPython

Tags: python, cpython

I recently discovered a potential bug in a production system where two strings were compared using the identity operator, e.g.:

if val[2] is not 's':

I imagine this often works anyway because, as far as I know, CPython stores short immutable strings in the same location. I've replaced it with !=, but I need to confirm that the data that previously went through this code is correct, so I'd like to know whether this always worked or only sometimes worked.

The Python version has always been 2.6.6 as far as I know and the above code seems to be the only place where the is operator was used.

Does anyone know if this line will always work as the programmer intended?

edit: Because this is no doubt very specific and unhelpful to future readers, I'll ask a different question:

Where should I look to confirm with absolute certainty the behaviour of the Python implementation? Are the optimisations in CPython's source code easy to digest? Any tips?

Will Hardy asked Dec 22 '22 18:12



2 Answers

You can look at the CPython code for 2.6.x: http://svn.python.org/projects/python/branches/release26-maint/Objects/stringobject.c

It looks like one-character strings are treated specially: each distinct one is cached and exists only once, so your code should be safe on CPython 2.6, provided val[2] is a plain one-character str. Here's some key code (excerpted):

static PyStringObject *characters[UCHAR_MAX + 1];

PyObject *
PyString_FromStringAndSize(const char *str, Py_ssize_t size)
{
    register PyStringObject *op;
    if (size == 1 && str != NULL &&
        (op = characters[*str & UCHAR_MAX]) != NULL)
    {
        Py_INCREF(op);
        return (PyObject *)op;
    }

...
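To check this empirically on a given interpreter rather than only reading the source, a small sketch like the following may help. The function name uncached_single_chars and the ASCII sample are illustrative assumptions, not anything taken from CPython. It builds each one-character string in two independent ways and records the cases where the two results are not the same object.

```python
def uncached_single_chars(sample):
    """Return the characters of `sample` for which two independently
    produced one-character strings are NOT the same object.

    Hypothetical helper for illustration; not part of CPython.
    """
    mismatches = []
    for i in range(len(sample)):
        by_index = sample[i]        # one-character string via indexing
        by_slice = sample[i:i + 1]  # the same character via slicing
        # Identity is an implementation detail; equality always holds here.
        if by_index is not by_slice:
            mismatches.append(by_index)
    return mismatches


# Assumption: the data of interest is plain ASCII text.
ascii_sample = "".join(chr(code) for code in range(128))
print(uncached_single_chars(ascii_sample))
```

An empty list suggests that the interpreter you ran it on shares one-character strings for these characters. It says nothing about other versions or implementations, so it is best run on the same build as the production system.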
Ned Batchelder answered Dec 28 '22 10:12



You are certainly not supposed to use the is / is not operators when you only want to compare the values of two objects rather than check whether they are the same object.

While it might make sense for Python to never create a new string object with the same contents as an existing one (since strings are immutable), which would make equality and identity equivalent, the language does not guarantee this, and CPython only shares strings in some cases (for example, interned or cached ones). I wouldn't rely on it, especially with the many Python implementations out there.
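As a rough illustration of the difference, the sketch below compares a string literal with an equal string assembled at runtime. The helper name describe and the sample values are made up for this example. Equality is guaranteed by the language, while the identity result depends on the implementation.

```python
def describe(a, b):
    """Report value equality and object identity for two objects.

    Hypothetical helper for illustration only.
    """
    return {"equal": a == b, "identical": a is b}


literal = "status"
built = "".join(["sta", "tus"])  # same contents, assembled at runtime

print(describe(literal, built))    # "equal" is True; "identical" may be either
print(describe(literal, literal))  # an object is always identical to itself
```

On CPython the first call typically reports that the two strings are equal but not identical, which is why != (or ==) is the right operator for comparing string values.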

ThiefMaster answered Dec 28 '22 10:12
