
Identity quirk with string split()

>>> 'hi'.split()[0] is 'hi'
True
>>> 'hi there'.split()[0] is 'hi'
False
>>> 'hi there again'.split()[0] is 'hi'
False

My hypothesis:

The first call to split() returns a single element, while the other two return more than one. I believe that while Python primitives like str are stored in memory by value within a function, there will be separate allocations across functions to simplify memory management. I think split() is one of those functions, and that it usually allocates new strings. But it also seems to handle the edge case of input that needs no splitting (such as 'hi') by simply returning the original string reference. Is my explanation correct?

onepiece asked Nov 10 '22 02:11



1 Answer

"I believe that while Python primitives like str are stored in memory by value within a function, there will be separate allocations across functions to simplify memory management."

Python's object allocation doesn't work anything like that. There isn't a real concept of "primitives", and aside from a few things the bytecode compiler does to merge constants, it doesn't matter whether two objects are created in the same function or different functions.
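As a rough illustration of that point, here is a small sketch. The function names are made up for the example, and the results marked as implementation-specific are what CPython typically does, not something the language guarantees:

```python
def build():
    # Assemble the string at runtime so the compiler cannot fold it into a constant.
    return "".join(["h", "i"])

def pass_through(obj):
    # Passing an object into a function and returning it never copies it.
    return obj

def runtime_strings_same_function():
    x = build()
    y = build()
    # Equal values, but typically two separate objects, even inside one function.
    return x == y, x is y

def literals_same_function():
    x = "hi"
    y = "hi"
    # CPython's compiler usually merges equal constants within one code object,
    # so this is typically True there (an implementation detail).
    return x is y

s = build()
print(pass_through(s) is s)             # True: same object on both sides of the call
print(runtime_strings_same_function())  # equality is True; identity is implementation-specific
print(literals_same_function())         # implementation-specific
```

The only result here that the language itself promises is the first one: a call boundary does not create a new object.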

There isn't really a better answer to this than to point to the source, so here it is:

Py_LOCAL_INLINE(PyObject *)
STRINGLIB(split_whitespace)(PyObject* str_obj,
                           const STRINGLIB_CHAR* str, Py_ssize_t str_len,
                           Py_ssize_t maxcount)
{
    ...
#ifndef STRINGLIB_MUTABLE
        if (j == 0 && i == str_len && STRINGLIB_CHECK_EXACT(str_obj)) {
            /* No whitespace in str_obj, so just use it as list[0] */
            Py_INCREF(str_obj);
            PyList_SET_ITEM(list, 0, (PyObject *)str_obj);
            count++;
            break;
        }

If it doesn't find any whitespace to split on, it just reuses the original string object in the returned list. It's just a quirk of how this function was written, and you can't depend on it working that way in other Python versions or nonstandard Python implementations.
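To poke at this from Python, here is a minimal sketch. The helper `first_piece_is_original` and the `MyStr` subclass are made up for the example, and the strings are built at runtime to keep constant merging out of the picture. The identity results for the no-whitespace cases are specific to CPython builds that contain the branch above; only the equality checks are guaranteed.

```python
def first_piece_is_original(text):
    """Return True if text.split() hands back `text` itself as its first item."""
    parts = text.split()
    return bool(parts) and parts[0] is text

class MyStr(str):
    """Hypothetical str subclass, used to probe the exact-type check."""

no_space = "".join(["h", "i"])
with_space = " ".join(["hi", "there"])

# Implementation-specific: True where the no-whitespace shortcut applies.
print(first_piece_is_original(no_space))
# False: the first item is 'hi', which cannot be the longer string object.
print(first_piece_is_original(with_space))
# The shortcut requires an exact str, so CPython is expected to print False here.
print(first_piece_is_original(MyStr("hi")))

# What the language does guarantee: the values.
assert no_space.split() == ["hi"]
assert with_space.split() == ["hi", "there"]
```

For comparing string contents, use == rather than is. Since Python 3.8, CPython also emits a SyntaxWarning when is is used against a literal, as in the question's examples.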

user2357112 supports Monica answered Nov 14 '22 21:11
