>>> 'hi'.split()[0] is 'hi'
True
>>> 'hi there'.split()[0] is 'hi'
False
>>> 'hi there again'.split()[0] is 'hi'
False
My hypothesis:
The first line has only one element in split, while the other two have more than one element. I believe that while Python primitives like str are stored in memory by value within a function, there will be separate allocations across functions to simplify memory management. I think split() is one of those functions, and it usually allocates new strings. But it also handles the edge case of input that does not need any splitting (such as 'hi'), where the original string reference is simply returned. Is my explanation correct?
I believe that while Python primitives like str are stored in memory by value within a function, there will be separate allocations across functions to simplify memory management.
Python's object allocation doesn't work anything like that. There isn't a real concept of "primitives", and aside from a few things the bytecode compiler does to merge constants, it doesn't matter whether two objects are created in the same function or different functions.
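As a rough illustration of the constant merging mentioned above, here is a small sketch. It assumes CPython, where a function's code object exposes its literals through `__code__.co_consts`; the function names are made up for the example. Two identical literals in one function end up as a single entry in the constants table, while a string with the same value built at runtime is only guaranteed to be equal, not identical.

```python
def same_function():
    # Two identical literals in the same function body.
    a = "hello world!"
    b = "hello world!"
    return a, b


def build_at_runtime():
    # Same value, but assembled while the program runs.
    parts = ["hello", "world!"]
    return " ".join(parts)


a, b = same_function()

# CPython-specific introspection: how many times does the literal appear
# in this function's constants table?
literal_copies = same_function.__code__.co_consts.count("hello world!")

runtime = build_at_runtime()
equal_value = runtime == a        # value comparison, always meaningful
same_object = runtime is a        # identity, an implementation detail

print(literal_copies, equal_value, same_object)
```

Only `equal_value` is something a program should rely on; the other two results describe what one interpreter happens to do.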
There isn't really a better answer to this than to point to the source, so here it is:
Py_LOCAL_INLINE(PyObject *)
STRINGLIB(split_whitespace)(PyObject* str_obj,
                           const STRINGLIB_CHAR* str, Py_ssize_t str_len,
                           Py_ssize_t maxcount)
{
    ...
#ifndef STRINGLIB_MUTABLE
        if (j == 0 && i == str_len && STRINGLIB_CHECK_EXACT(str_obj)) {
            /* No whitespace in str_obj, so just use it as list[0] */
            Py_INCREF(str_obj);
            PyList_SET_ITEM(list, 0, (PyObject *)str_obj);
            count++;
            break;
        }
#endif
    ...
}
If it doesn't find any whitespace to split on, it just reuses the original string object in the returned list. It's just a quirk of how this function was written, and you can't depend on it working that way in other Python versions or nonstandard Python implementations.
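Since the identity of the returned string is not guaranteed, code that wants to know whether the first word is 'hi' should compare values with == instead of is. The sketch below uses a hypothetical helper, first_word, and also pokes at the STRINGLIB_CHECK_EXACT condition from the C source by using a hypothetical str subclass, Tagged. The identity results are printed but should be treated as observations about one interpreter, not as behavior to depend on.

```python
def first_word(text):
    """Return the first whitespace-separated word, or None if there is none."""
    words = text.split()
    return words[0] if words else None


samples = ["hi", "hi there", "hi there again"]

# Portable: compare by value.
by_value = [first_word(s) == "hi" for s in samples]

# Implementation detail: is the first element the very same object as the input?
# For inputs containing whitespace it cannot be, because the values differ.
by_identity = [s.split()[0] is s for s in samples]


class Tagged(str):
    """Hypothetical str subclass, used to sidestep the exact-type shortcut."""


tagged = Tagged("hi")
tagged_first = tagged.split()[0]

print(by_value)
print(by_identity)
print(type(tagged_first).__name__, tagged_first == "hi")
```

The value comparison gives the same answer for all three inputs, which is what the original experiment was really trying to check.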