As far as I understand it, tuples and strings are immutable to allow optimizations such as re-using memory that won't change. However, one obvious optimization, making slices of tuples refer to the same memory as the original tuple, is not included in Python.
I know that this optimization isn't included because when I time the following function, the time taken grows as O(n^2) rather than O(n), so full copying is taking place:
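    def test(n):
        tup = tuple(range(n))
        for i in range(n):  # the original used xrange (Python 2); range works in both
            tup[0:i]  # each slice copies i references, so the whole loop is O(n^2)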
Is there some behavior of Python that would change if this optimization were implemented? Is there some performance benefit to copying even when the original is immutable?
Slicing lists does not generate copies of the objects in the list; it just copies the references to them. That is the answer to the question as asked.
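A quick way to check this for tuples as well is to compare object identities. This is only a small sketch; the names `items` and `part` are made up for illustration.

```python
# Build a tuple whose elements are distinct, identifiable objects.
items = tuple(object() for _ in range(5))

# Take a proper sub-range of the tuple.
part = items[1:4]

# The slice is a new tuple object (a new array of references)...
is_new_container = part is not items

# ...but every element in it is the very same object as in the original.
shares_elements = all(part[k] is items[k + 1] for k in range(len(part)))

print(is_new_container, shares_elements)
```

So the cost of a slice is proportional to the number of references copied, not to the size of the objects they point to.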
We can use slicing in tuples in the same way as we use it in strings and lists. Tuple slicing is used to obtain a range of items, and we perform it with the slicing operator, whose syntax is [start:stop:step].
We can access a range of items in a tuple by using the slicing operator, the colon (:). Slicing is best visualized by considering the indices to lie between the elements, so that index 0 sits before the first element and index len(t) sits after the last. To access a range, we give the two indices that cut that portion out of the tuple.
In short, slicing is a flexible tool to build new lists out of an existing list. Python supports slice notation for any sequential data type like lists, strings, tuples, bytes, bytearrays, and ranges. Also, any new data structure can add its support as well.
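As a short illustration of the [start:stop:step] notation on a few built-in sequence types (the variable names are just for this example):

```python
letters = ('a', 'b', 'c', 'd', 'e', 'f')

first_three = letters[0:3]    # items between index 0 and index 3
every_other = letters[::2]    # whole tuple, taking every second item
last_two = letters[-2:]       # negative indices count from the end
backwards = letters[::-1]     # a negative step walks the tuple in reverse

# The same notation works on other sequence types.
word_part = "python"[1:4]
list_part = [10, 20, 30, 40][1:3]

print(first_three, every_other, last_two, backwards, word_part, list_part)
```

In each case the result is a new object of the same type as the original sequence.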
By view, are you thinking of something equivalent to what numpy does? I'm familiar with how and why numpy does that.
A numpy array is an object with shape and dtype information, plus a data buffer. You can see this information in the __array_interface__ property. A view is a new numpy object, with its own shape attribute, but with a new data buffer pointer that points to someplace in the source buffer. It also has a flag that says "I don't own the buffer". numpy also maintains its own reference count, so the data buffer is not destroyed if the original (owner) array is deleted (and garbage collected).
This use of views can be a big time saver, especially with very large arrays (questions about memory errors are common on SO). Views also allow a different dtype, so a data buffer can be viewed as 4-byte integers, or 1-byte characters, etc.
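The same idea can be seen without numpy by using the standard library's memoryview, which I'm substituting here as a rough analogue rather than as a description of numpy's internals. A sliced memoryview shares the underlying buffer instead of copying it, and cast() reinterprets the same bytes with a different item size.

```python
# A mutable buffer and a zero-copy view onto it.
data = bytearray(b"abcdefgh")
view = memoryview(data)

# Slicing the view creates a new view object, but no bytes are copied.
window = view[2:6]

# A change made through the owner is visible through the window,
# which shows that both refer to the same memory.
data[2] = ord("X")
window_bytes = bytes(window)

# The same 8 bytes reinterpreted as native unsigned ints
# (the item size is platform dependent, typically 4 bytes).
as_ints = view.cast("I")

print(window_bytes, len(as_ints), as_ints.itemsize)
```

Note that the window keeps the whole bytearray alive for as long as the window itself exists, which is the kind of bookkeeping a tuple view would also need.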
How would this apply to tuples? My guess is that it would require a lot of extra baggage. A tuple consists of a fixed set of object pointers - probably a C array. A view would use the same array, but with its own start and end markers (pointers and/or lengths). What about sharing flags? Garbage collection?
And what's the typical size and use of tuples? A common use of tuples is to pass arguments to a function. My guess is that a majority of tuples in a typical Python run are small - 0, 1 or 2 elements. Slices are allowed, but are they very common? On small tuples or very large ones?
Would there be any unintended consequences to making tuple slices views (in the numpy sense)? The distinction between views and copies is one of the harder things for numpy users to grasp. Since a tuple is supposed to be immutable - that is the pointers in the tuple cannot be changed - it is possible that implementing views would be invisible to users. But still I wonder.
It may make most sense to try this idea on a branch of PyPy - unless you really like digging into CPython code. Or try it as a custom class with Cython.
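To get a feel for the extra baggage before touching any interpreter, the idea can be prototyped in plain Python. The TupleView class below is hypothetical (nothing like it exists in the standard library) and is only a minimal sketch: it stores the base tuple plus start and stop offsets, and falls back to a real copy for strided slices.

```python
from collections.abc import Sequence


class TupleView(Sequence):
    """Hypothetical read-only window onto an existing tuple."""

    __slots__ = ("_base", "_start", "_stop")

    def __init__(self, base, start=0, stop=None):
        # Clamp the bounds the same way ordinary slicing does.
        start, stop, _ = slice(start, stop).indices(len(base))
        self._base = base            # keeps the whole base tuple alive
        self._start = start
        self._stop = max(start, stop)

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                # Strided slices are not views in this sketch: copy instead.
                return tuple(self[i] for i in range(start, stop, step))
            # Contiguous slice: O(1), just new offsets into the same base.
            return TupleView(self._base,
                             self._start + start,
                             self._start + max(start, stop))
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("TupleView index out of range")
        return self._base[self._start + index]


big = tuple(range(1000))
view = TupleView(big, 10, 20)   # no references are copied here
sub = view[2:5]                 # still a view onto big

print(len(view), view[0], view[-1], list(sub), view[::2])
```

Even this toy version shows the trade-off raised above: a ten-element view keeps the entire thousand-element base tuple from being garbage collected, and every view carries three extra fields, which is a poor deal if most tuples are tiny.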