Since tuples are immutable, why does slicing them make a copy instead of a view?

Tags:

python

As far as I understand it, tuples and strings are immutable to allow optimizations such as re-using memory that won't change. However, one obvious optimization, making slices of tuples refer to the same memory as the original tuple, is not included in Python.

I know that this optimization isn't included because when I time the following function, the time taken grows like O(n^2) instead of O(n), so a full copy is being made on every slice:

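def test(n):
    tup = tuple(range(n))
    # range works on Python 2 and 3; on Python 2, xrange avoids building a list
    for i in range(n):
        tup[0:i]  # each slice copies i pointers, so the loop is O(n^2) overall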
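A check that does not depend on timing, as a rough sketch: on CPython, `sys.getsizeof` of a slice grows with the slice length, which is what you would expect if every slice allocates its own array of pointers. The variable names are only illustrative.

```python
import sys

tup = tuple(range(1000))

small = tup[0:10]
large = tup[0:500]

# Each slice is a separate tuple object with its own storage,
# so its reported size grows with the number of elements.
print(sys.getsizeof(small), sys.getsizeof(large))

# The slice is a distinct object from the original tuple ...
print(small is tup)            # False
# ... but the elements are shared, not duplicated (a shallow copy).
print(large[300] is tup[300])  # True
```

So the copy is a copy of the pointer array only; the objects the tuple refers to are never duplicated.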

Is there some behavior of Python that would change if this optimization were implemented? Is there some performance benefit to copying even when the original is immutable?


asked Jan 10 '16 20:01

QuadmasterXLII

1 Answer

By view, are you thinking of something equivalent to what numpy does? I'm familiar with how and why numpy does that.

A numpy array is an object with shape and dtype information, plus a data buffer. You can see this information in the __array_interface__ property. A view is a new numpy object, with its own shape attribute, but with a data buffer pointer that points to someplace in the source buffer. It also has a flag that says "I don't own the buffer". The view also keeps a reference to the array it was made from (its base attribute), so the data buffer is not destroyed if the original (owner) array is deleted (and garbage collected).
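A small sketch of those pieces, assuming numpy is installed; the array contents and names are arbitrary:

```python
import numpy as np

a = np.arange(10)
v = a[2:6]                      # basic slicing returns a view, not a copy

print(v.base is a)              # True: the view refers back to its source
print(v.flags.owndata)          # False: "I don't own the buffer"
print(np.shares_memory(a, v))   # True

v[0] = 99                       # writing through the view changes the source
print(a[2])                     # 99

# The view's data pointer is an offset into the same buffer.
a_ptr = a.__array_interface__['data'][0]
v_ptr = v.__array_interface__['data'][0]
print(v_ptr - a_ptr == 2 * a.itemsize)  # True
```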

This use of views can be a big time saver, especially with very large arrays (questions about memory errors are common on SO). Views also allow a different dtype, so a data buffer can be viewed as 4-byte integers, or 1-byte characters, etc.
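For instance, reinterpreting a buffer under a different dtype might look like this (a sketch assuming numpy is installed; the exact byte values depend on the machine's byte order, so they are not shown):

```python
import numpy as np

ints = np.array([1, 2], dtype=np.int32)   # two 4-byte integers
as_bytes = ints.view(np.uint8)            # same 8 bytes, seen as 1-byte values

print(ints.nbytes, as_bytes.shape)        # 8 (8,)
print(np.shares_memory(ints, as_bytes))   # True: no data was copied
```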

How would this apply to tuples? My guess is that it would require a lot of extra baggage. A tuple consists of a fixed set of object pointers - probably a C array. A view would use the same array, but with its own start and end markers (pointers and/or lengths). What about sharing flags? Garbage collection?
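To make the "extra baggage" concrete, here is a minimal pure-Python sketch of such a view. `TupleView` is a hypothetical class written for illustration, not anything that exists in Python or CPython; it stores a reference to the source tuple plus start and stop markers.

```python
from collections.abc import Sequence


class TupleView(Sequence):
    """Hypothetical read-only window onto an existing tuple (no copying)."""

    def __init__(self, base, start=0, stop=None):
        # Normalise the bounds once, the way ordinary slicing would.
        start, stop, _ = slice(start, stop).indices(len(base))
        self._base = base              # keeps the whole source tuple alive
        self._start = start
        self._stop = max(start, stop)

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                # Strided slices fall back to a real copy in this sketch.
                return tuple(self[i] for i in range(start, stop, step))
            # Slicing a view yields another view over the same base.
            return TupleView(self._base,
                             self._start + start,
                             self._start + max(start, stop))
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("TupleView index out of range")
        return self._base[self._start + index]


big = tuple(range(10**6))
view = TupleView(big, 10, 20)         # O(1): no elements are copied
print(len(view), view[0], view[-1])   # 10 10 19
print(tuple(view[2:5]))               # (12, 13, 14)
```

Note that the ten-element view holds a reference to the full million-element tuple, so the large tuple cannot be freed while the view exists. That lifetime question is part of what a built-in implementation would have to handle.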

And what's the typical size and use of tuples? A common use of tuples is to pass arguments to a function. My guess is that a majority of tuples in a typical Python run are small - 0, 1 or 2 elements. Slices are allowed, but are they very common? On small tuples or very large ones?

Would there be any unintended consequences to making tuple slices views (in the numpy sense)? The distinction between views and copies is one of the harder things for numpy users to grasp. Since a tuple is supposed to be immutable - that is, the pointers in the tuple cannot be changed - it is possible that implementing views would be invisible to users. But still I wonder.

It may make the most sense to try this idea on a branch of PyPy, unless you really like digging into CPython code. Or try it as a custom class with Cython.

answered Sep 25 '22 09:09

hpaulj
