Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - Versioned list instead of immutable list?

Tags:

python

list

Update:

  • As of CPython 3.6, dictionaries have a version (thank you pylang for showing this to me).
  • If they added the same version to list and made it public, all 3 asserts from my original post would pass! It would definitely meet my needs. Their implementation differs from what I envisioned, but I like it.
  • As it is, I don't feel I can use dictionary version:
    • It isn't public. Jake Vanderplas shows how to expose it in a post, but he cautions: definitely not code you should use for any purpose beyond simply having fun. I agree with his reasons.
    • In all of my use cases, the data is conceptually arrays of elements each of which has the same structure. A list of tuples is a natural fit. Using a dictionary would make the code less natural and probably more cumbersome.
  • Does anyone know if there are plans to add version to list?
  • Are there plans to make it public?

If there are plans to add version to list and make it public, I would feel awkward putting forward an incompatible VersionedList now. I would just implement the bare minimum I need and get by.


Original post below


Turns out that many of the times I wanted an immutable list, a VersionedList would have worked almost as well (sometimes even better).

  • Has anyone implemented a versioned list?
  • Is there a better, more Pythonic, concept that meets my needs? (See motivation below.)

What I mean by a versioned list is:

  • A class that behaves like a list
  • Any change to an instance or elements in the instance results in instance.version() being updated. So, if alist is a normal list:

    a = VersionedList(alist)
    a_version = a.version()
    change(a)
    assert a_version != a.version()
    reverse_last_change(a)  
    
  • If a list was hashable, hash() would achieve the above and meet all the needs identified in the motivation below. We need to define 'version()' in a way that doesn't have all of the same problems as 'hash()'.

    If identical data in two lists is highly unlikely to ever happen except at initialization, we aren't going to have a reason to test for deep equality. From (https://docs.python.org/3.5/reference/datamodel.html#object.hash) The only required property is that objects which compare equal have the same hash value. If we don't impose this requirement on 'version()', it seems likely that 'version()' won't have all of the same problems that makes lists unhashable. So unlike hash, identical contents doesn't mean the same version

    #contents of 'a' are now identical to original, but...
    assert a_version != a.version()
    
    b = VersionedList(alist)
    c = VersionedList(alist)
    assert b.version() != c.version()
    
  • For VersionList, it would be good if any attempt to modify the result of __get__ automatically resulted in a copy instead of modifying the underlying implementation data. I think that the only other option would be to have __get__ always return a copy of the elements, and this would be very inefficient for all of the use cases I can think of. I think we need to restrict the elements to immutable objects (deeply immutable, for example: exclude tuples with list elements). I can think of 3 ways to achieve this:

    1. Only allow elements that can't contain mutable elements (int, str, etc are fine, but exclude tuples). (This is far too limiting for my cases)
    2. Add code to __init__, __set__, etc to traverse inputs to deeply check for mutable sub-elements. (expensive, any way to avoid this?)
    3. Also allow more complex elements, but require that they are deeply immutable. Perhaps require that they expose a deeply_immutable attribute. (This turns out to be easy for all the use cases I have)

Motivation:

  1. If I am analyzing a dataset, I often have to perform multiple steps that return large datasets (note: since the dataset is ordered, it is best represented by a List not a set).

    If at the end of several steps (ex: 5) it turns out that I need to perform different analysis (ex: back at step 4), I want to know that the dataset from step 3 hasn't accidentally been changed. That way I can start at step 4 instead of repeating steps 1-3.

  2. I have functions (control-points, first-derivative, second-derivative, offset, outline, etc) that depend on and return array-valued objects (in the linear algebra sense). The base 'array' is knots.

    control-points() depends on: knots, algorithm_enum
    first-derivative() depends on: control-points(), knots
    offset() depends on: first-derivative(), control-points(), knots, offset_distance
    outline() depends on: offset(), end_type_enum

    If offset_distance changes, I want to avoid having to recalculate first-derivative() and control-points(). To avoid recalculation, I need to know that nothing has accidentally changed the resultant 'arrays'.

    If 'knots' changes, I need to recalculate everything and not depend on the previous resultant 'arrays'.

    To achieve this, knots and all of the 'array-valued' objects could be VersionedList.

FYI: I had hoped to take advantage of an efficient class like numpy.ndarray. In most of my use cases, the elements logically have structure. Having to mentally keep track of multi-dimensions of indexes meant implementing and debugging the algorithms was many times more difficult with ndarray. An implementation based on lists of namedtuples of namedtuples turned out to be much more sustainable.

like image 510
R. Clay Avatar asked Mar 16 '26 13:03

R. Clay


1 Answers

Private dicts in 3.6

In Python 3.6, dictionaries are now private (PEP 509) and compact (issue 27350), which track versions and preserve order respectively. These features are presently true when using the CPython 3.6 implementation. Despite the challenge, Jake VanderPlas demonstrates in his blog post a detailed demonstration of exposing this versioning feature from CPython within normal Python. We can use his approach to:

  1. determine when a dictionary has been updated
  2. preserve the order

Example

import numpy as np

d = {"a": np.array([1,2,3]),
     "c": np.array([1,2,3]),
     "b": np.array([8,9,10]),
    }

for i in range(3):
    print(d.get_version())                                 # monkey-patch
# 524938
# 524938
# 524938

Notice the version number does not change until the dictionary is updated, as shown below:

d.update({"c": np.array([10, 11, 12])})
d.get_version()
# 534448

In addition, the insertion order is preserved (the following was tested in restarted sessions of Python 3.5 and 3.6):

list(d.keys())
# ['a', 'c', 'b']

You may be able to take advantage of this new dictionary behavior, saving you from implementing a new datatype.


Details

For those interested, the latter get_version()is a monkey-patched method for any dictionary, implemented in Python 3.6 using the following modified code derived from Jake VanderPlas' blog post. This code was run prior to calling get_version().

import types
import ctypes
import sys
assert (3, 6) <= sys.version_info < (3, 7)                 # valid only in Python 3.6

py_ssize_t = ctypes.c_ssize_t  

# Emulate the PyObjectStruct from CPython
class PyObjectStruct(ctypes.Structure):
    _fields_ = [('ob_refcnt', py_ssize_t),
                ('ob_type', ctypes.c_void_p)]


# Create a DictStruct class to wrap existing dictionaries
class DictStruct(PyObjectStruct):
    _fields_ = [("ma_used", py_ssize_t),
                ("ma_version_tag", ctypes.c_uint64),
                ("ma_keys", ctypes.c_void_p),
                ("ma_values", ctypes.c_void_p),
               ]

    def __repr__(self):
        return (f"DictStruct(size={self.ma_used}, "
                f"refcount={self.ob_refcnt}, "
                f"version={self.ma_version_tag})")

    @classmethod
    def wrap(cls, obj):
        assert isinstance(obj, dict)
        return cls.from_address(id(obj))

assert object.__basicsize__ == ctypes.sizeof(PyObjectStruct)
assert dict.__basicsize__ == ctypes.sizeof(DictStruct)


# Code for monkey-patching existing dictionaries
class MappingProxyStruct(PyObjectStruct):
    _fields_ = [("mapping", ctypes.POINTER(DictStruct))]

    @classmethod
    def wrap(cls, D):
        assert isinstance(D, types.MappingProxyType)
        return cls.from_address(id(D))

assert types.MappingProxyType.__basicsize__ == ctypes.sizeof(MappingProxyStruct)


def mappingproxy_setitem(obj, key, val):
    """Set an item in a read-only mapping proxy"""
    proxy = MappingProxyStruct.wrap(obj)
    ctypes.pythonapi.PyDict_SetItem(proxy.mapping,
                                    ctypes.py_object(key),
                                    ctypes.py_object(val))

mappingproxy_setitem(dict.__dict__,
                     'get_version',
                     lambda self: DictStruct.wrap(self).ma_version_tag)
like image 92
pylang Avatar answered Mar 19 '26 04:03

pylang