
Create Hash for Arbitrary Objects?

Tags:

python

I've been using pickle.dumps to create a hash for an arbitrary Python object. However, I've found that dict and set orders aren't canonicalized, so the result is unreliable.
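To illustrate the problem, here is a minimal sketch, assuming Python 3.7 or later, where dicts keep insertion order. The names `first` and `second` are made up for the example. Two dicts that compare equal still pickle to different bytes, so a hash of the pickle differs as well:

```python
import pickle

# Two equal dicts built with different insertion orders.
first = {"a": 1, "b": 2}
second = {"b": 2, "a": 1}

print(first == second)                              # True
print(pickle.dumps(first) == pickle.dumps(second))  # False
```

Sets have a similar issue, since their iteration order depends on element hashes and insertion history rather than on the values alone.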

There are several related questions on SO and elsewhere, but I can't seem to find a hashing algorithm that uses the same basis for equality (__getstate__/__dict__ results). I understand the basic requirements for rolling my own, but obviously I'd much prefer to use something that's been tested.

Does such a library exist? I suppose what I'm actually asking for is a library that serializes objects deterministically (using __getstate__ and __dict__) so that I can hash the output.

EDIT

To clarify, I'm looking for something different from the values returned by Python's hash (or __hash__). What I want is essentially a checksum for arbitrary objects, which may or may not be hashable. This value should vary based on the object's state. (I'm using "state" to refer to the dict returned by __getstate__ or, if that's not present, the object's __dict__.)
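As a rough sketch of what "state" means here: `get_state` and `Point` below are hypothetical names used only for illustration, not part of any library. Note that since Python 3.11 every object has a default __getstate__, so on recent versions the first branch is normally the one taken.

```python
def get_state(obj):
    """Return __getstate__() if the object has one, else its __dict__."""
    getstate = getattr(obj, "__getstate__", None)
    if callable(getstate):
        return getstate()
    return vars(obj)


class Point:  # hypothetical example class
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
q = Point(1, 2)

print(get_state(p) == get_state(q))  # True: both hold the same state
print(p == q)                        # False: default equality is by identity
```

The checksum I'm after should agree for `p` and `q`, which the built-in hash does not do for plain instances like these.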

matthewwithanm asked Apr 22 '13 22:04


1 Answer

It occurred to me that Pickler can be subclassed and selected save functions overridden to canonicalize the necessary types, so that's what I'm doing. Here's what it looks like:

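# Written for Python 2: relies on types.DictionaryType, dict.iteritems()
# and the dispatch table of the pure-Python Pickler.
from copy import copy
from pickle import Pickler, MARK, DICT
from types import DictionaryType


class CanonicalizingPickler(Pickler):
    # Copy the dispatch table so the overrides below don't affect Pickler itself.
    dispatch = copy(Pickler.dispatch)

    def save_set(self, obj):
        # Reduce the set as usual, but sort its items first.
        rv = obj.__reduce_ex__(0)
        rv = (rv[0], (sorted(rv[1][0]),), rv[2])
        self.save_reduce(obj=obj, *rv)

    dispatch[set] = save_set

    def save_dict(self, obj):
        write = self.write
        write(MARK + DICT)

        self.memoize(obj)
        # Emit the items in sorted order rather than iteration order.
        self._batch_setitems(sorted(obj.iteritems()))

    dispatch[DictionaryType] = save_dict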
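
For anyone on Python 3, the same idea might be sketched roughly as below. This is an adaptation under stated assumptions, not a tested library. It subclasses `pickle._Pickler`, the private pure-Python pickler, because the C-accelerated `pickle.Pickler` has no `dispatch` table to override. The helpers `canonical_dumps` and `canonical_checksum` and the pinned protocol 2 are my own choices for illustration. Keys and set items must be mutually orderable, and pickle's memoization means shared references can still affect the output.

```python
import hashlib
import io
import pickle
from operator import itemgetter


class CanonicalizingPickler(pickle._Pickler):
    # Copy the dispatch table so the overrides stay local to this subclass.
    dispatch = pickle._Pickler.dispatch.copy()

    def save_set(self, obj):
        # Pickle the set as set(<sorted list of items>).
        self.save_reduce(set, (sorted(obj),), obj=obj)

    dispatch[set] = save_set

    def save_dict(self, obj):
        # Protocol 0 has no EMPTY_DICT opcode, so fall back to MARK + DICT.
        if self.bin:
            self.write(pickle.EMPTY_DICT)
        else:
            self.write(pickle.MARK + pickle.DICT)
        self.memoize(obj)
        # Write the items one at a time, ordered by key.
        for key, value in sorted(obj.items(), key=itemgetter(0)):
            self.save(key)
            self.save(value)
            self.write(pickle.SETITEM)

    dispatch[dict] = save_dict


def canonical_dumps(obj, protocol=2):
    """Hypothetical helper: pickle obj with sorted dicts and sets."""
    buf = io.BytesIO()
    CanonicalizingPickler(buf, protocol).dump(obj)
    return buf.getvalue()


def canonical_checksum(obj, protocol=2):
    """Hypothetical helper: SHA-256 hex digest of the canonical pickle."""
    return hashlib.sha256(canonical_dumps(obj, protocol)).hexdigest()


a = {"x": {3, 1, 2}, "y": [1, 2]}
b = {"y": [1, 2], "x": {2, 3, 1}}
print(canonical_checksum(a) == canonical_checksum(b))  # True
```

The protocol is pinned so the checksum doesn't change when Python's default pickle protocol does. Since the code leans on private internals of the pickle module, it may need adjusting between Python versions.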
matthewwithanm answered Nov 12 '22 05:11