Comparing Python dicts with floating point values included

Tags:

python

I want to compare a pair of dictionaries using 'fuzzy' floating point comparison, or better yet using numpy.allclose(). However, the default == and != operators for Python dicts don't do this.

I was wondering if there was a way to change the floating point comparison operation (probably using a context manager for safe cleanup).

I believe an example will help here. I have a deeply nested dict that contains all sorts of values. Some of these values are floating point values. I know there are tons of pitfalls for 'comparing' floating point values, etc.

d1 = {'a': {'b': 1.123456}}
d2 = {'a': {'b': 1.1234578}}

I would like to use != to compare these two dicts and have it return False if the only differences are floating point numbers within a certain range. That is, do not count the values as different if they are close (I'm not sure what precision I want yet).

I suppose I could recursively go through the dicts myself and manually just use numpy.allclose() for floating point values and fall back to the normal equality testing for all other types, etc. However, this is a bit tricky and error prone. I do think this would be an acceptable solution, and I'd love to see one like it. Hopefully there is something more elegant though.
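The recursive approach described above can be sketched roughly as follows. This is only an illustration: the function name fuzzy_equal and its tolerance parameters are made up here, and math.isclose() from the standard library stands in for numpy.allclose() so the snippet has no third-party dependency.

```python
import math


def fuzzy_equal(a, b, rel_tol=1e-9, abs_tol=0.0):
    """Recursively compare two values, treating close floats as equal."""
    # Floats are compared with a tolerance instead of exact equality.
    if isinstance(a, float) and isinstance(b, float):
        return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    # Dicts must have the same keys, and each pair of values must match.
    if isinstance(a, dict) and isinstance(b, dict):
        if a.keys() != b.keys():
            return False
        return all(fuzzy_equal(a[k], b[k], rel_tol, abs_tol) for k in a)
    # Lists and tuples are compared element by element.
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        if type(a) is not type(b) or len(a) != len(b):
            return False
        return all(fuzzy_equal(x, y, rel_tol, abs_tol) for x, y in zip(a, b))
    # Everything else falls back to normal equality.
    return a == b


d1 = {'a': {'b': 1.123456}}
d2 = {'a': {'b': 1.1234578}}

loose = fuzzy_equal(d1, d2, rel_tol=1e-5)   # tolerance wide enough for d1/d2
strict = fuzzy_equal(d1, d2)                # default tolerance is much tighter
```

The tolerance has to be chosen per use case; with the values above, a relative tolerance of 1e-5 treats the dicts as equal, while the default of 1e-9 does not.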

The elegant solution in my head would look something like the following. However, I don't know if anything like this is even possible:

with hacked_float_compare:
    result = d1 != d2

Thus, inside this context manager I would be replacing the floating point comparison (just for standard float() values) with either my own comparison or numpy.allclose().

Again, I'm not sure this is possible, because monkey patching float() can't really be done since it's written in C. I'd also like to avoid having to change every floating point value in the dicts to my own float class that has an __eq__(). Maybe this is the best way though?
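To make the two options above concrete, here is a small sketch. The helper can_patch_float_eq and the FuzzyFloat class are hypothetical names invented for illustration, and the fixed absolute tolerance of 1e-5 is an arbitrary assumption.

```python
def can_patch_float_eq():
    """Try to replace float.__eq__; CPython refuses for built-in types."""
    try:
        float.__eq__ = lambda self, other: abs(self - other) < 1e-5
    except TypeError:
        return False
    return True


class FuzzyFloat(float):
    """Hypothetical float subclass with a tolerant __eq__."""

    tolerance = 1e-5  # assumed absolute tolerance, for illustration only

    def __eq__(self, other):
        if not isinstance(other, (int, float)):
            return NotImplemented
        return abs(float(self) - other) < self.tolerance

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Note: defining __eq__ makes instances unhashable, so they can be
    # used as dict values but not as dict keys.


patched = can_patch_float_eq()

# Only one side needs wrapping, but every float on that side must be wrapped.
d1 = {'a': {'b': FuzzyFloat(1.123456)}}
d2 = {'a': {'b': 1.1234578}}
differs = d1 != d2
```

Here the attempt to patch float fails, while the wrapped value makes the plain dict comparison tolerant. The cost is exactly the one mentioned above: every float in one of the dicts has to be converted first.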

durden2.0 asked Dec 06 '12 17:12


2 Answers

Avoid subclassing built-in types. You'll regret it when you find out your objects have changed type for some unknown reason. Use delegation instead. For example:

import operator as op


class FuzzyDict(object):
    def __init__(self, iterable=(), float_eq=op.eq):
        self._float_eq = float_eq
        self._dict = dict(iterable)

    def __getitem__(self, key):
        return self._dict[key]

    def __setitem__(self, key, val):
        self._dict[key] = val

    def __iter__(self):
        return iter(self._dict)

    def __len__(self):
        return len(self._dict)

    def __contains__(self, key):
        return key in self._dict

    def __eq__(self, other):
        def compare(a, b):
            if isinstance(a, float) and isinstance(b, float):
                return self._float_eq(a, b)
            else:
                return a == b
        try:
            if len(self) != len(other):
                return False
            for key in self:
                if not compare(self[key], other[key]):
                    return False
            return True
        except Exception:
            return False

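    def __getattr__(self, attr):
        # Only called when normal lookup fails. Guard against infinite
        # recursion if _dict is not set yet (e.g. during copy or unpickling).
        if attr == '_dict':
            raise AttributeError(attr)
        # free features borrowed from dict
        attr_val = getattr(self._dict, attr)
        if callable(attr_val):
            def wrapper(*args, **kwargs):
                result = attr_val(*args, **kwargs)
                # re-wrap dict results (e.g. from copy()) so they stay fuzzy
                if isinstance(result, dict):
                    return FuzzyDict(result, self._float_eq)
                return result
            return wrapper
        return attr_val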

And an example usage:

>>> def float_eq(a, b):
...     return abs(a - b) < 0.01
... 
>>> A = FuzzyDict(float_eq=float_eq)
>>> B = FuzzyDict(float_eq=float_eq)
>>> A['a'] = 2.345
>>> A['b'] = 'a string'
>>> B['a'] = 2.345
>>> B['b'] = 'a string'
>>> B['a'] = 2.3445
>>> A == B
True
>>> B['a'] = 234.55
>>> A == B
False
>>> B['a'] = 2.345
>>> B['b'] = 'a strin'
>>> A == B
False

And they work even when nested:

>>> A['nested'] = FuzzyDict(float_eq=float_eq)
>>> A['nested']['a'] = 17.32
>>> B['nested'] = FuzzyDict(float_eq=float_eq)
>>> B['nested']['a'] = 17.321
>>> B['b'] = 'a string'   # restore the value changed earlier
>>> A == B
True
>>> B['nested']['a'] = 17.34
>>> A == B
False

A complete replacement for dict would require a bit more code and probably some testing to see how robust it is, but even the above solution provides many of the dict features (e.g. copy, setdefault, get, update, etc.).
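One possible way to get closer to a complete replacement, sketched here as an assumption rather than as part of the answer above, is to build on collections.abc.MutableMapping, which derives get, update, setdefault, pop, items and so on from five core methods. The class name FuzzyMapping and the default of math.isclose are illustrative choices.

```python
import math
from collections.abc import MutableMapping


class FuzzyMapping(MutableMapping):
    """Hypothetical dict-like wrapper whose == tolerates close floats."""

    def __init__(self, iterable=(), float_eq=math.isclose):
        self._float_eq = float_eq
        self._dict = dict(iterable)

    # The five methods MutableMapping requires; the rest of the
    # mapping API is derived from them by the abstract base class.
    def __getitem__(self, key):
        return self._dict[key]

    def __setitem__(self, key, value):
        self._dict[key] = value

    def __delitem__(self, key):
        del self._dict[key]

    def __iter__(self):
        return iter(self._dict)

    def __len__(self):
        return len(self._dict)

    def __eq__(self, other):
        if not isinstance(other, (FuzzyMapping, dict)):
            return NotImplemented
        if len(self) != len(other):
            return False
        for key, mine in self.items():
            if key not in other:
                return False
            theirs = other[key]
            if isinstance(mine, float) and isinstance(theirs, float):
                if not self._float_eq(mine, theirs):
                    return False
            elif mine != theirs:
                return False
        return True


a = FuzzyMapping({'x': 1.0, 'label': 'a string'})
b = FuzzyMapping({'x': 1.0 + 1e-12, 'label': 'a string'})
close_enough = (a == b)

b['x'] = 1.1
now_different = (a != b)
```

Unlike the __getattr__ delegation, this variant does not re-wrap results of methods such as copy(); whether that trade-off is acceptable depends on how the objects are used.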


Regarding why you shouldn't subclass a built-in.

Subclassing seems easy and correct, but it generally isn't. First of all, even though you can subclass built-in types, this does not mean that they were written to be used as base classes, so you may find that making something work takes more code than you thought.

Also, you'll probably want to use the built-in methods, but these methods will return an instance of the built-in type and not an instance of your class, which means that you have to reimplement every single method of the type. Also, you sometimes have to implement other methods that weren't implemented in the built-in.

For example, when subclassing list you may think that, since list implements only __iadd__ and __add__, you'll be safe reimplementing these two methods, but you'd be wrong! You must also implement __radd__, otherwise expressions like:

[1,2,3] + MyList([1,2,3])

would return a normal list and not a MyList.
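A minimal sketch of this pitfall follows. The two classes are hypothetical and their method bodies are assumptions made for illustration.

```python
class MyList(list):
    """Hypothetical subclass that overrides only __add__."""

    def __add__(self, other):
        return MyList(list.__add__(self, other))


class MyListWithRadd(MyList):
    """Same subclass, but it also handles being the right-hand operand."""

    def __radd__(self, other):
        return MyListWithRadd(list(other) + list(self))


left_ok = MyList([1, 2, 3]) + [4]             # MyList.__add__ runs
right_lost = [1, 2, 3] + MyList([4])          # plain list concatenation runs
right_kept = [1, 2, 3] + MyListWithRadd([4])  # MyListWithRadd.__radd__ runs

print(type(left_ok).__name__)
print(type(right_lost).__name__)
print(type(right_kept).__name__)
```

All three results contain the same elements; only the type of the result differs, which is why this kind of bug is easy to miss.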

In summary, subclassing a built-in has many more consequences than you might expect at first, and it may introduce unpredictable bugs caused by changes of type or behaviour. Debugging also becomes harder, because simply printing the objects to the log won't help: the representation looks correct even when the type is wrong! You really must check the class of every object involved to catch these subtle bugs.

In your specific situation, if you plan to convert the dictionaries only inside a single method, then you may avoid most disadvantages of subclassing dict, but at that point why don't you simply write a function and compare the dicts with it? This should work well, except if you want to pass the dicts to a library function that does the comparison.

Bakuriu answered Nov 09 '22 21:11


Just for reference, I think in my situation subclassing was not the best way. I've worked up a solution that I will most likely use here.

This is not the accepted answer since it was a collaborative approach based on what I learned from this thread. Just wanted a 'solution' that others could benefit from.

durden2.0 answered Nov 09 '22 22:11