I want to compare a pair of dictionaries using 'fuzzy' floating-point comparison, or better yet use numpy.allclose() to do so. However, the default == or != in Python for dicts doesn't do this.
I was wondering if there is a way to change the floating-point comparison operation (probably using a context manager for safe cleanup).
I believe an example will help here. I have a deeply nested dict that contains all sorts of values. Some of these values are floats, and I know there are tons of pitfalls when 'comparing' floating-point values.
d1 = {'a': {'b': 1.123456}}
d2 = {'a': {'b': 1.1234578}}
I would like to use != to compare these two dicts and have it return False if the only differences are floating-point values within a certain tolerance. In other words, do not count the values as different if they are close (I'm not sure yet what precision I want).
I suppose I could recursively walk the dicts myself, use numpy.allclose() for floating-point values, and fall back to normal equality testing for all other types. However, this is a bit tricky and error-prone. I do think it would be an acceptable solution, and I'd love to see one like it. Hopefully there is something more elegant, though.
The elegant solution in my head would look something like the following. However, I don't know if anything like this is even possible:
with hacked_float_compare:
    result = d1 != d2
Thus, inside this context manager I would be replacing the floating-point comparison (just for standard float values) with either my own comparison or numpy.allclose().
Again, I'm not sure this is possible, because monkey-patching float can't really be done since it's written in C. I'd also like to avoid having to change every floating-point value in the dicts to my own float class with a custom __eq__(). Maybe that's the best way, though?
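For illustration, the float-subclass idea I'm describing would look roughly like this. FuzzyFloat is a hypothetical name and the tolerance is arbitrary; note that every float in the dicts would have to be wrapped by hand, which is exactly what I'd like to avoid:

    class FuzzyFloat(float):
        """Hypothetical float subclass with tolerant equality."""
        def __eq__(self, other):
            # Arbitrary absolute tolerance, purely for illustration.
            return abs(self - other) < 1e-5
        def __ne__(self, other):
            return not self.__eq__(other)
        # Defining __eq__ would otherwise make instances unhashable.
        __hash__ = float.__hash__

    d1 = {'a': {'b': FuzzyFloat(1.123456)}}
    d2 = {'a': {'b': 1.1234578}}
    print(d1 == d2)  # True

It works because dict comparison delegates to the values' __eq__, but arithmetic on a FuzzyFloat yields a plain float again, so the wrapping doesn't survive any computation.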
Avoid subclassing built-in types. You'll regret it when you find out your objects have changed type for some unknown reason. Use delegation instead. For example:
import operator as op

class FuzzyDict(object):
    def __init__(self, iterable=(), float_eq=op.eq):
        self._float_eq = float_eq
        self._dict = dict(iterable)

    def __getitem__(self, key):
        return self._dict[key]

    def __setitem__(self, key, val):
        self._dict[key] = val

    def __iter__(self):
        return iter(self._dict)

    def __len__(self):
        return len(self._dict)

    def __contains__(self, key):
        return key in self._dict

    def __eq__(self, other):
        def compare(a, b):
            if isinstance(a, float) and isinstance(b, float):
                return self._float_eq(a, b)
            else:
                return a == b
        try:
            if len(self) != len(other):
                return False
            for key in self:
                if not compare(self[key], other[key]):
                    return False
            return True
        except Exception:
            return False

    def __getattr__(self, attr):
        # free features borrowed from dict
        attr_val = getattr(self._dict, attr)
        if callable(attr_val):
            def wrapper(*args, **kwargs):
                result = attr_val(*args, **kwargs)
                if isinstance(result, dict):
                    return FuzzyDict(result, self._float_eq)
                return result
            return wrapper
        return attr_val
And an example usage:
>>> def float_eq(a, b):
... return abs(a - b) < 0.01
...
>>> A = FuzzyDict(float_eq=float_eq)
>>> B = FuzzyDict(float_eq=float_eq)
>>> A['a'] = 2.345
>>> A['b'] = 'a string'
>>> B['a'] = 2.345
>>> B['b'] = 'a string'
>>> B['a'] = 2.3445
>>> A == B
True
>>> B['a'] = 234.55
>>> A == B
False
>>> B['a'] = 2.345
>>> B['b'] = 'a strin'
>>> A == B
False
And they work even when nested:
>>> A['nested'] = FuzzyDict(float_eq=float_eq)
>>> A['nested']['a'] = 17.32
>>> B['nested'] = FuzzyDict(float_eq=float_eq)
>>> B['nested']['a'] = 17.321
>>> B['b'] = 'a string' # changed before
>>> A == B
True
>>> B['nested']['a'] = 17.34
>>> A == B
False
A complete replacement for dict would require a bit more code and probably some testing to see how robust it is, but even the above solution provides many of the dict features (e.g. copy, setdefault, get, update, etc.)
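For instance, methods reached through __getattr__ come back wrapped, so copy() returns another FuzzyDict that keeps the same comparator (repeating the class definition so the snippet runs standalone):

    import operator as op

    class FuzzyDict(object):
        def __init__(self, iterable=(), float_eq=op.eq):
            self._float_eq = float_eq
            self._dict = dict(iterable)
        def __getitem__(self, key):
            return self._dict[key]
        def __setitem__(self, key, val):
            self._dict[key] = val
        def __iter__(self):
            return iter(self._dict)
        def __len__(self):
            return len(self._dict)
        def __contains__(self, key):
            return key in self._dict
        def __eq__(self, other):
            def compare(a, b):
                if isinstance(a, float) and isinstance(b, float):
                    return self._float_eq(a, b)
                return a == b
            try:
                if len(self) != len(other):
                    return False
                return all(compare(self[key], other[key]) for key in self)
            except Exception:
                return False
        def __getattr__(self, attr):
            attr_val = getattr(self._dict, attr)
            if callable(attr_val):
                def wrapper(*args, **kwargs):
                    result = attr_val(*args, **kwargs)
                    if isinstance(result, dict):
                        # dict results are re-wrapped with the same float_eq
                        return FuzzyDict(result, self._float_eq)
                    return result
                return wrapper
            return attr_val

    def float_eq(a, b):
        return abs(a - b) < 0.01

    A = FuzzyDict({'a': 2.345, 'b': 'a string'}, float_eq=float_eq)
    C = A.copy()              # dict.copy, wrapped: a FuzzyDict comes back
    print(type(C).__name__)   # FuzzyDict
    C['a'] = 2.3449
    print(A == C)             # True: the copy kept the fuzzy comparator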
Regarding why you shouldn't subclass a built-in.
This solution seems easy and correct, but it generally isn't. First of all, even though you can subclass built-in types, this does not mean they were written to be used as subclasses, so you may find that making something work takes more code than you thought.
Also, you'll probably want to use the built-in methods, but these methods return an instance of the built-in type and not an instance of your class, which means you have to reimplement every single method of the type. You sometimes also have to implement methods that the built-in doesn't implement at all.
For example, when subclassing list you may think that, since list implements only __iadd__ and __add__, you'll be safe reimplementing those two methods, but you'd be wrong! You must also implement __radd__, otherwise an expression like:
[1, 2, 3] + MyList([1, 2, 3])
would return a normal list and not a MyList.
In summary, subclassing a built-in has far more consequences than you might expect at first, and it may introduce unpredictable bugs due to changes of type or behaviour that you did not anticipate. Debugging also becomes harder, because you can't simply print the instances in a log to spot the problem: the representation will look correct. You really must check the class of every object involved to catch these subtle bugs.
In your specific situation, if you plan to convert the dictionaries only inside a single method, then you may avoid most of the disadvantages of subclassing dict, but at that point why not simply write a function and compare the dicts with it? This should work well, except if you want to pass the dicts to a library function that does the comparison.
Just for reference, I think in my situation subclassing was not the best way, and I've worked up a solution that I will most likely use. This is not the accepted answer, since it was a collaborative approach based on what I learned from this thread; I just wanted a 'solution' that others could benefit from.