Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare two objects using serialization C#

Why it is not a good practice to compare two objects by serializing them and then compare the strings like in the following example?

public class Obj
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
}

public class Comparator<T> : IEqualityComparer<T>
{
    public bool Equals(T x, T y)
    {
        return JsonConvert.SerializeObject(x) == JsonConvert.SerializeObject(y);
    }

    public int GetHashCode(T obj)
    {
        return JsonConvert.SerializeObject(obj).GetHashCode();
    }
}

Obj o1 = new Obj { Prop1 = 1, Prop2 = "1" };
Obj o2 = new Obj { Prop1 = 1, Prop2 = "2" };

bool result = new Comparator<Obj>().Equals(o1, o2);

I have tested it and it works, it is generic so it could stand for a great diversity of objects, but what I am asking is which are the downsides of this approach for comparing objects?

I have seen it has been suggested in this question and it received some upvotes but I can't figure it out why this is not considered the best way, if somebody wants to compare just the values of the properties of two objects?

EDIT : I am strictly talking about Json serialize, not XML.

I am asking this because I want to create a simple and generic Comparator for a Unit Test project, so the performance of comparison does not bother me so much, as I know this may be one of the biggest down-sides. Also the typeless problem can be handled using in case of Newtonsoft.Json the TypeNameHandling property set to All.

like image 792
meJustAndrew Avatar asked Jul 16 '16 12:07

meJustAndrew


2 Answers

You probably going to keep adding a bounty to the question until somebody tells you that it is just fine to do this. So you got it, don't hesitate to take advantage of the NewtonSoft.Json library to keep the code simple. You just need some good arguments to defend your decision if your code is ever reviewed or if somebody else takes over the maintenance of the code.

Some of the objections they may raise, and their counter-arguments:

This is very inefficient code!

It certainly is, particularly GetHashCode() can make your code brutally slow if you ever use the object in a Dictionary or HashSet.

Best counter-argument is to note that efficiency is of little concern in a unit test. The most typical unit test takes longer to get started than to actually execute and whether it takes 1 millisecond or 1 second is not relevant. And a problem you are likely to discover very early.

You are unit-testing a library you did not write!

That is certainly a valid concern, you are in effect testing NewtonSoft.Json's ability to generate a consistent string representation of an object. There is cause to be alarmed about this, in particular floating point values (float and double) are never not a problem. There is also some evidence that the library author is unsure how to do it correctly.

Best counter-argument is that the library is widely used and well maintained, the author has released many updates over the years. Floating point consistency concerns can be reasoned away when you make sure that the exact same program with the exact same runtime environment generates both strings (i.e. don't store it) and you make sure the unit-test is built with optimization disabled.

You are not unit-testing the code that needs to be tested!

Yes, you would only write this code if the class itself provides no way to compare objects. In other words, does not itself override Equals/GetHashCode and does not expose a comparator. So testing for equality in your unit test exercise a feature that the to-be-tested code does not actually support. Something that a unit test should never do, you can't write a bug report when the test fails.

Counter argument is to reason that you need to test for equality to test another feature of the class, like the constructor or property setters. A simple comment in the code is enough to document this.

like image 89
3 revs, 2 users 93% Avatar answered Sep 28 '22 06:09

3 revs, 2 users 93%


The primary problem is that it is inefficient

As an example imagine this Equals function

public bool Equals(T x, T y)
{
    return x.Prop1 == y.Prop1
        && x.Prop2 == y.Prop2
        && x.Prop3 == y.Prop3
        && x.Prop4 == y.Prop4
        && x.Prop5 == y.Prop5
        && x.Prop6 == y.Prop6;
}

if prop1 are not the same then the other 5 compares never need to be checked, if you did this with JSON you would have to convert the entire object into a JSON string then compare the string every time, this is on top of serialization being an expensive task all on its own.

Then the next problem is serialization is designed for communication e.g. from memory to a file, across a network, etc. If you have leveraged serialization for comparison you can degrade your ability to use it for it normal use, i.e. you can't ignore fields not required for transmission because ignoring them might break your comparer.

Next JSON in specific is Type-less which means than values than are not in anyway shape or form equal may be mistaken for being equal, and in the flipside values that are equal may not compare as equal due to formatting if they serialize to the same value, this is again unsafe and unstable

The only upside to this technique is that is requires little effort for the programmer to implement

like image 29
MikeT Avatar answered Sep 28 '22 07:09

MikeT