Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare two somewhat large objects for equality

Tags:

c++

I have to compare two larger objects for equality.

Properties of the objects:

  1. Contain all their members by value (so no pointers to follow).
  2. They also contain some stl::array.
  3. They contain some other objects for which 1 and 2 hold.
  4. Size is up to several kB.
  5. Some of the members will more likely differ than others, therefore lead to a quicker break of the comparison operation if compared first.
  6. The objects do not change. Basically, the algorithm is just to count how many objects are the same. Each object is only compared once against several "master" objects.

What is the best way to compare these objects? I see three options:

  1. Just use plain, non-overloaded operator==.
  2. Overload == and perform a member-by-member comparison, beginning with members likely to differ.
  3. Overload == and view the object as a plain byte field and compare word by word.

Some thoughts:
Option 1 seems good because it means the least amount of work (and opportunities to introduce errors).
Option 2 seems good, because I can exploit the heuristic about which elements differ most likely. But maybe it's still slower because built-in ==of option 1 is ridiculously fast.
Option 3 seems to be most "low-level" optimized, but that's what the compiler probably also does for option 1.

So the questions are:

  • Is there a well-known best way to solve the task?
  • Is one of the options an absolute no-go?
  • Do I have to consider something else?
like image 920
Michael Avatar asked Apr 28 '15 10:04

Michael


People also ask

When is it necessary to compare two values for equality?

It is sometimes necessary to compare two values for equality. In some cases, you are testing for value equality, also known as equivalence, which means that the values that are contained by the two variables are equal.

Is there a way to compare two objects with different properties?

Well, you could write some logic to compare all of the properties of the two objects to each other. This gets complicated when it is an object graph with complex subtypes, so you will need to determine how close is good enough.

Why are equality comparisons of floating-point values (double and float) problematic?

Equality comparisons of floating-point values ( double and float) are problematic because of the imprecision of floating-point arithmetic on binary computers. For more information, see the remarks in the topic System.Double.

How to compare two objects in Java?

Every class in java has one parent class that is Object class and it has equals () method to compare two any objects. This program compiles and executes successfully. When the e1.equals (e2) method is invoked it internally compares two object references but not the values. So it has to have a different address.


2 Answers

Default == is fast for small objects, but if you have big data members to compare, try to find some optimizations thinking about the specific data stored and the way they are update, redefining an overloaded == comparison operator more "smarter" than the default one.

As many already said, option 3 is wrong, due to the fact that fields generally are padded to respect the data-alignment, and for optimization reason the bytes added are not initialized to 0 (maybe this is done in the DEBUG version).

I can suggest you to explore the option to divide the check in two stages:

  • first stage, create some sort of small & fast member that "compress" the status of the instance (think like an hash); this field could be updated every time some big field changes, for example the elements of the stl-array. Than check frequently changed fields and this "compress" status first to make a conservative comparison (for example, the sum of all int in the array, or maybe the xor)
  • second stage, use an in-deep test for every members. This is the slowest but complete check, but likely will be activated only sometimes
like image 185
Mouze Avatar answered Oct 22 '22 12:10

Mouze


A good question.

If you have some heuristic about which members are likely to differ - use it. So that overloading operator == and checking suspected members first seems to be a good idea.

About byte-wise comparison (aka memcmp and friends) - may be problematic due to struct member alignment. I.e. the compiler sometimes puts "empty spaces" in your struct layout, so that each member will have required alignment. Those are not initialized and usually contain garbage.

This may be solved by explicit zero-initializing of your whole object. But, I don't see any advantage of memcmp vs automatic operator ==, which is a members-wise comparison. It probably may save some code size (a single call to memcpy vs explicit reads and comparisons), however from the performance perspective this seems to be pretty much the same.

like image 44
valdo Avatar answered Oct 22 '22 12:10

valdo