Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I calculate a hash/checksum/fingerprint of an object in c++?

How can I calculate a hash/checksum/fingerprint of an object in c++?

Requirements:

The function must be 'injective'(*). In other words, there should be no two different input objects, that return the same hash/checksum/fingerprint.

Background:

I am trying to come up with a simple pattern for checking whether or not an entity object has been changed since it was constructed. (In order to know which objects need to be updated in the database).

Note that I specifically do not want to mark the object as changed in my setters or anywhere else.

I am considering the following pattern: In short, every entity object that should be persisted, has a member function "bool is_changed()". Changed, in this context, means changed since the objects' constructor was called.

Note: My motivation for all this is to avoid the boilerplate code that comes with marking objects as clean/dirty or doing a member by member comparison. In other words, reduce risk of human error.

(Warning: psudo c++ code ahead. I have not tried compiling it).

class Foo {
private:

    std::string my_string;

    // Assume the "fingerprint" is of type long.
    long original_fingerprint;

    long current_fingerprint()
    {
        // *** Suggestions on which algorithm to use here? ***
    }
public:
    Foo(const std::string& my_string) :
        my_string(my_string)
    {
        original_fingerprint = current_fingerprint();
    }

    bool is_changed() const 
    {
        // If new calculation of fingerprint is different from the one
        // calculated in the constructor, then the object has  
        // been changed in some way.

        return current_fingerprint() != original_fingerprint;
    }

    void set_my_string(const std::string& new_string)
    {
        my_string = new_string;
    }
}


void client_code()
{
    auto foo = Foo("Initial string");

    // should now return **false** because 
    // the object has not yet been changed:
    foo.is_changed();

    foo.set_my_string("Changed string");

    // should now return **true** because
    // the object has been changed:
    foo.is_changed();
}

(*) In practice, not necessarily in theory (like uuids are not unique in theory).

like image 353
unique_ptr Avatar asked Oct 18 '22 08:10

unique_ptr


2 Answers

You can use the CRC32 algorithm from Boost. Feed it with the memory locations of the data you want to checksum. You could use a hash for this, but hashes are cryptographic functions intended to guard against intentional data corruption and are slower. A CRC performs better.

For this example, I've added another data member to Foo:

int my_integer;

And this is how you would checksum both my_string and my_integer:

#include <boost/crc.hpp>
// ...

long current_fingerprint()
{
    boost::crc_32_type crc32;
    crc32.process_bytes(my_string.data(), my_string.length());
    crc32.process_bytes(&my_integer, sizeof(my_integer));
    return crc32.checksum();
}

However, now we're left with the issue of two objects having the same fingerprint if my_string and my_integer are equal. To fix this, we should include the address of the object in the CRC, since C++ guarantees that different objects will have different addresses.

One would think we can use:

process_bytes(&this, sizeof(this));

to do it, but we can't since this is an rvalue and thus we can't take its address. So we need to store the address in a variable instead:

long current_fingerprint()
{
    boost::crc_32_type crc32;
    void* this_ptr = this;
    crc32.process_bytes(&this_ptr, sizeof(this_ptr));
    crc32.process_bytes(my_string.data(), my_string.length());
    crc32.process_bytes(&my_integer, sizeof(my_integer));
    return crc32.checksum();
}
like image 177
Nikos C. Avatar answered Oct 21 '22 02:10

Nikos C.


Such a function does not exist, at least not in the context that you are requesting.

The STL provides hash functions for basic types (std::hash), and you could use these to implement a hash function for your objects using any reasonable hashing algorithm.

However, you seem to be looking for an injective function, which causes a problem. Essentially, to have an injective function, it would be necessary to have an output of size greater or equal to that of the object you are considering, since otherwise (from the pigeon hole principle) there would be two inputs that give the same output. Given that, the most sensible option would be to just do a straight-up comparison of the object to some sort of reference object.

like image 41
davidv1992 Avatar answered Oct 21 '22 00:10

davidv1992