Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When can a struct safely be hashed as an array of bytes?

Tags:

c++

hash

For structs where equality means identical most-derived type and byte equality of each data member, when, if ever, can the struct safely be hashed as an array of bytes?

This document

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3333.html

under the heading "Hashing objects as byte arrays" suggests that structs with any padding can't be safely hashed as an array of bytes.

Is an explicit test for padding required to safely hash a struct as an array of bytes? Is it sufficient?

If so, does the following sketch appropriately illustrate that test?

#include <cstddef>
#include <iostream>

struct A
{
    int i;
    float f;
    char c;
};
// hashing would start at offs_i (possibly hopping over a v-table) and end at
// offs_c + sizeof(char)

int main()
{
    const std::size_t size_A = sizeof(A);
    const std::size_t size_int = sizeof(int);
    const std::size_t size_float = sizeof(float);
    const std::size_t size_char = sizeof(char);
    const std::size_t offs_i = offsetof(A, i);
    const std::size_t offs_f = offsetof(A, f);
    const std::size_t offs_c = offsetof(A, c);

    bool padded = false;
    if (offs_f != size_int)
        padded = true;
    else if (offs_c != size_int + size_float)
        padded = true;

    std::cout << "padded? " << std::boolalpha << padded << std::endl;
}

If a struct does have padding, is there any way to workaround to allow hashing as an array of bytes, e.g. zero-ing out the padding bits?

"Safe" here means two structs that compare equal will hash to identical values.

like image 857
Praxeolitic Avatar asked Mar 30 '15 14:03

Praxeolitic


2 Answers

Pretty much never. The Standard doesn't define how inheritance is implemented, e.g. with vtable pointers, so there's no guarantee that two classes which are of equal most-derived type would have any implementation-specific inheritance-related data be equal.

Furthermore, since they're not PODs, offsetof is not guaranteed to work or produce meaningful results- I believe it's actually just plain UB.

The long and short is, don't bother trying to treat structures as arrays of bytes because they're not.

As regards to his intentions on the paper, it's more likely a concession to some of the rabid "er mah gerd performances!" dogs on the committee rather than an actual thing that he wants to do.

like image 74
Puppy Avatar answered Sep 28 '22 08:09

Puppy


The paper you referenced prety much states the requirements:

is_trivially_copyable<T>::value &&
is_standard_layout<T>::value &&
is_contiguous_layout<T>::value

Which have to hold true recursively for the struct itself and all members.

The first two checks are already part of the standard library and you can implement is_contiguous_layout<T>::value by yourself. As a basis it should be enough to compare the sum of the sizes of its members with the size of the struct itself. I don't think, checking the offset is actually necessary.

This should certainly work on "the usual" platforms (CHAR_BIT == 8, 2-complement) if your type is only composed of integer types. I'm not sure however, if it also works with bools and floating point numbers, as I believe the standard doesn't mandate a unique two-way mapping between the value of a variable and its bit representation.

EDIT: I just realized that you cannot stisfy the condition of same most derived type, if in the derived class doesn't add any members or if you compare two distinct classes that happen to have the same members. So you would have to account for the type separately.

like image 22
MikeMB Avatar answered Sep 28 '22 09:09

MikeMB