Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using CRC32 algorithm to hash string at compile-time

Basically I want in my code to be able to do this:

 Engine.getById(WSID('some-id'));

Which should get transformed by

 Engine.getById('1a61bc96');

just before being compiled into asm. So at compile-time.

This is my try

constexpr int WSID(const char* str) {
    boost::crc_32_type result;
    result.process_bytes(str,sizeof(str));
    return result.checksum();
}

But I get this when trying to compile with MSVC 18 (CTP November 2013)

error C3249: illegal statement or sub-expression for 'constexpr' function

How can I get the WSID function, using this way or any, as long as it is done during compile time?

Tried this: Compile time string hashing

 warning C4592: 'crc32': 'constexpr' call evaluation failed; function will be called at run-time

EDIT:

I first heard about this technique in Game Engine Architecture by Jason Gregory. I contacted the author who obligingly answer to me this :

What we do is to pass our source code through a custom little pre-processor that searches for text of the form SID('xxxxxx') and converts whatever is between the single quotes into its hashed equivalent as a hex literal (0xNNNNNNNN). [...]

You could conceivably do it via a macro and/or some template metaprogramming, too, although as you say it's tricky to get the compiler to do this kind of work for you. It's not impossible, but writing a custom tool is easier and much more flexible. [...]

Note also that we chose single quotes for SID('xxxx') literals. This was done so that we'd get some reasonable syntax highlighting in our code editors, yet if something went wrong and some un-preprocessed code ever made it thru to the compiler, it would throw a syntax error because single quotes are normally reserved for single-character literals.

Note also that it's crucial to have your little pre-processing tool cache the strings in a database of some sort, so that the original strings can be looked up given the hash code. When you are debugging your code and you inspect a StringId variable, the debugger will normally show you the rather unintelligible hash code. But with a SID database, you can write a plug-in that converts these hash codes back to their string equivalents. That way, you'll see SID('foo') in your watch window, not 0x75AE3080 [...]. Also, the game should be able to load this same database, so that it can print strings instead of hex hash codes on the screen for debugging purposes [...].

But while preprocess has some main advantages, it means that I have to prepare some kind of output system of modified files (those will be stored elsewhere, and then we need to tell MSVC). So it might complicate the compiling task. Is there a way to preprocess file with python for instance without headaches? But this is not the question, and I'm still interested about using compile-time function (about cache I could use an ID index)

like image 703
Vinz243 Avatar asked Feb 23 '15 14:02

Vinz243


People also ask

Is CRC32 a good hash?

CRC32 works very well as a hash algorithm. The whole point of a CRC is to hash a stream of bytes with as few collisions as possible.

What is CRC32 algorithm?

CRC32 is an error-detecting function that uses a CRC32 algorithm to detect changes between source and target data. The CRC32 function converts a variable-length string into an 8-character string that is a text representation of the hexadecimal value of a 32 bit-binary sequence.

Is CRC32 fast?

It's reasonably fast (375 MByte/s on my computer) and comes with only a small memory overhead. Often the look-up table isn't pre-computed at runtime but rather stored as a large table in the C code.

Is CRC faster than MD5?

CRC32 IS much faster than MD5, when a cryptographic library is properly implement.


2 Answers

Here is a solution that works entirely at compile time, but may also be used at runtime. It is a mix of constexpr, templates and macros. You may want to change some of the names or put them in a separate file since they are quite short.

Note that I reused code from this answer for the CRC table generation and I based myself off of code from this page for the implementation.

I have not tested it on MSVC since I don't currently have it installed in my Windows VM, but I believe it should work, or at least be made to work with trivial changes.

Here is the code, you may use the crc32 function directly, or the WSID function that more closely matches your question :

#include <cstring>
#include <cstdint>
#include <iostream>

// Generate CRC lookup table
template <unsigned c, int k = 8>
struct f : f<((c & 1) ? 0xedb88320 : 0) ^ (c >> 1), k - 1> {};
template <unsigned c> struct f<c, 0>{enum {value = c};};

#define A(x) B(x) B(x + 128)
#define B(x) C(x) C(x +  64)
#define C(x) D(x) D(x +  32)
#define D(x) E(x) E(x +  16)
#define E(x) F(x) F(x +   8)
#define F(x) G(x) G(x +   4)
#define G(x) H(x) H(x +   2)
#define H(x) I(x) I(x +   1)
#define I(x) f<x>::value ,

constexpr unsigned crc_table[] = { A(0) };

// Constexpr implementation and helpers
constexpr uint32_t crc32_impl(const uint8_t* p, size_t len, uint32_t crc) {
    return len ?
            crc32_impl(p+1,len-1,(crc>>8)^crc_table[(crc&0xFF)^*p])
            : crc;
}

constexpr uint32_t crc32(const uint8_t* data, size_t length) {
    return ~crc32_impl(data, length, ~0);
}

constexpr size_t strlen_c(const char* str) {
    return *str ? 1+strlen_c(str+1) : 0;
}

constexpr int WSID(const char* str) {
    return crc32((uint8_t*)str, strlen_c(str));
}

// Example usage
using namespace std;

int main() {
    cout << "The CRC32 is: " << hex << WSID("some-id") << endl;
}

The first part takes care of generating the table of constants, while crc32_impl is a standard CRC32 implementation converted to a recursive style that works with a C++11 constexpr. Then crc32 and WSID are just simple wrappers for convenience.

like image 119
tux3 Avatar answered Oct 03 '22 22:10

tux3


If anyone is interested, I coded up a CRC-32 table generator function and code generator function using C++14 style constexpr functions. The result is, in my opinion, much more maintainable code than many other attempts I have seen on the internet and it stays far, far away from the preprocessor.

Now, it does use a custom std::array 'clone' called cexp::array, because G++ seems to not have not added the constexpr keyword to their non-const reference index access/write operator.

However, it is quite light-weight, and hopefully the keyword will be added to std::array in the close future. But for now, the very simple array implementation is as follows:

namespace cexp
{

    // Small implementation of std::array, needed until constexpr
    // is added to the function 'reference operator[](size_type)'
    template <typename T, std::size_t N>
    struct array {
        T m_data[N];

        using value_type = T;
        using reference = value_type &;
        using const_reference = const value_type &;
        using size_type = std::size_t;

        // This is NOT constexpr in std::array until C++17
        constexpr reference operator[](size_type i) noexcept {
            return m_data[i];
        }

        constexpr const_reference operator[](size_type i) const noexcept {
            return m_data[i];
        }

        constexpr size_type size() const noexcept {
            return N;
        }
    };

}

Now, we need to generate the CRC-32 table. I based the algorithm off some Hacker's Delight code, and it can probably be extended to support the many other CRC algorithms out there. But alas, I only required the standard implementation, so here it is:

// Generates CRC-32 table, algorithm based from this link:
// http://www.hackersdelight.org/hdcodetxt/crc.c.txt
constexpr auto gen_crc32_table() {
    constexpr auto num_bytes = 256;
    constexpr auto num_iterations = 8;
    constexpr auto polynomial = 0xEDB88320;

    auto crc32_table = cexp::array<uint32_t, num_bytes>{};

    for (auto byte = 0u; byte < num_bytes; ++byte) {
        auto crc = byte;

        for (auto i = 0; i < num_iterations; ++i) {
            auto mask = -(crc & 1);
            crc = (crc >> 1) ^ (polynomial & mask);
        }

        crc32_table[byte] = crc;
    }

    return crc32_table;
}

Next, we store the table in a global and perform rudimentary static checking on it. This checking could most likely be improved, and it is not necessary to store it in a global.

// Stores CRC-32 table and softly validates it.
static constexpr auto crc32_table = gen_crc32_table();
static_assert(
    crc32_table.size() == 256 &&
    crc32_table[1] == 0x77073096 &&
    crc32_table[255] == 0x2D02EF8D,
    "gen_crc32_table generated unexpected result."
);

Now that the table is generated, it's time to generate the CRC-32 codes. I again based the algorithm off the Hacker's Delight link, and at the moment it only supports input from a c-string.

// Generates CRC-32 code from null-terminated, c-string,
// algorithm based from this link:
// http://www.hackersdelight.org/hdcodetxt/crc.c.txt 
constexpr auto crc32(const char *in) {
    auto crc = 0xFFFFFFFFu;

    for (auto i = 0u; auto c = in[i]; ++i) {
        crc = crc32_table[(crc ^ c) & 0xFF] ^ (crc >> 8);
    }

    return ~crc;
}

For sake of completion, I generate one CRC-32 code below and statically check if it has the expected output, and then print it to the output stream.

int main() {
    constexpr auto crc_code = crc32("some-id");
    static_assert(crc_code == 0x1A61BC96, "crc32 generated unexpected result.");

    std::cout << std::hex << crc_code << std::endl;
}

Hopefully this helps anyone else that was looking to achieve compile time generation of CRC-32, or even in general.

like image 38
Deus Sum Avatar answered Oct 03 '22 23:10

Deus Sum